## A Question of Distributions and Entropies

I thought I’d use the medium of this blog to pick my readers’ brains about some general questions I have concerning probability and entropy, as set out on the chalkboard above, in order to help me with my homework.

Imagine that $p_x(x)$ and $p_y(y)$ are one-point (marginal) probability density functions and $p_{xy}(x,y)$ is a two-point (joint) probability density function defined so that its marginal distributions are $p_x(x)$ and $p_y(y)$, as shown on the left-hand side of the board. These functions are all non-negative and integrate to unity, as shown.

Note that, unless x and y are independent, in which case $p_{xy}(x,y) = p_x(x)\,p_y(y)$, the joint probability cannot be determined from the marginals alone.
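As a quick numerical illustration of that last point (a minimal sketch with made-up numbers, not anything from the board): here are two different joint distributions on a 2×2 grid that share exactly the same marginals.

```python
import numpy as np

# Two different 2x2 joint distributions over (x, y) in {0, 1} x {0, 1}.
# The numbers are purely illustrative.
p_indep = np.array([[0.25, 0.25],
                    [0.25, 0.25]])   # x and y independent
p_corr  = np.array([[0.40, 0.10],
                    [0.10, 0.40]])   # x and y positively correlated

for p in (p_indep, p_corr):
    # Row sums give the marginal of x; column sums give the marginal of y.
    print(p.sum(axis=1), p.sum(axis=0))
# Both joints print the same marginals [0.5 0.5] [0.5 0.5],
# so the marginals alone cannot distinguish them.
```

This is why, without the independence assumption, knowing $p_x$ and $p_y$ leaves the joint distribution underdetermined.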

On the right we have $S_x$, $S_y$ and $S_{xy}$, defined by integrating $p\log p$ for the two univariate distributions and the bivariate distribution respectively, as shown on the right-hand side of the board. These would be proportional to the Gibbs entropies of the distributions concerned, but that isn’t directly relevant.

My question is: what can be said in general terms (i.e. without making any further assumptions about the distributions involved) about the relationship between $S_x$, $S_y$ and $S_{xy}$?
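One way to build intuition before answering in general is a small numerical experiment on a discrete example (a sketch with illustrative numbers; `entropy` here uses the usual sign convention $S = -\sum p\log p$):

```python
import numpy as np

def entropy(p):
    """Shannon/Gibbs entropy -sum(p log p), ignoring zero cells."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# A correlated joint distribution (illustrative numbers).
pxy = np.array([[0.40, 0.10],
                [0.10, 0.40]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)   # marginals

Sx, Sy, Sxy = entropy(px), entropy(py), entropy(pxy)
print(Sxy <= Sx + Sy)                        # joint entropy is subadditive
print(np.isclose(entropy(np.outer(px, py)), Sx + Sy))  # equality for the product of marginals
```

In this example $S_x = S_y = \ln 2$ while $S_{xy} \approx 1.19 < 2\ln 2$, and replacing the joint by the product of its marginals restores equality.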

### 9 Responses to “A Question of Distributions and Entropies”

1. David Malone Says:

I suspect that mutual information may be what you are looking for. https://en.wikipedia.org/wiki/Mutual_information#Relation_to_conditional_and_joint_entropy

2. This is not an answer to your question, but a comment on this statement: “Note that, unless x and y are independent, in which case $p_{xy}(x,y) = p_x(x) p_y(y)$, the joint probability cannot be determined from the marginals alone.”

Unfortunately, its proof is not constructive.

3. Anton Garrett Says:

If you consider the marginal distributions to be given and you assign the joint distribution by maximising its information entropy, given the marginals as constraints, then the result is just the product of the marginals (i.e. independent). The information entropy of this joint distribution proves to be the sum of the information entropies of the marginal distributions. As this is a maximum entropy distribution, the information entropy of any other distribution having the same marginals is less, i.e.

$S_{xy} \leq S_x + S_y$

This assumes that the maximum is global, not just local. I'm sure that this is true and can be made rigorous either by citing some convenient mathematical inequality or using convexity properties of entropy.

• Anton Garrett Says:

Yes you can prove it globally using

$x \ln x \geq x - 1$ (for $x > 0$, with equality only at $x = 1$)

This is done, for instance, in Hugh Everett’s longer manuscript included in the book version of The Many Worlds Interpretation of Quantum Mechanics.
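For completeness, here is a sketch of how subadditivity follows from that inequality (using the sign convention $S = -\iint p \ln p$, and writing $r = p_{xy}/(p_x p_y)$, so that applying the inequality to $r$ and weighting by $p_x p_y$ gives):

```latex
\begin{align}
S_x + S_y - S_{xy}
  &= \iint p_{xy}\,\ln\frac{p_{xy}}{p_x p_y}\,\mathrm{d}x\,\mathrm{d}y
   = \iint p_x p_y\,\bigl(r \ln r\bigr)\,\mathrm{d}x\,\mathrm{d}y \\
  &\geq \iint p_x p_y\,(r - 1)\,\mathrm{d}x\,\mathrm{d}y
   = \iint p_{xy}\,\mathrm{d}x\,\mathrm{d}y
     - \iint p_x p_y\,\mathrm{d}x\,\mathrm{d}y
   = 1 - 1 = 0 .
\end{align}
```

The quantity on the first line is exactly the mutual information mentioned in the first response, which is therefore non-negative, with equality iff $p_{xy} = p_x p_y$ almost everywhere.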

4. The Wikipedia page on the joint entropy has a list of inequalities and properties of the joint entropy (including the inequality $S_{xy} \leq S_x + S_y$):
https://en.wikipedia.org/wiki/Joint_entropy