## A Question of Distributions and Entropies

I thought I’d use the medium of this blog to pick the brains of my readers about some general questions I have about probability and entropy as described on the chalkboard above in order to help me with my homework.

Imagine that $p_x(x)$ and $p_y(y)$ are one-point probability density functions and $p_{xy}(x,y)$ is a two-point (joint) probability density function defined so that its marginal distributions are $p_x(x)$ and $p_y(y)$, as shown on the left-hand side of the board. These functions are all non-negative and integrate to unity as shown.

Note that, unless $x$ and $y$ are independent, in which case $p_{xy}(x,y) = p_x(x)\,p_y(y)$, the joint probability cannot be determined from the marginals alone.

On the right we have $S_x$, $S_y$ and $S_{xy}$, defined by integrating $p \log p$ for the two univariate distributions and the bivariate distribution respectively, as shown on the right-hand side of the board. These would be proportional to the *Gibbs entropy* of the distributions concerned, but that isn't directly relevant here.

My question is: what can be said in general terms (i.e. without making any further assumptions about the distributions involved) about the relationship between $S_x$, $S_y$ and $S_{xy}$?

Answers ~~on a postcard~~ through the comments block please!
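As a concrete numerical illustration of the question (my own sketch, not from the chalkboard), the bivariate normal has closed-form differential entropies, so one can tabulate $S_x$, $S_y$ and $S_{xy}$ directly. Note this uses the standard information-theoretic sign convention $S = -\int p \ln p$ rather than the $\int p \log p$ on the board:

```python
import numpy as np

def gaussian_entropies(sigma_x, sigma_y, rho):
    """Closed-form differential entropies (in nats) of a bivariate normal,
    using the standard sign convention S = -integral of p ln p."""
    s_x = 0.5 * np.log(2 * np.pi * np.e * sigma_x**2)
    s_y = 0.5 * np.log(2 * np.pi * np.e * sigma_y**2)
    # determinant of the covariance matrix
    det = sigma_x**2 * sigma_y**2 * (1 - rho**2)
    s_xy = 0.5 * np.log((2 * np.pi * np.e)**2 * det)
    return s_x, s_y, s_xy

for rho in (0.0, 0.5, 0.9):
    s_x, s_y, s_xy = gaussian_entropies(1.0, 1.0, rho)
    print(f"rho={rho}: S_xy={s_xy:.4f}, S_x + S_y={s_x + s_y:.4f}")
```

For $\rho = 0$ (independence) the joint entropy equals the sum of the marginal entropies; as $|\rho|$ grows, $S_{xy}$ falls below it.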

November 28, 2022 at 3:02 pm

I suspect that mutual information may be what you are looking for. https://en.wikipedia.org/wiki/Mutual_information#Relation_to_conditional_and_joint_entropy

November 28, 2022 at 3:14 pm

Thanks! That’s not quite the answer but is very helpful.

November 28, 2022 at 3:10 pm

This is not an answer to your question, but a comment on this statement: “Note that, unless x and y are independent, in which case $p_{xy}(x,y) = p_x(x) p_y(y)$, the joint probability cannot be determined from the marginals alone.”

There is Sklar’s theorem:

https://mathworld.wolfram.com/SklarsTheorem.html

https://en.wikipedia.org/wiki/Copula_(probability_theory)#Sklar's_theorem

Unfortunately, its proof is not constructive.

November 28, 2022 at 4:17 pm

If you consider the marginal distributions to be given and you assign the joint distribution by maximising its information entropy, with the marginals as constraints, then the result is just the product of the marginals (i.e. the independent case). The information entropy of this joint distribution proves to be the sum of the information entropies of the marginal distributions. As this is the maximum-entropy distribution, the information entropy of any other distribution having the same marginals is less, i.e.

$S_{xy} \leq S_x + S_y$

This assumes that the maximum is global, not just local. I'm sure that this is true and can be made rigorous, either by citing a convenient mathematical inequality or by using convexity properties of entropy.
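The argument above is easy to check numerically on a discrete example (a sketch of mine, using Shannon entropies in nats): for an arbitrary joint pmf, $S_{xy} \leq S_x + S_y$, and the product of the marginals attains the bound exactly.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a (flattened) pmf."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)

# An arbitrary 4x5 joint pmf with no special structure.
p_xy = rng.random((4, 5))
p_xy /= p_xy.sum()

p_x = p_xy.sum(axis=1)   # marginal over y
p_y = p_xy.sum(axis=0)   # marginal over x

h_xy = entropy(p_xy.ravel())
h_prod = entropy(np.outer(p_x, p_y).ravel())  # product of the marginals

print(h_xy <= entropy(p_x) + entropy(p_y))          # subadditivity holds
print(abs(h_prod - (entropy(p_x) + entropy(p_y))))  # ~0: product attains the bound
```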

November 28, 2022 at 4:41 pm

Yes you can prove it globally using

$x \ln x \geq x - 1$ for $x > 0$, with equality only at $x = 1$.

This is done in, for instance, the book version of *The Many-Worlds Interpretation of Quantum Mechanics*, in a longer manuscript by Hugh Everett.
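For the record, here is a sketch of how that inequality yields the global result (a reconstruction, using the sign convention $S = -\int p \ln p$). Setting $t = p_{xy}(x,y)/[p_x(x)\,p_y(y)]$ and integrating $t \ln t \geq t - 1$ against the weight $p_x(x)\,p_y(y)$ gives

$$\iint p_{xy} \ln\frac{p_{xy}}{p_x p_y}\,dx\,dy \;\geq\; \iint \left(p_{xy} - p_x p_y\right)dx\,dy = 1 - 1 = 0,$$

and expanding the logarithm on the left-hand side turns this into $S_x + S_y - S_{xy} \geq 0$, i.e. $S_{xy} \leq S_x + S_y$, with equality exactly when $t \equiv 1$, i.e. when $x$ and $y$ are independent.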

November 28, 2022 at 4:47 pm

Ah yes, that’s an interesting one.

November 28, 2022 at 4:43 pm

The Wikipedia page on the joint entropy has a list of inequalities and properties of the joint entropy (including the inequality $S_{xy} \leq S_x + S_y$):

https://en.wikipedia.org/wiki/Joint_entropy

The corresponding Wikipedia page in German has links to several presentations and lecture notes with proofs.

November 28, 2022 at 4:48 pm

That inequality is particularly interesting.

November 30, 2022 at 11:45 am

It seems to me that any value $\leq S_x + S_y$ is possible.
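For continuous distributions this looks plausible: in the unit-variance bivariate normal, $S_x + S_y = \ln(2\pi e)$ is fixed while $S_{xy} = \ln(2\pi e) + \tfrac12\ln(1-\rho^2)$ decreases without bound as the correlation $\rho \to 1$. A quick sketch (my own illustration, standard sign convention):

```python
import numpy as np

# S_x + S_y is fixed for unit variances, while S_xy drops without
# bound as the correlation rho approaches 1.
s_sum = np.log(2 * np.pi * np.e)
for rho in (0.0, 0.9, 0.999, 0.999999):
    s_xy = s_sum + 0.5 * np.log(1 - rho**2)
    print(f"rho={rho}: S_xy - (S_x + S_y) = {s_xy - s_sum:.3f}")
```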