He’s published it lots of times, often as part of a paper that uses it. I’ll scan and email you the clearest version I have on my shelves. Also, Ed Jaynes wrote about the continuum version of Shannon entropy being

-\int p(x) \log [ p(x) / m(x) ] \, dx

where m(x) is the measure on x-space, and derived it as the continuum limit of the discrete version

-\sum_i p_i \log (p_i / m_i)

where m_i is a degeneracy factor which becomes a density of states in the continuum limit. When you understand it like that, you see that m is not something that can be tampered with in the way a probability distribution can: it is fixed by factors relating to the space on which probabilities are defined, and is therefore more fundamental than the factors that determine probabilities.
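To make the role of the degeneracy factor concrete, here is a small numerical sketch of the discrete expression. The function name and the numbers in m are made up purely for illustration. It checks that, with no constraint other than normalisation, the entropy relative to m is largest when p_i is proportional to m_i, where it equals the log of the total number of states.

```python
import math

def entropy_relative_to_measure(p, m):
    """Return -sum_i p_i log(p_i / m_i), taking 0 log 0 as 0.

    p: probabilities over cells (must sum to 1).
    m: degeneracy factor of each cell (illustrative values, not from any paper).
    """
    return -sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)

# Hypothetical degeneracies: cell i lumps together m[i] equally weighted states.
m = [1, 2, 5]
total = sum(m)

# Probabilities proportional to the degeneracies.
p_prop = [mi / total for mi in m]
# Probabilities that ignore the degeneracies.
p_flat = [1 / 3, 1 / 3, 1 / 3]

h_prop = entropy_relative_to_measure(p_prop, m)
h_flat = entropy_relative_to_measure(p_flat, m)

print(h_prop, math.log(total))  # these two agree
print(h_flat)                   # smaller than h_prop
```

The flat assignment looks "maximally ignorant" only if you forget m; once the degeneracies are taken as given, the distribution proportional to m is the one that maximises the expression.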

]]>Do you have a reference to John Skilling’s proof?

P. S. Must get him here for a talk…

]]>\sum_i p_i \log (p_i / q_i)

and the Jensen-Shannon version symmetrises this in the simplest way by interchanging the p’s and q’s and taking half the sum of original and interchanged expressions.
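A quick numerical sketch of the asymmetry, with two made-up distributions p and q (the function names are also just for illustration). It computes the expression above in both directions and then the half-sum symmetrisation described in the previous paragraph. Note that the quantity usually called the Jensen-Shannon divergence is instead defined by averaging the divergences of p and q from their midpoint mixture; the code below implements the half-sum as described here, not that definition.

```python
import math

def kl(p, q):
    """Return sum_i p_i log(p_i / q_i), taking 0 log 0 as 0.

    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetrised_kl(p, q):
    """Half the sum of the original and the interchanged expressions."""
    return 0.5 * (kl(p, q) + kl(q, p))

# Two illustrative distributions over three outcomes.
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

print(kl(p, q))              # one direction
print(kl(q, p))              # the other direction gives a different number
print(symmetrised_kl(p, q))  # unchanged if p and q are swapped
```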

John Skilling has shown, using some basic criteria of consistency, that any expression quantifying the information in a probability distribution, which is to be used in an optimising process, *must* be of Kullback-Leibler form. The asymmetry in this form is not something to be squeamish about: you have to decide what is fundamental and what you are going to optimise. The authors should have done that, rather than seek an ad hoc symmetrical expression.

]]>