Thursday, 8 September 2011

Mutual Information

Chapter 6 of Nelson introduces Shannon’s formula as a method for determining the information entropy contained within a string of words, a problem analogous to quantifying the entropy of a physical system. I was therefore not surprised to learn that Claude E. Shannon drew on much of the work of Boltzmann and Gibbs when devising his information theory. However, instead of focusing on this aspect of Shannon’s work (i.e. the connection with thermodynamics), I thought I’d give a brief overview of another of his legacies, namely mutual information. What connection could this possibly have to biophysics, I hear you ask? Well, it is of critical importance to understanding the encoding, decoding, sending and receiving of information in the brain, a problem being tackled by biophysicists around the globe. (Not necessarily on a neurophysiology level, but at a “we’re not scared of the maths” level.)

Shannon proposed that the key features of any information transmission system are:

  1. an information source, which produces the message to be sent;
  2. a transmitter, which encodes the message into a signal;
  3. a channel, along which the signal travels;
  4. a noise source, which corrupts the signal during transmission;
  5. a receiver, which decodes the incoming signal back into a message;
  6. a destination, to which the message is delivered;
where the noise source is modeled as an external rather than internal entity, but it is effectively the same thing.

As Nelson points out, the information entropy H for a distribution P(r) is a measure of the amount of information one can expect to gain on average from a single sample of that distribution. Suppose we have a distribution P(r) over possible outcomes r. Quantitatively, this takes the form:

$$H = -\sum_r P(r) \log_2 P(r)$$

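To make this concrete, here is a minimal Python sketch (my own illustration, not from Nelson); the function name `entropy` and the example probabilities are just made up for demonstration:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p.

    p is an array of probabilities summing to 1; outcomes with
    P(r) = 0 contribute nothing, since x*log2(x) -> 0 as x -> 0.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # drop zero-probability outcomes
    return -np.sum(p * np.log2(p))

# A fair coin carries 1 bit per sample; a biased one carries less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # ~0.469
```
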
Let’s take r to represent the received signal at the destination. This description of information only holds if there is perfect transmission; real transmission introduces noise, which in turn causes a loss of information, so something is missing. One method of quantifying the noise entropy is to calculate how much of the response entropy comes from the variability in the response to the same signal, averaged across all signals. The entropy of responses to a given signal s is given by:

$$H_{r|s} = -\sum_r P(r|s) \log_2 P(r|s)$$

and now averaged over all stimuli:

$$H_{\text{noise}} = \sum_s P(s)\, H_{r|s} = -\sum_s P(s) \sum_r P(r|s) \log_2 P(r|s)$$

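Again as a rough sketch rather than anything from the text: the averaging over stimuli can be written out explicitly in code. The name `noise_entropy` and the toy two-stimulus, two-response numbers below are assumptions purely for illustration:

```python
import numpy as np

def noise_entropy(p_s, p_r_given_s):
    """Noise entropy H_noise in bits.

    p_s         : array of stimulus probabilities P(s), length n_s
    p_r_given_s : matrix of conditional probabilities P(r|s),
                  shape (n_s, n_r), each row summing to 1
    """
    p_s = np.asarray(p_s, dtype=float)
    p_r_given_s = np.asarray(p_r_given_s, dtype=float)
    h = 0.0
    for ps, row in zip(p_s, p_r_given_s):
        row = row[row > 0]                       # 0 * log(0) taken as 0
        h += ps * (-np.sum(row * np.log2(row)))  # entropy of responses to this s
    return h

# Two stimuli, two responses; responses are reliable but noisy.
p_s = [0.5, 0.5]
p_r_given_s = [[0.9, 0.1],
               [0.2, 0.8]]
print(noise_entropy(p_s, p_r_given_s))  # ~0.595 bits
```
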
The mutual information is then given by $I = H - H_{\text{noise}}$, which (using $P(r,s) = P(s)\,P(r|s)$ and $P(r) = \sum_s P(r,s)$) can be shown to be:

$$I = \sum_{r,s} P(r,s) \log_2\!\left(\frac{P(r,s)}{P(r)\,P(s)}\right)$$

This result has a very nice property: if the signal and response are independent then P(r,s) = P(r)P(s), every logarithm in the sum becomes log(1) = 0, and so I = 0, which is exactly what we would expect.
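
Putting the pieces together, here is a small sketch (my own, with made-up numbers) that computes I directly from a joint distribution P(r,s) and checks the independence property just described:

```python
import numpy as np

def mutual_information(p_rs):
    """Mutual information I (in bits) from a joint distribution P(r,s).

    p_rs is a matrix with entries P(r,s) summing to 1;
    rows index r, columns index s.
    """
    p_rs = np.asarray(p_rs, dtype=float)
    p_r = p_rs.sum(axis=1, keepdims=True)   # marginal P(r)
    p_s = p_rs.sum(axis=0, keepdims=True)   # marginal P(s)
    mask = p_rs > 0
    return np.sum(p_rs[mask] * np.log2(p_rs[mask] / (p_r * p_s)[mask]))

# A correlated signal/response pair carries information...
print(mutual_information([[0.45, 0.05],
                          [0.10, 0.40]]))   # ~0.40 bits

# ...while an independent pair, P(r,s) = P(r)P(s), carries none.
p_r = np.array([[0.7], [0.3]])
p_s = np.array([[0.6, 0.4]])
print(mutual_information(p_r * p_s))        # ~0 (up to floating-point rounding)
```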

It should be stressed that there are many nuances to “optimal” information processing in our brains as different types of signals have different priorities. For example, while walking through a jungle at night, mistaking a cluster of leaves for a tiger is much safer than mistaking a tiger for a cluster of leaves. Bias and redundancy are as much a feature of our information processing system as optimizing mutual information.

2 comments:

  1. Is there a particular paper you read on this? It sounds interesting.

  2. Many of the basic ideas were published by Shannon in the '40s but I haven't been diligent enough to seek them out. What's presented here mostly came from a textbook called Theoretical Neuroscience by Dayan and Abbott via lecture notes from MATH3104. If you're really keen it's also covered in most electrical engineering textbooks.
