Friday, 19 August 2011

When statistics becomes interesting - correlated events

In chapter 3 Nelson introduces some very basic statistical ideas: the addition rule, the multiplication rule, the mean, and the standard deviation (and variance). However, he misses the joys of conditional and joint probabilities. Perhaps they are not as pertinent to biophysics, but in my opinion they are worth mentioning nonetheless!

The joint probability P(A,B), or “the probability of A and B both occurring”, is simply the product of P(A) and P(B) IF A and B are independent. If they are correlated [the interesting case!], we need the more general form P(A,B) = P(A|B)P(B) = P(B|A)P(A), where P(A|B) is the conditional probability of A given B, that is, the probability of A occurring given that we’ve observed B [note that if A and B are independent then by definition P(A|B) = P(A), and we recover P(A,B) = P(A)P(B)]. Almost as a by-product, this gives us Bayes’ theorem: P(A|B)P(B) = P(B|A)P(A) => P(A|B) = P(B|A)P(A)/P(B).
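To make these identities concrete, here is a minimal Python sketch. The joint distribution in it is made up purely for illustration (it is not from Nelson or anywhere else), and the names `joint`, `marginal_a` and `marginal_b` are my own. It checks the product rule and Bayes’ theorem with exact fractions, and confirms that for correlated events P(A,B) is not P(A)P(B).

```python
from fractions import Fraction

# Hypothetical joint distribution over two yes/no events A and B.
# Keys are (A occurred, B occurred); the four values sum to 1.
joint = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}

def marginal_a(a):
    """P(A = a), obtained by summing the joint over both outcomes of B."""
    return sum(p for (x, _), p in joint.items() if x == a)

def marginal_b(b):
    """P(B = b), obtained by summing the joint over both outcomes of A."""
    return sum(p for (_, y), p in joint.items() if y == b)

p_a = marginal_a(True)
p_b = marginal_b(True)
p_ab = joint[(True, True)]

# Conditional probabilities from their definition.
p_a_given_b = p_ab / p_b
p_b_given_a = p_ab / p_a

# Product rule: P(A,B) = P(A|B)P(B) = P(B|A)P(A)
assert p_a_given_b * p_b == p_ab == p_b_given_a * p_a

# Bayes' theorem: P(A|B) = P(B|A)P(A)/P(B)
assert p_a_given_b == p_b_given_a * p_a / p_b

# A and B are correlated here, so the joint is NOT the product of marginals.
assert p_ab != p_a * p_b

print(p_a, p_b, p_ab, p_a_given_b, p_b_given_a)
```

Using `Fraction` rather than floats keeps the equalities exact, so the assertions test the identities themselves and not rounding error.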

The significance of this identity is very hotly debated by so-called “Bayesians” and “frequentists”. [Interesting how readily people form factions and feverishly denounce the ideas and legitimacy of other equally zealous factions.] At the heart of the controversy is a subjective vs. objective interpretation of probability and whether or not one can reasonably choose a value for the prior probability [that is, the quantity P(A), our belief in A before B is observed]. It’s a fascinating debate!

To illustrate Bayes’ theorem I offer the following challenge question [1]:

You go to see the doctor about an in-growing toenail. The doctor selects you at random to have a blood test for swine flu, which for the purposes of this exercise we will say is currently suspected to affect 1 in 10,000 people in Australia. The test is 99% accurate, in the sense that the probability of a false positive is 1%. The probability of a false negative is zero. You test positive. What is the new probability that you have swine flu?

Hint: Another statistical identity (the law of total probability) is that P(A) = SUM[over all B] P(A|B)P(B), i.e. when B is a simple yes/no event, P(A) = P(A|B)P(B) + P(A| not B)P(not B)
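For anyone who would like to check their answer, here is a small Python sketch of how the hint combines with Bayes’ theorem. The function name `posterior` and its parameter names are my own, and the roles of A and B are swapped relative to the hint: the identity is used to expand P(positive test), the denominator of Bayes’ theorem. Running it reveals the answer, so have a go by hand first!

```python
from fractions import Fraction

def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """Return P(A | positive) using Bayes' theorem.

    The denominator P(positive) is expanded with the identity from the hint:
    P(positive) = P(positive|A)P(A) + P(positive|not A)P(not A).
    """
    p_pos = p_pos_given_a * prior + p_pos_given_not_a * (1 - prior)
    return p_pos_given_a * prior / p_pos

# Numbers as stated in the challenge question.
p_flu = Fraction(1, 10_000)             # prior: 1 in 10,000 people affected
p_pos_given_flu = Fraction(1)           # probability of a false negative is zero
p_pos_given_healthy = Fraction(1, 100)  # probability of a false positive is 1%

answer = posterior(p_flu, p_pos_given_flu, p_pos_given_healthy)
print(answer, float(answer))
```

The same function works with ordinary floats; exact fractions are used here only so that the result is not blurred by rounding.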

----------------
[1] Question was taken from MATH3104 lecture notes by Prof Geoff Goodhill

