In today’s blog, Junglemom will not be discussing baby neck cheese, elitist diapers, or the rampant pineapple thievery going on in her neighborhood. If you’ve come looking for cute anecdotes about flying cockroaches, my neighbor’s psychotic dogs, or the way my twin baby girls talk like pirates with emphysema, you’ve come on the wrong day, my friend. Today I tackle a topic sorely neglected in the world of mommy blogs. I am speaking, of course, of Bayesian statistics.
Better put the baby down now so you don’t drop her when I systematically deconstruct everything you thought you knew about probability. Put the formula in the fridge, put the diaper in the genie, grab your free trade coffee and prepare for me to tear your universe in two.
Here comes example 1 (I think this is from Scientific American, but who knows; I’m summarizing what could be a number of sources, and that’s the beauty of blogs): A doctor tests a patient for a somewhat rare disease that you might expect to find in 1% of the population. The test is advertised as being 99% accurate. The patient tests positive and freaks out, but does she actually have the disease?
Now I’m laying down some equations, yo.
(Bayes’ Theorem)  $$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+)}$$
This ain’t arithmetic. P means the probability of whatever is in the bracket after it, and “+” doesn’t mean addition. On the left side is the question we’re asking, “what is the probability that the person actually has the disease (D) given that there is a positive (+) test result?” The right side of the equation combines all of our prior knowledge and beliefs (yes beliefs, in science, I kid you not) about the patient, the disease, and the test.
Bayes weighs the probability that a person with the disease will test positive P(+|D), and that this is such a person P(D), against the probability that any randomly selected person might test positive for any reason whatsoever P(+). Like eating a poppyseed bagel.
Say the medical literature shows disease D affects 1% of the general populace, so P(D)=0.01. The manufacturer of the test claims it is 99% accurate (prego test, anyone?), so P(+|D)=0.99, meaning that if a trembling knocked up teenager takes the test, there is a 99% chance that the test will reveal the “little miracle.” So far we have 0.99 X 0.01 = 0.0099 (0.99%) and things are looking bleak for our prom queen.
But if we look at the seemingly innocuous P(+) “the likelihood that a randomly selected person will test positive” the odds change dramatically. Ask yourself, “how many circumstances can lead to a positive test result”, and how likely is each to occur?
Okay, now I’m going to go back to saying we’re testing for a disease rather than a baby because it would be really hard to carry a baby all the way through this (I should know, I delivered 7wks early).
First we can imagine the disease is present and the test registers it: P(+|D) = 0.99 (99%) which as before we multiply by the random incidence of the disease 0.99*0.01=0.0099
But, what about those pesky “false positives” that seemingly plague Dr House M.D. and his team each Tuesday night on Fox? How likely are they really?
Well, a false positive means literally (+|notD), i.e. the test is positive even though disease D is not present, and we can write its probability as P(+|N). The way probabilities work is they always add up to 100%. So if the incidence of the disease is 1% (P(D)=0.01) then it follows that a random person on the street is 99% certain to NOT have the disease (P(N)=0.99).
If the test really is 99% accurate (and we take that to mean it is right 99% of the time for healthy people too) it also follows that P(+|N)=1%. If the test is right 99 times out of 100, it must be wrong once. Hence, the total probability of a false positive is P(+|N)P(N) = 0.01 * 0.99 = 0.0099. Add the two routes to a positive result together and you get P(+) = 0.0099 + 0.0099 = 0.0198.
Finally, plugging the numbers back into Bayes’ equation, we have 0.0099/(0.0099+0.0099) = 0.5
Yes, that’s right. 50%. Fifty percent.
In a nutshell, the likelihood that a person with this disease got a positive result, and that this patient is such a person, must be weighed against all the ways a positive test could occur whether or not this patient has the disease.
Doesn’t that blow your mind?
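For the code-curious, here is the same arithmetic as a minimal Python sketch. The function name `posterior` and its parameter names are made up for illustration, and it assumes (like the example above) that “99% accurate” means the test is right 99% of the time for sick and healthy people alike.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(D|+) using Bayes' theorem.

    prior               -- P(D), how common the disease is
    sensitivity         -- P(+|D), chance a sick person tests positive
    false_positive_rate -- P(+|N), chance a healthy person tests positive
    """
    true_positive = sensitivity * prior                   # P(+|D) * P(D)
    false_positive = false_positive_rate * (1 - prior)    # P(+|N) * P(N)
    p_positive = true_positive + false_positive           # P(+), every route to a positive
    return true_positive / p_positive


# Numbers from the example: 1% incidence, 99% accurate in both directions.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(D|+) = {p:.2f}")
```

The bottom of the fraction is just the two ways of getting a positive added together, which is why a rare disease drags the answer down so hard.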
I know what you’re asking yourself. You’re saying, “Junglemom, what are the implications of this? What does it all mean? Are you saying I can just ignore test results? That the DNA on O.J.’s knife was meaningless?” NO! What it means is, if you go to some dubious website and fork over 3 Gs for a battery of tests for rare diseases, the results will be largely meaningless.
If this is true, then why do doctors, by and large rational beings with very expensive educations, rely so heavily on lab test results? You smartypants peeps out there may have noticed that the result of Bayes’ theorem hinges on the phrase “…likelihood… that our patient is such a person.” Tests performed in doctors’ offices have a far higher probability of being correct purely because they are being performed in a doctor’s office. How come? Well, when we think of the reliability of tests, we always think in terms of their ability to detect a disease, but as we see in the example, they are useless for this purpose in isolation, and this is the key to the paradox. The moment we walk into the doctor’s office and start describing our complaints and symptoms, the doctor’s mind starts consciously and unconsciously inferring what the symptoms might point to. By the time the doctor thinks, “Hey, let’s biopsy his brain,” the patient is no longer random. He is already a fishy in a much smaller pond in which the incidence of blue brain disease is no longer 1%. It’s much higher, and how much higher depends only on the doctor’s diagnostic skill. If you run Bayes’ Theorem with the new numbers, it’s curtains for old Blue Brain. In fact, although the patient and doctor would rarely see it this way, the test result is just another symptom.
Bear this in mind when reading the claims printed on take-home tests. The manufacturer tried the tests out on people who already had a reason to take the test. Back to the prego tests. Home pregnancy tests usually claim 99% accuracy.
If 100 men took the test, I don’t know how many sticks would turn blue, but I know none would be pregnant.
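To see how much the size of the pond matters, here is a rough Python sketch that reruns Bayes with different starting odds. The 30% figure for a patient the doctor already suspects is a number I made up purely for illustration, and the helper name `posterior` is hypothetical. As before, it assumes the test is 99% right for both sick and healthy people.

```python
def posterior(prior, accuracy=0.99):
    """P(D|+), assuming the test is right `accuracy` of the time either way."""
    true_positive = accuracy * prior                  # P(+|D) * P(D)
    false_positive = (1 - accuracy) * (1 - prior)     # P(+|N) * P(N)
    return true_positive / (true_positive + false_positive)


# Priors other than 0.01 are illustrative assumptions, not real data.
ponds = [
    ("random person off the street", 0.01),
    ("patient the doctor already suspects", 0.30),
    ("man taking a pregnancy test", 0.0),
]

for label, prior in ponds:
    print(f"{label}: prior {prior:.0%} -> posterior {posterior(prior):.1%}")
```

Same test, same 99% on the box, wildly different answers. The only thing that changed is who was standing in line to take it.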
Statistics are a dirty minefield. This is why most gamblers lose. With the ponies, there is always a bookie’s favorite; maybe she’s 10 to 1 on. She’s 10 to 1 on based on her recent performances, and that’s the top of the equation. What about the bottom half? That’s in the form guide and the weather forecast. How does she do on certain tracks, in certain conditions, how hungry was she for her breakfast this morning, etc. The best gamblers are not the ones with a system, not the ones who always go with the favorite and keep doubling up. Betting on the bookie’s odds isn’t much better than blindly guessing. The best gamblers are like doctors. They improve their odds because they use experience to weigh all of the factors involved.
Conclusion: The Bayesian paradigm enables mathematics to mimic the human mind, so don’t waste yours by betting on the ponies. Listen to your doctor.