January 13, 2003

One Hundred Interesting Mathematical Calculations, Number 9

**One Hundred Interesting Mathematical Calculations, Number 9: False Positives**

Suppose that we have a test for a disease that is 98% accurate: if one has the disease, the test comes back "yes" 98% of the time (and "no" 2% of the time), and if one does not have the disease, the test comes back "no" 98% of the time (and "yes" 2% of the time). Suppose further that 0.5% of people--one out of every two hundred--actually have the disease.

Your test comes back "yes." How worried should you be? How likely is it that you have the disease?

Suppose just for ease of calculation that we have a population of 10000, of whom 50--one in every two hundred--have the disease. On average, the 50 who have the disease will contribute 49 "yes" tests and one "no" test. On average, the 9950 who do not have the disease will contribute 9751 "no" tests and 199 "yes" tests.

If you test "no" you can be very happy indeed: there is only one chance in 9752 that you are the unlucky guy who had the disease and yet tested negative.

If you test "yes" you are less happy. But there are 248 "yes" tests, and only 49 of those people have the disease. The chances that you are disease-free are 80.24 percent.
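The same arithmetic can be checked with Bayes' rule directly. The sketch below is only an illustration: the function names are made up for this example, and it assumes the figures given above (0.5% prevalence, 98% accuracy in both directions).

```python
def prob_disease_given_yes(prevalence, sensitivity, specificity):
    """P(disease | test says yes), by Bayes' rule."""
    true_yes = prevalence * sensitivity                # sick and flagged
    false_yes = (1 - prevalence) * (1 - specificity)   # healthy but flagged
    return true_yes / (true_yes + false_yes)


def prob_disease_given_no(prevalence, sensitivity, specificity):
    """P(disease | test says no), by Bayes' rule."""
    false_no = prevalence * (1 - sensitivity)          # sick but missed
    true_no = (1 - prevalence) * specificity           # healthy and cleared
    return false_no / (false_no + true_no)


# Figures from the text: 1 in 200 has the disease, test is 98% accurate.
p_yes = prob_disease_given_yes(0.005, 0.98, 0.98)
p_no = prob_disease_given_no(0.005, 0.98, 0.98)

print(f"P(disease | yes)      = {p_yes:.4f}")      # 49 / 248
print(f"P(disease-free | yes) = {1 - p_yes:.4f}")  # 199 / 248
print(f"P(disease | no)       = 1 in {1 / p_no:.0f}")
```

Working with probabilities rather than a population of 10000 gives the same ratios, 49/248 and 1/9752, without having to pick a population size.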

This is the so-called *false positive* problem: it shows itself wherever you have an imperfect signal of an unlikely event, and it leads to situations in which most of your positive signals are *false positives*: fake signals, not real indicators of the problem or the event at all.

From John Allen Paulos's *Innumeracy*.


This Bayesian false positive calculation has frightening real world implications.

First, lots of the inexpensive screening tests for diseases such as HIV are set to trigger at sensitive levels to avoid the very high cost of false negatives. So the rate of false negatives is very low, but at the cost of a fairly high rate of false positives.

The high rate of false positives is often combated by rerunning the initial screening test on the same sample, and in the case of HIV an additional test (called the Western Blot) is performed. But note that the tests are NOT independent, because they're run on the same sample. So the errors are serially correlated and some of the benefit from multiple tests is lost. (Some experts think that some kind of sample contamination may be at work, but nobody really knows why the serial correlation is so high.)

So, if one goes to a standard lab for an HIV test and it comes back positive (which means positive once on the ELISA, positive again on the second ELISA run, AND positive on the Western Blot), what are the real-world odds of a true positive?

Now this is an emotional subject, and the accuracy of the ELISA in combination with the Western Blot is debated by experts. Based on my sample of Google literature, I'm going with a 99% specificity rate - so there's one false positive per 100 uninfected people tested. Now there are 281 million people in the United States, give or take, and the CDC believes there have been about 816,000 cumulative cases of HIV - or a prevalence of about 0.29%.

So if you tested all 281 million people (and for simplicity assume a false negative rate of zero), you'd get 816,000 true positives and about 2.8 million false positives. So if you have a screening HIV test that comes back positive, without any information about lifestyle risk, the probability that you actually have HIV is only about 23% (816,000 / 3,617,840).
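For anyone who wants to rerun these numbers, here is a rough sketch of the same calculation. The variable names are just for illustration, and it uses only the assumptions stated above: 281 million people, 816,000 cases, 99% specificity, and no false negatives.

```python
population = 281_000_000
cases = 816_000
specificity = 0.99   # assumed above: 1 false positive per 100 uninfected tested
sensitivity = 1.0    # simplifying assumption above: no false negatives

true_positives = cases * sensitivity
false_positives = (population - cases) * (1 - specificity)

# Share of all positive results that are real infections.
ppv = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
print(f"P(HIV | positive) = {ppv:.1%}")
```

Changing `specificity` shows how sensitive the answer is to that one debated figure.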

PS: Both the ELISA and the Western Blot test for antibodies to HIV, and do not test for the virus itself. Following a couple of positive antibody tests, the next round of testing is usually a DNA probe test for the virus itself in the blood as well as a separate culture test to grow the virus in vitro. This is very expensive, which is why it's not done sooner.

Posted by: Anarchus on January 13, 2003 06:52 PM

Anyone hoping to use this reasoning in a live medical situation should note that it *only* applies to a case in which your taking the test in the first place is utterly independent of having the disease.
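One way to see why this matters: if people tend to get tested because of symptoms or known risk, the right starting probability is no longer the population rate. The sketch below reuses the 98%-accurate test from the post; the higher priors are purely hypothetical values picked for illustration, not data.

```python
def prob_disease_given_yes(prior, sensitivity=0.98, specificity=0.98):
    """P(disease | test says yes) for a given pre-test probability."""
    true_yes = prior * sensitivity
    false_yes = (1 - prior) * (1 - specificity)
    return true_yes / (true_yes + false_yes)


# 0.005 is the population rate from the post; 0.05 and 0.5 are
# hypothetical pre-test probabilities for people tested for a reason.
hypothetical_priors = [0.005, 0.05, 0.5]
posteriors = {p: prob_disease_given_yes(p) for p in hypothetical_priors}

for prior, posterior in posteriors.items():
    print(f"prior {prior:.3f} -> P(disease | yes) = {posterior:.3f}")
```

With these inputs the chance of a true positive goes from roughly 20% to roughly 72% to 98% as the prior rises.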

Posted by: dsquared on January 15, 2003 04:47 AM