(Updated 12/2014)
Introduction
This is a technical discussion of how a test interacts with the prevalence of the disease in question. That interaction determines the value (“positive predictive value”) of a test result: how should a doctor interpret the result, and how likely is that interpretation to be wrong?
This page is also a historical record, which shows how I got very interested in this topic. The letters below are an exchange from around 2002. I’m still actively pursuing this issue in 2014.
In this 2014 rewrite, I’ve cut the original page into sections, with the more important stuff up here at the top: the original table and the main point.
Using a 2 x 2 table to examine diagnostic accuracy
The main point here, before I lose you: the number of people with the illness in the group being diagnosed has a huge impact on the accuracy of a test like the MDQ (the Mood Disorder Questionnaire), more so than the characteristics of the MDQ itself.
That is demonstrated in the following examples. What changes here is the number of people who have the illness before the MDQ is applied. The MDQ characteristics (sensitivity, specificity) are held constant. Look at what happens to the predictive values (positive and negative, respectively, in the right-hand column) when the prevalence of the problem goes from low in Scenario A to high in Scenario B.
Scenario A: 200 of 1000 people truly have the illness
sensitivity: 0.73; specificity: 0.90

| | gold standard positive | gold standard negative | predictive value |
|---|---|---|---|
| test positive | 146 | 80 | 0.65 (positive) |
| test negative | 54 | 720 | 0.93 (negative) |
| sum | 200 | 800 | |

Scenario B: 800 of 1000 people truly have the illness
sensitivity: 0.73; specificity: 0.90 (unchanged)

| | gold standard positive | gold standard negative | predictive value |
|---|---|---|---|
| test positive | 584 | 20 | 0.97 (positive) |
| test negative | 216 | 180 | 0.45 (negative) |
| sum | 800 | 200 | |
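The arithmetic behind both scenarios can be sketched in a few lines of Python. This is only an illustration, not something from the original page: the function name `predictive_values` and its parameter names are hypothetical labels, and the only inputs are the figures already given above (1000 people, sensitivity 0.73, specificity 0.90).

```python
def predictive_values(n, n_ill, sensitivity, specificity):
    """Fill in a 2 x 2 table and return its cells plus both predictive values.

    Hypothetical helper for illustration only; not from the original page.
    """
    n_well = n - n_ill
    true_pos = sensitivity * n_ill    # ill, test positive
    false_neg = n_ill - true_pos      # ill, test negative
    true_neg = specificity * n_well   # well, test negative
    false_pos = n_well - true_neg     # well, test positive
    ppv = true_pos / (true_pos + false_pos)   # positive predictive value
    npv = true_neg / (true_neg + false_neg)   # negative predictive value
    return true_pos, false_pos, false_neg, true_neg, ppv, npv


# Same test characteristics in both scenarios; only the number of ill people changes.
for label, n_ill in (("A", 200), ("B", 800)):
    tp, fp, fn, tn, ppv, npv = predictive_values(1000, n_ill, 0.73, 0.90)
    print(f"Scenario {label}: TP={tp:.0f} FP={fp:.0f} FN={fn:.0f} TN={tn:.0f} "
          f"PPV={ppv:.2f} NPV={npv:.2f}")

# Scenario A: TP=146 FP=80 FN=54 TN=720 PPV=0.65 NPV=0.93
# Scenario B: TP=584 FP=20 FN=216 TN=180 PPV=0.97 NPV=0.45
```

Because sensitivity and specificity are passed in unchanged for both calls, any difference in the printed predictive values comes from the prevalence alone.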
Response to Dr. Goldberg
The discussion below was my response to Dr. Ivan Goldberg, who raised a concern about the way I explained interpreting MDQ test results. His letter prompting this reply follows below.
Hello Ivan
Well, as you can tell by this delay I’ve been mulling your recommendation to add a comment about the sensitivity problem of the MDQ for quite a while. I think I’ve finally figured out where I was getting stuck.
You comment that “Unfortunately, the main problem with that test is not the sensitivity, but rather the large number of false positives resulting from the test’s low specificity.” But, as you know, there is the issue of the “prior probability” which critically determines the “predictive value” of a particular result.
So, if I’m remembering how to calculate these correctly, and I taught it for a while so I sure hope that I am (remember a/(a+c) versus a/(a+b)?), here’s what happens depending on the prior probabilities. Please correct me if I’ve got these wrong by your calculations.
Suppose you have 1000 readers, and only 200 of them “have” bipolar disorder by some gold standard. Using the 0.73 and 0.9 figures for sensitivity and specificity, respectively, the predictive value of a positive result is only 0.65, pretty miserable. That’s eighty false positives out of the 800 people who don’t really have bipolar disorder. The predictive value of a negative test result, in this scenario, looks pretty good: 0.93 (though this includes fifty-four false negatives).
However, compare the outcome if the population of 1000 readers included 800 people who really did have bipolar disorder by the gold standard. The predictive value of a positive test in this group is 0.97, with 20 false positives out of the 1000 people tested. Not great, but then comes the whole argument about how much you’re going to hurt those people by being wrong relative to how much you could hurt the false negatives with an antidepressant, etc. (The negative predictive value, for comparison, suffers in this population: 0.45, with 216 false negatives.)
So, mightn’t we conclude that the specificity being lower relative to the sensitivity is really a problem only if the population taking it has a low prior probability; and that in fact the sensitivity remains a problem even at 0.73 when the prior probability is high, producing a frightfully large number of false negatives, given the potential consequences of such a result?
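The trade-off argued in this letter can be tabulated directly. The sketch below is an illustration added to this page, not part of the original exchange: the function names `error_counts` and `break_even_prior` are hypothetical, the prior of 0.5 is an extra data point not discussed in the letter, and the break-even formula is simply the algebra of setting expected false positives equal to expected false negatives.

```python
def error_counts(n, prior, sensitivity, specificity):
    """Expected (false positives, false negatives) among n people tested.

    Hypothetical helper for illustration only.
    """
    n_ill = n * prior
    n_well = n - n_ill
    false_pos = (1 - specificity) * n_well   # well people flagged by the test
    false_neg = (1 - sensitivity) * n_ill    # ill people missed by the test
    return false_pos, false_neg


def break_even_prior(sensitivity, specificity):
    """Prior probability at which false negatives start to outnumber false positives."""
    miss_rate = 1 - sensitivity
    false_alarm_rate = 1 - specificity
    return false_alarm_rate / (false_alarm_rate + miss_rate)


for prior in (0.2, 0.5, 0.8):
    fp, fn = error_counts(1000, prior, 0.73, 0.90)
    print(f"prior {prior:.1f}: {fp:.0f} false positives, {fn:.0f} false negatives")
print(f"break-even prior: {break_even_prior(0.73, 0.90):.2f}")

# prior 0.2: 80 false positives, 54 false negatives
# prior 0.5: 50 false positives, 135 false negatives
# prior 0.8: 20 false positives, 216 false negatives
# break-even prior: 0.27
```

With these sensitivity and specificity figures, false negatives already outnumber false positives once the prior probability passes roughly 0.27, which is consistent with the point that sensitivity becomes the bigger concern in high-prevalence groups.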
Dr. Ivan’s letter
Hi . . .
I am a psychiatrist who often educates peers and patients about the sensitivity and specificity (S&S) of medical tests. I think it is important that you included a discussion of S&S in relation to the MDQ.
Unfortunately, the main problem with that test is not the sensitivity, but rather the large number of false positives resulting from the test’s low specificity. I hope you will consider adding some information about the false positive problem.
Best regards . . .
Ivan