Spectacularly unreliable MRI “results”

updated Apr 14, 2025

Tags: back pain, sciatica, counter-intuitive, bad news, mri, radiculopathy, leg, study_prospective, study_cohort, study_observational, pain problems, spine, anatomy, spine pain, buttocks, hip, herniation, intervertebral disc, neck pain, head/neck, limbs, neuropathy, pain, etiology, pro, vibe, imaging, diagnosis, medical tests, medicine

Nine pages on PainSci cite Herzog 2016:

1. When to Worry About Low Back Pain
2. The Complete Guide to Low Back Pain
3. The Complete Guide to Neck Pain & Cricks
4. The Mind Game in Low Back Pain
5. MRI and X-Ray Often Worse than Useless for Back Pain
6. Is Diagnosis for Pain Problems Reliable?
7. Digital Motion X-Ray: A Dangerous Illusion of Diagnostic Power
8. A Rational Guide to Fibromyalgia
9. Spectacularly unreliable MRI “results”

PainSci commentary on Herzog 2016

This page is one of thousands in the PainScience.com bibliography. It is not a general article: it is focused on a single scientific paper, and it may provide only just enough context for the summary to make sense. Links to other papers and more general information are provided wherever possible.

People mostly assume that MRI is a reliable technology, but if you send the same patient to get ten different MRIs, interpreted by ten different radiologists from different facilities, apparently you get ten markedly different explanations for her symptoms. A 63-year-old volunteer with sciatica allowed herself to be scanned again and again and again for science. The radiologists, who did not know they were being tested, cooked up forty-nine distinct “findings.” Sixteen appeared in only one report; not one was found in all ten reports, and only one was found in nine of the ten. On average, each radiologist made about a dozen errors, seeing one or two things that weren’t there and missing about eleven things that were. That’s a lot of errors, and not a lot of reliability. The authors clearly believe that some MRI providers are better than others, and that’s probably true, but we also need to ask the question: is any MRI reliable?

(See also my more informal description of this study, which includes an amazing personal example of an imaging error.)

~ Paul Ingraham

original abstract

†Abstracts here may not perfectly match originals, for a variety of technical and practical reasons. Some abstracts are truncated for my purposes here, if they are particularly long-winded and unhelpful. I occasionally add clarifying notes. And I make some minor corrections.

BACKGROUND CONTEXT: In today’s health-care climate, magnetic resonance imaging (MRI) is often perceived as a commodity, a service where there are no meaningful differences in quality and thus an area in which patients can be advised to select a provider based on price and convenience alone. If this prevailing view is correct, then a patient should expect to receive the same radiological diagnosis regardless of which imaging center he or she visits, or which radiologist reviews the examination. Based on their extensive clinical experience, the authors believe that this assumption is not correct and that it can negatively impact patient care, outcomes, and costs.

PURPOSE: This study is designed to test the authors’ hypothesis that radiologists’ reports from multiple imaging centers performing a lumbar MRI examination on the same patient over a short period of time will have (1) marked variability in interpretive findings and (2) a broad range of interpretive errors.

STUDY DESIGN: This is a prospective observational study comparing the interpretive findings reported for one patient scanned at 10 different MRI centers over a period of 3 weeks to each other and to reference MRI examinations performed immediately preceding and following the 10 MRI examinations.

PATIENT SAMPLE: The sample is a 63-year-old woman with a history of low back pain and right L5 radicular symptoms.

OUTCOME MEASURES: Variability was quantified using percent agreement rates and Fleiss kappa statistic. Interpretive errors were quantified using true-positive counts, false-positive counts, false-negative counts, true-positive rate (sensitivity), and false-negative rate (miss rate).

METHODS: Interpretive findings from 10 study MRI examinations were tabulated and compared for variability and errors. Two of the authors, both subspecialist spine radiologists from different institutions, independently reviewed the reference examinations and then came to a final diagnosis by consensus. Errors of interpretation in the study examinations were considered present if a finding present or not present in the study examination’s report was not present in the reference examinations.

RESULTS: Across all 10 study examinations, there were 49 distinct findings reported related to the presence of a distinct pathology at a specific motion segment. Zero interpretive findings were reported in all 10 study examinations and only one finding was reported in nine out of 10 study examinations. Of the interpretive findings, 32.7% appeared only once across all 10 of the study examinations’ reports. A global Fleiss kappa statistic, computed across all reported findings, was 0.20±0.06, indicating poor overall agreement on interpretive findings. The average interpretive error count in the study examinations was 12.5±3.2 (both false-positives and false-negatives). The average false-negative count per examination was 10.9±2.9 out of 25 and the average false-positive count was 1.6±0.9, which correspond to an average true-positive rate (sensitivity) of 56.4%±11.7 and miss rate of 43.6%±11.7.

CONCLUSIONS: This study found marked variability in the reported interpretive findings and a high prevalence of interpretive errors in radiologists’ reports of an MRI examination of the lumbar spine performed on the same patient at 10 different MRI centers over a short time period. As a result, the authors conclude that where a patient obtains his or her MRI examination and which radiologist interprets the examination may have a direct impact on radiological diagnosis, subsequent choice of treatment, and clinical outcome.
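For readers who want to see how the abstract's headline numbers fit together, here is a minimal Python sketch. The first half only re-derives figures quoted above (miss rate, sensitivity, mean error count, and the number of findings reported once). The second half is a generic textbook Fleiss kappa function run on made-up data: the abstract does not publish the per-report findings, so `HYPOTHETICAL_COUNTS` and all variable names are illustrative assumptions, and the result is not expected to reproduce the authors' 0.20.

```python
# Figures quoted in the abstract (means across the 10 study examinations).
REFERENCE_FINDINGS = 25        # findings present in the reference examinations
MEAN_FALSE_NEGATIVES = 10.9    # average findings missed per report
MEAN_FALSE_POSITIVES = 1.6     # average findings reported but not in the reference
DISTINCT_FINDINGS = 49         # distinct findings across all 10 reports
SHARE_REPORTED_ONCE = 0.327    # 32.7% of findings appeared in only one report

miss_rate = MEAN_FALSE_NEGATIVES / REFERENCE_FINDINGS
sensitivity = 1.0 - miss_rate
mean_errors = MEAN_FALSE_NEGATIVES + MEAN_FALSE_POSITIVES
findings_reported_once = round(SHARE_REPORTED_ONCE * DISTINCT_FINDINGS)


def fleiss_kappa(counts):
    """Generic Fleiss kappa.

    counts[i][j] is the number of raters who put item i in category j.
    Every row must sum to the same number of raters (at least 2).
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")
    n_categories = len(counts[0])

    # Observed agreement: share of rater pairs that agree, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1.0 - p_expected)


# HYPOTHETICAL data, not from the study: five findings, ten reports each,
# columns are [reports mentioning the finding, reports omitting it].
HYPOTHETICAL_COUNTS = [[9, 1], [1, 9], [5, 5], [3, 7], [1, 9]]

if __name__ == "__main__":
    print(f"miss rate:              {miss_rate:.1%}")
    print(f"sensitivity:            {sensitivity:.1%}")
    print(f"mean errors per report: {mean_errors:.1f}")
    print(f"findings reported once: {findings_reported_once} of {DISTINCT_FINDINGS}")
    print(f"kappa on toy data:      {fleiss_kappa(HYPOTHETICAL_COUNTS):.2f}")
```

The arithmetic half shows that the reported sensitivity and miss rate are simply the average false-negative count divided by the 25 reference findings, and that the 12.5 average errors is the sum of misses and false positives. The kappa half is only meant to show what kind of calculation sits behind the agreement statistic.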

