
Diagnose, schmiagnose! Physical exam accuracy for back pain is poor (Member Post)

by Paul Ingraham

“We can put a man on the moon” … but diagnosing specific problems in the back is still too giant a leap! Mostly.

Especially with a physical exam and the so-called “special” orthopedic tests — a hand-crafted, artisanal style of diagnosis, famously more of an art than a science. The science reveals that manual examination is only a little better than pointing at random anatomy while blindfolded — even under ideal conditions, never mind the exasperating pressure cooker of healthcare in the real world.

A big part of the problem is just that looking for “specific” causes of back pain is largely barking up the wrong tree in the first place — many popular “biomechanical bogeymen” are much less common and serious than their reputation suggests. The causes of back pain are often more subtle and complex than the failure of one bit of anatomy.

[Image: a simple pen-and-ink illustration of a physical therapist doing a “straight leg raise” test with a patient sitting on a medical exam table.]

One of the best known of all manual exam tests: the “straight leg raise.”

But even when specific points of failure do exist in the back — when abnormalities that matter actually are haunting a spine — they are really tough to find. Healthcare professionals have an extremely hard time confirming them, whether with a physical exam or with any kind of medical test or scan. It’s not hopeless, but confident and specific diagnosis of spinal pain is a bit of a medical unicorn — unlike the kind of definitive and specific explanations we have for so many other diseases.

This is a major reason why the idea of “non-specific” spinal pain is a thing: because specific causes are both rare and elusive.

This post is an excerpted chapter of my back pain e-book, and only members and book customers can read it. It’s about 2500 words (10 minutes of reading), and there’s also an audio version (40 minutes). It contains:

  • Manual versus medical testing
  • “Special” orthopedic testing
  • Maybe not so special
  • Featured science example: failing to find herniations manually
  • A diagnostic dictionary
  • Other evidence about the reliability of physical diagnosis
  • The salamander’s science summary
  • You’re not bad at your job if you can’t pinpoint the source of back pain
  • The take-home message … and the take-to-work message

To read it, either buy the book, or become a member to unlock this post (and many others like it)…

 MEMBERS-ONLY AREA

Manual versus medical testing

This chapter zooms in on physical testing: “Can you move like so? Does that hurt?” Medical testing is introduced in the next, and then there are other chapters devoted to specific kinds of medical testing (MRI, nerve blocks) — but there is some overlap, and they are often combined. So let’s compare and contrast…

A competent physical exam for the back may be somewhat helpful, and it’s more accessible than medical testing, but also not as good (assuming equal competence). But a lot of physical exams aren’t even competent! In the real world — ordinary people going to see ordinary professionals — manual therapists often vomit up a torrent of overconfident and bogus claims about “exactly” what’s wrong.

In many cases, they aren’t even looking for real problems.

And what about the value of a physical exam from a particularly experienced and well-educated clinician? Is that worthwhile? Yes, that’s much better … if you could somehow be sure that’s what you were getting! But I’ll still take it with a grain of salt.

Medical testing isn’t quite as shamefully corruptible by poor clinical reasoning,1 but it is definitely still not great on average. Consider how MRI is notoriously overused and abused! The medical tests also need better-than-average skill and knowledge to interpret wisely.

“Special” orthopedic testing

Most manual therapists aspire to the ability to pick up on subtle clues with advanced palpation skills and diagnostic cleverness and sophistication. (And a great many doctors seem to avoid it like the plague, because, ew, touching!) Hunting specific biomechanical bogeymen is the entire basis of some professions, most obviously chiropractic, where the whole point is to “correct” spinal glitches … which must be found first, of course. Similarly, massage therapists cannot “release” trigger points if they cannot point to them.

And physical therapists cannot prescribe corrective exercises (wisely or not) without something in mind that needs correction.

To this end, the body-mechanic business has generated hundreds of “special orthopedic tests” to flush out the origin of pain. It is an unfortunate label that seems to want to be mocked. Many of these tests are named for the authority figures and pioneers who first developed and taught them. A classic example of a back SOT is the SLR, or “straight leg raise” (AKA Lasègue’s Test, although that sounds really old school now). The SLR checks for lumbar nerve root compression by putting the leg in a position (straight! raised!) that pulls on the sciatic nerve — which will probably irritate it, if there’s actually a problem with the nerve root.

A lot of SOTs have this slightly sadistic quality. Does this hurt? And how about THIS?2

Maybe not so special

Many of the special orthopedic tests aren’t particularly “special” in any sense, and some barely even deserve the name “test.” They aren't all bad, of course — some are rather clever and elegantly simple. But many others are just simplistic, blunt instruments that were never a good idea to begin with. Some are based on outdated-yet-still-common ideas about how conditions work and what we should even be looking for. Some are simple to perform, but do require experience and wisdom to choose and interpret — which often fails. Others are actually difficult to do — and with more complexity comes less reliability.

Even when the principle is sound, SOTs are often warped by clinician fallibility and overconfidence. In the messy art of a physical exam, it is extremely easy for a pro to look only where the light is (the “streetlight effect”), or see/feel/hear only what they expect to (confirmation bias).

And so special orthopedic testing has a long, embarrassing history of sucking (as the science will show below).

Clinicians feel blessed by an illusion of knowledge and certainty, when in fact they are effectively guessing at the diagnosis, guided by extremely sketchy data. If only I had a buck for every time I heard something like this…

“It’s my sacroiliac joint. My physiotherapist told me.”

Patients get told this kind of thing a lot. The sacroiliac joint is an especially popular scapegoat, but there are many reasons to doubt that the sacroiliac joint is a cause of pain nearly as often as it is diagnosed, beyond the fact that there are just so many possibilities. For instance, consider its legendary toughness:

I talked to a trauma surgeon that had been to a workshop where they talked about the sacrum being “out of place.” He just said this is ridiculous: we see people who are in motor vehicle accidents with every bit of their body smashed but the sacroiliac joint is intact. It is so strong.

Peter O’Sullivan, Professor of Musculoskeletal Physiotherapy at Curtin University, Perth, Australia

So patients are routinely given specific pathoanatomical explanations for their back pain, based on the results of “special” tests, and it all seems quite impressive to the patient. But a great deal of it is probably nonsense and a (misleading) waste of everyone’s time.

Featured science example: failing to find herniations manually

Let’s start with a 2010 paper by van der Windt et al about physical tests for a disc herniation causing low back pain with leg pain, AKA sciatica.3 Not exactly a fresh citation, but acceptable.4

They looked at all the data up to 2010 and found that “diagnostic performance of most physical tests was poor.” A few tests performed “slightly” better (forward flexion, hyper-extension, slump). And the classic straight leg raise? It was quite “sensitive” (good at confirming), but even that was probably partly an illusion.5 And it was also not super “specific” (too many false alarms, AKA overdiagnosing). Combined tests helped some, but testing combos have barely been tested.

When used in isolation, current [2010] evidence indicates poor diagnostic performance of most physical tests used to identify lumbar disc herniation.

Not terrible, not great. And if you can’t consistently confirm herniations — one of the most glaring possibilities! — it doesn’t bode well for manual testing as a whole.
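An aside for readers who like to see the arithmetic behind “sensitive” and “specific”: how much a positive result actually means depends heavily on how common the condition is in the group being tested. This little sketch is mine, not from the paper, and the numbers are made up purely for illustration (a test roughly as sensitive and non-specific as the SLR):

```python
# A hypothetical illustration (not data from van der Windt et al.) of why
# a "sensitive" test can impress in surgical candidates and flop in clinic.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV): the chance a positive/negative result is correct."""
    tp = sensitivity * prevalence              # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# The same made-up test (90% sensitive, 30% specific) in two settings:
ppv_surgical, _ = predictive_values(0.90, 0.30, prevalence=0.80)  # pre-surgical group
ppv_clinic, _   = predictive_values(0.90, 0.30, prevalence=0.20)  # ordinary clinic

print(f"surgical candidates: a positive is correct {ppv_surgical:.0%} of the time")  # 84%
print(f"ordinary clinic:     a positive is correct {ppv_clinic:.0%} of the time")    # 24%
```

Same test, same examiner skill, wildly different meaning: most of the apparent accuracy in the first setting comes from testing people who mostly have the condition anyway (shooting fish in a barrel).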

Other evidence about the reliability of physical diagnosis

Here’s an assortment of “meh” news about manual spinal diagnosis from the literature, for both necks and backs:

  • In 2007, King et al refuted the results of a 1988 study by Jull et al that had become the go-to citation for the claim that “manual therapists can diagnose symptomatic joints in the neck by manual examination.” King et al could not replicate that famous old result, and instead concluded that manual examination of the neck “lacks validity.” Sad trombone.6
  • Also in 2007, the European Spine Journal published a big review of the ability of physical exam and MRI “to identify the disc, SIJ or facet joint as the source of low back pain.” They looked at 40 other studies that tested such tests, concluding that their diagnostic power was “usually small and at best moderate” and their value was “unclear.”7 An updated version in 2023 focussed more on medical tests (see next chapter), which had “better” results … but still not great. And then there’s this: physical tests were mostly left out of the new version!8 I suspect that’s because they are mostly so underwhelming that they didn’t even merit attention.
  • Maigne et al tested two physicians with training in manual therapy to see if they could detect the painful side by feeling for tension in the spinal muscles. In almost two hundred patients, they were correct for only 65% of lower back pain and 59% of neck pain — barely better than chance.9

  • Nezari et al reported that neurological testing had “limited overall diagnostic accuracy in detecting disc herniation,” strongly echoing van der Windt and Hancock. Accuracy was “poor,” and “all tests demonstrated low sensitivity.”10
  • Hanney et al was a small but interesting trial, comparing the results of physical exams of patients with “mechanical neck pain” done two days apart (no treatment between), finding moderate-at-best agreement. Either neck pain is rather volatile, or the tests just aren’t that trustworthy (or both).11
  • Lemeunier et al did a sprawling series of scientific reviews related to neck pain in 2017, concluding that “little evidence exists to support the use of clinical tests to evaluate the anatomical integrity of the cervical spine in adults with neck pain and its associated disorders.”12 While not uniformly negative,13 that wasn’t a very promising way to sum up.
  • Thoomes et al reviewed the literature on tests for cervical radiculopathy confirmed by diagnostic imaging or surgery.14 Absence of evidence was their main theme: they set the bar high, and so found none of the kind of higher-quality evidence that’s actually needed for this, which “seriously limits the level of evidence” they could report. So they didn’t conclude much except that clusters of tests might be used to increase/decrease the “likelihood” of a diagnosis.15
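A quick aside on the Maigne et al item above: “barely better than chance” is easy to sanity-check with an exact binomial calculation. This is my own back-of-envelope sketch, not from the paper, and the sample size is an assumed round number (the study had “almost two hundred” patients split across two groups):

```python
# Back-of-envelope check (assumptions mine): how often would coin-flip
# guessing score at least as well as the examiners did?
from math import comb

def chance_of_at_least(n, k, p=0.5):
    """Exact binomial tail P(X >= k): the probability of getting k or more
    correct out of n by guessing, with per-guess success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n = 100  # ASSUMED group size; the real per-group counts were different
print(chance_of_at_least(n, 65))  # 65% correct: well under 1% by luck alone
print(chance_of_at_least(n, 59))  # 59% correct: luck gets here a few % of the time
```

So the low-back result is a real (if modest) signal, and the neck result is marginal, on the easiest imaginable diagnostic task: left side or right?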

The salamander’s science summary

There are many ways for a back to hurt that are simply impervious to physical examination in the first place. Some problems just cannot be ruled in or out this way.

For problems that can be diagnosed in principle, the studies make it collectively clear that physical exams and most special orthopedic tests fall well short of being accurate — even under ideal conditions. In the real world, it’s effectively a crapshoot. There are just too many things to look for, too many ways to do it, and too many ways for it to go wrong.

Once in a while, a specific diagnosis based on some clever and skilled testing is going to be correct — but how would we confirm that? Mostly we cannot, unless it is ratified by medical testing and/or a brilliantly successful treatment based on the diagnosis.

And it’s important to know this because so many clinicians act like they are diagnostic savants, like it’s their whole thing to tell you exactly what the problem is. Sadly, it’s standard for patients to walk out of a chiropractic office with a highly specific and wildly overconfident diagnosis.

You’re not bad at your job if you can’t pinpoint the source of back pain

Once upon a time, I went through a phase of worrying that other healthcare professionals must be much smarter than I was, because they seemed so good at identifying exactly where back pain is coming from. Also, they told me they were smarter than I was.

I needn’t have worried. Nor should any professional suffering from the same neurosis.

What my years of experience and study have taught me is that humility and restraint are wiser, and less wrong, than most of these “specific pathoanatomical explanations.” From Ben Cormack, having a bit of a rant about pseudoscience and amateurism in back pain diagnosis:

“The world’s best researchers admit they can’t reliably diagnose [back pain], so why can you?”

People do constantly talk like they “know” what causes back pain, or at least have a very strong suspicion, and it’s almost always … er, optimistic. The specific mechanisms of back pain are mostly as impossible to know as what Bilbo had in his pocketses. We cannot generally trust professionals to identify a structural origin for your pain, even if there is one. Which there may well not be.

We need a lot more diagnostic humility.

None of this means that specific causes don’t exist (as discussed back in the chapter “It’s not structure, except when it is: ‘specific’ back pain”). But the evidence strongly hints that the causes you can point to on an anatomy chart are not present as much as anyone thinks … and even when they are, they’re fiendishly hard to confirm … and not as severe and consistently problematic as people assume and fear.

The take-home message … and the take-to-work message

Take-home message for patients: Doubt every specific explanation for your back pain that you hear, especially the ones based only on a physical exam, and especially the more confident ones. Take them with more than just a grain of salt — it needs a brick of it.

Take-to-work message for pros: By all means, do your exams … but please don’t “believe” your own results. Think and talk about them in terms of the odds, and aim low. You’re not generating diagnostic conclusions: you’re generating clues that change the odds, and not as much as you’d like to think!
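That idea of “clues that change the odds” has a tidy formal version: a test result multiplies your pretest odds by the test’s likelihood ratio. A minimal sketch, with hypothetical numbers (not from the book):

```python
# Hypothetical illustration of "clues that change the odds": a positive
# result from a mediocre test should only nudge your estimate, not settle it.

def updated_probability(pretest_prob, likelihood_ratio):
    """Probability -> odds, multiply by the likelihood ratio, back to probability."""
    odds = pretest_prob / (1 - pretest_prob)
    post_odds = odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# A made-up test with 60% sensitivity and 70% specificity has a positive
# likelihood ratio of 0.60 / (1 - 0.70) = 2.0: each positive doubles the odds.
pretest = 0.10  # assume a 10% pretest probability for some specific diagnosis
posttest = updated_probability(pretest, 2.0)
print(f"{pretest:.0%} -> {posttest:.0%}")  # 10% -> 18%: a clue, not a verdict
```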

Notes

  1. Why? Two reasons: more training/experience/skill on average, and more objective data to work with. Physical exam involves very little “hard data,” and it’s just easier for practitioners to get fanciful and irrational with it.
  2. I’m kidding, of course. Most clinicians are careful and respectful with how they perform “provocative” diagnostic tests. Not all. But most. And some tests are just inherently unpleasant (I’m looking at you, Nerve Conduction Testing).
  3. van der Windt DA, Simons E, Riphagen II, et al. Physical examination for lumbar radiculopathy due to disc herniation in patients with low-back pain. Cochrane Database Syst Rev. 2010 Feb;(2):CD007431. PubMed 20166095 ❐

    van der Windt et al. extracted data from 16 studies comparing the accuracy of physical examination for lumbar disc herniation to more definitive tests (MRI scans or surgical findings).

    The accuracy of common physical tests was not great, and there are good reasons to suspect that more and better data probably would have produced even worse results.

    Most tests weren’t good at detecting disc herniations when used on their own. Many tests — like checking for muscle weakness or reflex changes — had poor diagnostic accuracy. A few did slightly better (such as the slump test and certain bending movements), but there weren’t enough studies to draw strong conclusions about that.

    One classic test seemed more promising at first glance: the “straight leg raise” test showed high sensitivity, meaning it was good at catching cases of disc herniation. But that’s when using it on surgical patients who already had a high likelihood of having a herniated disc. And its specificity — how well it rules out a herniation in people without one — was all over the map.

    The crossed SLR test (where the pain shows up in the opposite leg) did a little better on specificity, but still wasn’t reliable on its own.

    The review suggests that combining multiple tests might improve accuracy, but the data on that is thin: test combinations have barely been tested.

    According to this data, relying on these tests alone would miss a lot of cases, and falsely diagnose herniations in many others.

    A notable weakness of the review is that only one study focused on people in a primary care setting … but more data like that would likely have resulted in even more negative results!

  4. I’d like to cite something newer, but there hasn’t been a lot of research on this lately — perhaps because the verdict was already quite clear by the early 2000s. So this will have to do for now. But not all old citations are obsolete!
  5. They were testing the reliability of SLR on the people most likely to have herniations to find: patients who had already been defined as candidates for surgery for disc herniation. Guess what happens when you look for evidence of a herniation in the people most likely to have them? You find more herniations! And you get fewer false positives. So it’s shooting fish in a barrel. And that makes the test look better than it actually is, an illusion of sensitivity — and a good example of a research artifact, and also of the distinction between reliability and validity. (Reliability is how good a test is at producing the same results in the same conditions; validity is whether it means what we think, or even matters. Shooting fish in a barrel doesn’t make someone good at fishing.)
  6. King W, Lau P, Lees R, Bogduk N. The validity of manual examination in assessing patients with neck pain. Spine Journal. 2007;7(1):22–26. PubMed 17197328 ❐

    “Manual therapists believe that they can diagnose symptomatic joints in the neck by manual examination,” King et al. wrote, and they had been supporting that belief with citations to a very small 1988 study by Jull et al ever since. Almost 20 years later, this study was an attempt to finally replicate those positive results. It’s worth noting that one of the authors of the original paper, Nikolai Bogduk, is also one of the authors here.

    Instead of just 20 subjects, King et al. compared physiotherapist diagnoses of 173 people to more reliable testing done by “blocking” — anesthetizing — the joint.

    The physical therapists seemed to be able to identify facet joints that really were hurting (good sensitivity) … but that was a hollow victory given the high prevalence of trouble at C2–C3 and C5–C6. In other words, they were merely confirming the likeliest diagnosis, like shooting fish in a barrel. “Under these conditions,” King et al. write, “the real measure of validity lies in the specificity of the test.”

    But the specificity really sucked: they were terrible at ruling out facet joints that were actually just fine, thank you very much. In other words, they were finding facet flaws where there were none. In one word: overdiagnosing! Predictably, I would add.

    And so King et al. concluded that manual examination for facet joint pain “lacks validity.” That is, the results don’t mean what manual therapists think they mean — despite the appearance of some decent sensitivity. Ouch.

    The present study has answered the call by Jull et al. for further validation studies, but its results were negative. This outcome leaves manual examination without a sound scientific basis, and calls into question much of what is done in manual medicine and manual therapy.

  7. Hancock MJ, Maher CG, Latimer J, et al. Systematic review of tests to identify the disc, SIJ or facet joint as the source of low back pain. Eur Spine J. 2007 Oct;16(10):1539–50. PubMed 17566796 ❐ PainSci Bibliography 55023 ❐

    This is an older review of the ability of professionals to suss out the specific causes of spinal pain using a variety of diagnostic tests, mostly physical exam stuff. Hancock et al. reviewed about 40 studies that tested such tests. (An updated version of this review was published in 2023, see Han, but it mostly ignored the physical exam tests, so this older paper is still distinct and useful.)

    Could the pros pinpoint where back pain is coming from? Sometimes, but mostly not:

    • “Centralisation was the only clinical feature found to increase the likelihood of the disc as the source of pain.” (“Centralization” is the tendency of pain distribution to shrink in response to specific repeated movements or sustained postures.)
    • “None of the tests for facet joint pain were found to be informative.” And that result really made me cringe, because I “tested” for facet joint involvement frequently during my clinical career. That was a couple minutes of my patients’ time and money wasted every time I did it (to say nothing of the misleading results). And the evidence that such testing is largely futile had already existed for years even back then, but had not trickled down to me through my texts, instructors, or continuing education. Yet another great example of how important it is for clinicians to keep up with their journal reading.
    • “Single manual tests of the sacroiliac joint were uninformative,” but perhaps slightly more helpful in combination. Unfortunately, an effective combo is much harder to confirm, and much less likely to be used in practice.
    • Conclusions about MRI were too all over the map to make much of. An absence of degeneration was “the only test found to reduce the likelihood of the disc as the source of pain.”

    So there was some good news, but overall the diagnostic power of the tests was “usually small and at best moderate” and their value was “unclear.” This theme of mediocre reliability and dubious validity will be continued in similar reviews like van der Windt and Nezari.

    (See more detailed commentary on this paper.)

  8. The only informative physical tests were: (1) a positive centralisation phenomenon (to identify discogenic pain), and (2) the absence of midline low back pain and a combination of sacroiliac joint pain provocation tests (to identify sacroiliac joint pain).
  9. Maigne JY, Cornelis P, Chatellier G. Lower back pain and neck pain: is it possible to identify the painful side by palpation only? Ann Phys Rehabil Med. 2012 Mar;55(2):103–11. PubMed 22341057 ❐ PainSci Bibliography 54321 ❐

    The results are obviously underwhelming. Although they did a little better than just guessing, the results suggest that it’s difficult even for expert examiners to detect the epicentre of neck and back pain by feel. As well, they were only attempting to detect the side of pain — not exactly precise! A low bar to clear. Imagine how much worse their performance would have been if they had to identify the location more specifically, or if the pain could have been anywhere or nowhere. They barely passed the easiest possible test, and would probably have failed a harder one.

    An obvious weakness of the study is that only two examiners were tested. More experienced examiners might have yielded different results. But one would still hope for better than this from anyone with any training and experience at all.

  10. Al Nezari NH, Schneiders AG, Hendrick PA. Neurological examination of the peripheral nervous system to diagnose lumbar spinal disc herniation with suspected radiculopathy: a systematic review and meta-analysis. Spine J. 2013 Jun;13(6):657–74. PubMed 23499340 ❐

    Researchers sifted through six medical databases to find 14 studies of how well neurological exams detect disc herniations. They all compared exam results to “gold standard” diagnostic methods: imaging techniques like MRIs and CT scans, as well as findings from actual spinal surgeries.

    Across all 14 studies, the standard tests for sensation, muscle weakness, and reflexes all showed low sensitivity — meaning they often missed cases of disc herniation. Sensory testing, for example, caught only up to about 40% of otherwise confirmed herniations. Muscle strength tests fared similarly. Reflex testing was even worse, indicating herniation only 29% of the time at best.

    On the bright side, these tests were moderately good at ruling out disc herniation when the results were normal — but still not reliably so.

    The researchers suggest several reasons for this: there’s no universal definition of what counts as a disc herniation, the tests themselves vary in reliability, and the condition itself is complicated. In other words, while these exams might be part of the diagnostic toolkit, they should not be taken too seriously.

  11. Hanney WJ, George SZ, Kolber MJ, et al. Inter-rater reliability of select physical examination procedures in patients with neck pain. Physiother Theory Pract. 2014 Jul;30(5):345–52. PubMed 24377665 ❐
  12. Lemeunier N, da Silva-Oolup S, Chow N, et al. Reliability and validity of clinical tests to assess the anatomical integrity of the cervical spine in adults with neck pain and its associated disorders: Part 1-A systematic review from the Cervical Assessment and Diagnosis Research Evaluation (CADRE) Collaboration. Eur Spine J. 2017 Sep;26(9):2225–2241. PubMed 28608175 ❐
  13. They also reported “preliminary” (in 2017!) evidence to “support the use of the extension-rotation test, neurological examination, Spurling’s and the upper limb neurodynamic tests.” In other words, they found some thin evidence that those tests might have some value.
  14. Thoomes EJ, van Geest S, van der Windt DA, et al. Value of physical tests in diagnosing cervical radiculopathy: a systematic review. Spine J. 2018 Jan;18(1):179–189. PubMed 28838857 ❐
  15. Based on the tests with the most promising sensitivity and specificity, Thoomes et al. speculate that “a combination of a positive Spurling’s test, axial traction test, and Arm Squeeze test may be used to increase the likelihood of a cervical radiculopathy, whereas a negative outcome of combined upper limb neural tension tests and Arm Squeeze test may be used to decrease the likelihood.” But the evidence of higher sensitivity might be misleading.

