Why those positive studies aren’t good enough
And what kind of evidence is good enough? Even if the effect size is small?
2025 will probably deliver the biggest-ever harvest of not-so-promising studies about what works for pain. If you’re going to wrap your head around that research — if that’s a skill you want, as a person in pain, or a professional trying to help — then you must understand why so much science isn’t fit to line a birdcage. Many people either don’t get this, or badly underestimate it, or swing to the opposite extreme and apply far too much cynicism, refusing to take any science seriously … which is definitely throwing the baby out with the bathwater.
There might be a few things worth learning even from weaker studies showing only small benefits. Small isn’t always unimportant, but you’ve really got to know what to look for.
PubMed muggles can always naively point to a few studies that seem positive. “Laser therapy works!” And then they read what I have to say about it, and they see that I am not impressed, and so they ask: “Why aren’t these studies good enough for you, Mr. Smartypants? PubMed says lasers work!”
Well, Mr. Stupidpants, don’t believe everything you read — no, not even on PubMed. I know it looks all science-y, but there are countless crappy little studies declaring eureka! or at least promising! You can find false positives for anything in alternative medicine, no matter how silly:
- applied kinesiology: “great clinical feasibility, showed good accuracy in diagnosing sacroiliac joint dysfunction”
- iridology: “Fifteen of those [reviews] are in favor of the method”
- ozone therapy: “The study confirms the efficacy of ozone concerning pain relief, functional improvement, and quality of life in patients with knee osteoarthritis.”
- emotional freedom technique: “Reductions in stress (p < .001), anxiety (p < .001), and burnout (p < .001) reached high levels of statistical significance for the intervention group.”
- magnet therapy: “demonstrated analgesic, anti-inflammatory, and structure-modifying effects of this type of physiotherapeutic treatment”
“Proof” that these things work, all just a quick PubMed search away! But most of them are actually bogus.
Someone, somewhere on social media recently said that PubMed has bad citations to support “anything.” Not quite anything. But a great deal! And probably every single bad idea in alternative medicine.
There have of course always been bogus citations and junky studies — there are many ways to lie about studies, and many ways for studies to lie. But this problem is much worse than it used to be, thanks to the explosive growth of low-quality, extremely partisan, pay-to-play, and/or outright fraudulent scientific publications over the last 20 years, journals that churn out “positive” studies of anything and everything.
In 2002, a few small positive trials actually were promising. Today they are a bad sign, a pattern strongly associated with worse-than-useless evidence, just muddying the waters with data that is nearly guaranteed to be p-hacked into garbage, and so biased it’s effectively just propaganda wearing a lab coat.
So what’s enough evidence to be persuasive?
I want at least three decent trials with clearly positive results. Double that requirement if the hypothesis is implausible or self-serving, with at least one replication from relatively unbiased researchers.
None of this constitutes “proof,” of course! That bar is much higher still. This is just the level of evidence I need to raise my eyebrows and say, “Okay, that might work.”
Oh, and it can’t just have “statistically significant” positive results — it has to have clinically significant positive results, an adequate “effect size,” big enough that we should care.
But it doesn’t have to be huge to matter. (“That’s what she said,” ba-dum-tss.)
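For readers who like to see the arithmetic, here’s a minimal sketch (in Python, with made-up numbers that aren’t from any of the studies above) of how a trivially small effect can still produce an impressive p-value once a trial is big enough:

```python
import math
from scipy import stats

# Hypothetical numbers, purely for illustration: a 0.3-point average
# improvement on a 0-10 pain scale, a standard deviation of 2 points,
# and 1,000 patients per arm.
diff, sd, n = 0.3, 2.0, 1000

se = sd * math.sqrt(2 / n)                 # standard error of the group difference
t = diff / se                              # t-statistic, roughly 3.35
p = 2 * stats.t.sf(abs(t), df=2 * n - 2)   # two-sided p-value, roughly 0.0008

print(f"t = {t:.2f}, p = {p:.4f}")         # "highly statistically significant"

# And yet a 0.3-point change is well below the 1-2 point threshold
# usually cited as the minimal clinically important difference on a
# pain scale: significant to the statistician, imperceptible to the patient.
```

The exact numbers don’t matter; the point is that “p < 0.05” by itself tells you nothing about whether the effect is big enough for anyone to care.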
“Dubious Study” xkcd #1847 © xkcd.com by Randall Munroe
Give small a chance
I’ve been too critical of small effect sizes over the years. This is a bit of a mea culpa.
We use microscopes to look for small things, and we (mainly) use clinical trials to look for small and specific effects. The more obvious the benefit of a treatment, the less we need to test it.1 We only need a randomized controlled trial when the apparent effect size is so small that it might be a fluke, and we have to confirm that it’s genuine rather than an illusion or delusion.
Small benefits may still matter for many reasons:
- A small effect may be worthwhile if it’s cheap, easy, and safe to achieve. We never just want to know about the juice — we also want to know if it’s worth the squeeze.
- It might be the average of a wide range of outcomes: some people will get less, some more — sometimes predictably. There’s a huge difference between “always just 5%” and “5% on average but sometimes 0% and sometimes 30%, depending on your genes.”
- It’s just one of several effects, maybe part of an unmeasured process or whole that’s greater than the sum of its parts — which defines good therapy. A small effect on pain may come with more meaningful benefits, like less suffering and more function (pain self-efficacy).
- Small effects can matter much more to people who need them the most. Dropping from 9 to 8 on the pain scale may seem like a much bigger deal than 3 to 2.
- The mechanism might be important in principle, the tip of an iceberg of potential. Proving that a drug works at all could suggest ways to make it work better, and send researchers back to their labs with new ideas. Adjustments to the intervention, and/or exactly what is measured, could reveal a more substantial effect.
- Not all worthwhile benefits can be felt. No one can feel 30% less likely to get injured, or that they will recover from a tendinitis 20% faster.
- The truth is inherently valuable. If homeopathy actually worked, even just a bit, I’d want to know how it could possibly work at all!
And yet clinical significance is still significant, and so a tiny benefit is still mostly unimpressive — assuming it’s even real, which it probably isn’t. Most good news from weak trials of pain treatments is as wrong as the 1989 cold fusion claim — just with less media attention.
Notes
(Or note, rather.)
- Scientific tests are mostly designed to identify or confirm relatively subtle relief that we can’t be sure we can feel. If a headache nostrum made your headache vanish in thirty minutes, every time, you could do a carefully controlled trial to confirm it … but it would be about as surprising as a test of the moistening power of showers.
The shower analogy is mine, but the classic satirical example is parachutes: we don’t need an RCT to know they work (see Smith et al).
But that blasted parachute thing has been thoroughly abused. It was cooked up originally to denigrate EBM absolutism, then widely exploited to sneer at an EBM straw man, and now it’s mostly used by quacks and cranks to argue that what they believe is so obvious that it doesn’t need to be tested any more than parachutes do. For instance, a doctor friend of mine “once had an iodine quack (‘Harvard trained!’) proudly tell me there had never been an RCT of parachutes when I asked for evidence that ‘everybody needs more iodine.’”