Correlation-causation clarifications
[Update: this post has now been integrated into the bizarre, deep, dorky article The Power of Barking: Correlation, causation, and how we decide what treatments work.]
I got some interesting gripes when I posted last week that “correlation kinda does imply causation.” One Facebook commenter said he’d only ever heard that correlation doesn’t “equal” causation; another skeptic thought it was “not a very helpful explanation”; Dr. David Colquhoun tweeted at me: “Hmm dangerous”; and of course several smartypants were quick to remind me of the many amusing examples of spurious correlations that can be (and have been) mined from data.
All of this points to an inescapable conclusion: I probably screwed something up. But not this bit …
Is so “imply”!
The common wording was not what I screwed up. The standard phrase does indeed employ “imply.” Although “equal” does get used occasionally, “imply” is the more common usage for sure. Also, try the Google search autocomplete results for “correlation does not ____.”
If at first you don’t emphasize the right thing …
I should have made this super clear on the first try, so allow me to overcompensate today:
The human knack for inferring causation is fantastically unreliable and our failures in this department are legion and disastrous. By far the most important thing anyone needs to understand about the relationship between correlation and causation is that A did not necessarily cause B just because B followed A, and making this mistake is one of the Greatest Hits of human thinking glitches.
This problem has been emphasized ad nauseam by so many smart people for so long that I personally just kind of take it for granted, and so I wrote my post last week without bothering to make it clear enough. It probably should be made clear every time correlation is discussed, because, as Barker Bausell put it (Snake Oil Science), we have a problem with “confusion between correlation and cause on an industrial scale.”
So what was I on about?
It was just some intellectual musing on my part. My griping about “imply” was not original. I was paraphrasing Edward Tufte, an American statistician who made the same point quite a while ago. So I’m in good company. Tufte suggested that a good informal re-wording would be, “Correlation is not causation but it sure is a hint.” I just wanted to make that same point, and I should have cited him, but I was in a hurry (penny wise and pound stupid, because now this is all taking me three times as long as if I’d just done it right in the first place).
I was mostly keen on the curious mental phenomenon of causality inference. It’s fascinating how aggressively the human mind infers causality from adjacent events … and how often we get it right about simple things. Exactly how often we get it right depends totally on the context and domain. We get causality right constantly when the variables are simple and readily observable; we rarely get it right in health care, or any other complex endeavour, where the variable counts are high and many are subjective or otherwise murky.
One of many examples of correlations that absolutely exist … but definitely do not mean a damn thing. At least, I’m pretty sure cheese eating doesn’t cause fatal bedsheet tangling. Thanks to TylerVigen.com for this graph and many others like it.
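For the curious, here is a minimal sketch of how easy it is to mine a correlation like that out of pure noise. It is not how TylerVigen.com builds its charts; it just assumes a pile of short, unrelated random walks (standing in for ten years of annual statistics) and reports the most strongly correlated pair. The function names, the number of series, their length, and the seed are all made up for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """A series in which each value is the previous one plus random noise."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk


def strongest_spurious_pair(n_series=40, length=10, seed=1):
    """Generate unrelated series and return the most correlated pair.

    All parameters are illustrative assumptions: 40 series of 10 points
    each is roughly "40 annual statistics tracked over a decade".
    """
    rng = random.Random(seed)
    series = [random_walk(rng, length) for _ in range(n_series)]
    best = max(
        itertools.combinations(range(n_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


if __name__ == "__main__":
    pair, r = strongest_spurious_pair()
    print(f"series {pair[0]} vs series {pair[1]}: r = {r:.2f}")
```

None of these series has anything to do with any other, but with 780 possible pairs to choose from, the best match will usually look impressively tight. The more pairs you compare, the better the best one looks, which is exactly why a mined correlation is such a weak hint.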
The difference between general and specific causality inference
I also wrote about this last week because I wanted to separate two things that are often mixed up: the inference of causality and the attribution of mechanism. General versus specific causes, basically. We can and routinely do correctly detect causes when correlation gives us a strong enough hint, but we routinely screw up exactly what caused what.
For example:
Most people will assume that when a very stubborn old pain goes away during a one-hour acupuncture session, the experience must have caused the relief, because the relief followed the experience. And that assumption is probably correct. The appearance of relief probably isn’t a coincidence, probably not just regression to the mean (too quick).
But most people will then (carelessly or self-servingly) move on to another assumption: that the treatment caused the relief because acupuncture works as advertised. (It doesn’t.)
We can be right about the causality in a wide view (somehow or other, that appointment really did lead to feeling better, so yay) but still be hopelessly wrong about what specifically caused what. Most people will ignore the possibility that the true mechanism of relief was not the efficacy of acupuncture, but the efficacy of a caring professional promising aid and performing fascinating rituals that reek of implied potency: the power of “surely no one would do this if it didn’t work!” These factors are wildly underestimated by most acupuncture patients. And acupuncturists.
Enough said, I hope
Causality inference is a potent defining feature of human intelligence. It serves us well in many situations. Our ability to suss out how things work is largely based on this “one weird trick” that our brains can do. Flick the switch, light turns on: probably causally related! Touch fire, get burned … throw rock, break window … eat too much, get sick. There are countless simple correlations like this that we master effortlessly before we can even tie our shoes. We see B follow A and we just kinda get it that A caused B, just like humans somehow understand pointing, but most dogs will just lick your finger.
But we also constantly get it wrong, unfortunately.