Interrater reliability: the kappa statistic
original abstract †Abstracts here may not perfectly match originals, for a variety of technical and practical reasons. Some abstracts are truncated for my purposes here, if they are particularly long-winded and unhelpful. I occasionally add clarifying notes. And I make some minor corrections.
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While there have been a variety of methods to measure interrater reliability, traditionally it was measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued use of percent agreement due to its inability to account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, the kappa can range from -1 to +1. While the kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa should be acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels for both kappa and percent agreement that should be demanded in healthcare studies are suggested.
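To make the difference between the two measures concrete, here is a minimal Python sketch, not taken from the paper. It computes percent agreement as described above (agreements divided by total scores) and Cohen's kappa using the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's own category frequencies. The function names and the ten yes/no ratings are invented for illustration only.

```python
from collections import Counter
import math


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)

    if math.isclose(p_e, 1.0):
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten items by two raters (illustrative data only).
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")  # 0.80
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")       # 0.58
```

In this made-up example the raters agree on 8 of 10 items (80%), but because each says "yes" 60% of the time, about 52% agreement would be expected by chance alone, so kappa comes out noticeably lower at roughly 0.58.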
This page is part of the PainScience BIBLIOGRAPHY, which contains plain language summaries of thousands of scientific papers and other sources. It’s like a highly specialized blog. A few highlights:
- Inciting events associated with lumbar disc herniation. Suri 2010 Spine J.
- Prediction of an extruded fragment in lumbar disc patients from clinical presentations. Pople 1994 Spine (Phila Pa 1976).
- Characteristics of patients with low back and leg pain seeking treatment in primary care: baseline results from the ATLAS cohort study. Konstantinou 2015 BMC Musculoskelet Disord.
- Effectiveness and cost-effectiveness of universal school-based mindfulness training compared with normal school provision in reducing risk of mental health problems and promoting well-being in adolescence: the MYRIAD cluster randomised controlled trial. Kuyken 2022 Evid Based Ment Health.
- Is there a relationship between throbbing pain and arterial pulsations? Mirza 2012 J Neurosci.