The Decline Effect, the Structure of Science, and ESP? How should sociologists think about replicability?
An intriguing article by Jonah Lehrer in the December 13th issue of The New Yorker explained the “decline effect” and offered some hypotheses for its existence; here I summarize the article and discuss its relevance for the issue of scientific replicability in sociology. The article set the blog world abuzz: Google returns 947 hits for “decline effect” between November 30th and December 16th of this year, compared with 158 for the same range last year. The term “decline effect” originally referred to a 1930s study of extrasensory perception in which a student who at first seemed to have ESP appeared to lose his ability; it has since come to describe the general tendency for the effect sizes of scientific studies to shrink, or disappear entirely, as scientists try to replicate earlier findings.
Lehrer discusses the decline effect in the case of verbal overshadowing in cognitive psychology (where individuals asked to describe something verbally seemed to remember it less well later), fluctuating asymmetry in ecology (where females of various species seemed to use bodily symmetry as an indicator of genetic quality, so that more symmetrical males reproduced more often), and in medical clinical trials (where, for example, antipsychotic drugs seemed less effective than initially thought). He catalogs, both explicitly and implicitly, a number of possible explanations for the effect: a) Time, where the power of an effect “loses truth” as time proceeds; b) Individual psychology, where loss of belief in an effect alters results (this seems to be the case with the ESP example); and c) Regression to the mean, where initially extreme cases are revealed to be outliers as the sample size increases or as the study is repeated. He is, however, particularly interested in three other explanations which, working in concert, seem to best explain the effect (a brief simulation after the list below sketches how publication bias and regression to the mean together can produce a decline effect):
- Publication bias, where positive results are far more likely than negative results to be published, a pattern that has been widely observed across fields and has prompted repeated calls for journals dedicated to publishing null results,
- Selective reporting of results, where scientists choose which data to document in the first place and may obtain biased results through the unconscious influence of their prior beliefs and expectations, and
- Randomness, where many scientific results are the product of noise, or “a byproduct of invisible variables we don’t understand” (p. 57).
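As a rough illustration, consider a small simulation (my own sketch, not from Lehrer’s article): the true effect size, the per-group sample sizes, and the p < .05 filter are all arbitrary assumptions, but they are enough to show how publication bias and regression to the mean can generate a decline effect even when nothing about the underlying phenomenon changes.

```python
# A minimal sketch: simulate many two-group studies of a small true effect,
# "publish" an initial study only if it reaches p < .05 (publication bias),
# then replicate each published study with a larger sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_EFFECT = 0.2    # assumed true standardized mean difference (Cohen's d)
N_INITIAL = 30       # per-group n in the original, underpowered study
N_REPLICATION = 120  # per-group n in the replication attempt

def run_study(n):
    """Run one two-group study; return the estimated effect size (d) and p-value."""
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(TRUE_EFFECT, 1.0, n)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    d = (treated.mean() - control.mean()) / pooled_sd
    p = stats.ttest_ind(treated, control).pvalue
    return d, p

published_initial, replications = [], []
while len(published_initial) < 1000:
    d0, p0 = run_study(N_INITIAL)
    if p0 < 0.05:  # publication bias: only significant initial results appear
        published_initial.append(d0)
        replications.append(run_study(N_REPLICATION)[0])

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"mean published initial d: {np.mean(published_initial):.2f}")  # inflated
print(f"mean replication d:       {np.mean(replications):.2f}")       # declines toward truth
```

With these (assumed) numbers, the published initial estimates should come out well above the true effect, while the better-powered replications land close to it; the “decline” is simply selection on noise washing out.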
Lehrer closes by observing that, beyond reminding us that scientists are biased and that the structure of scientific fields facilitates fads (Kuhn’s paradigms), the decline effect underscores how difficult it is to obtain definitive scientific proof.
The article raises plenty of interesting questions (e.g., to what extent are publication bias and the selective reporting of results amenable to organizational and institutional remedies? Does ESP exist?!), not least why it upset so many science bloggers (or, more likely, blogging scientists). There is much to say about the institutional features of scientific disciplines and the peer-review process that facilitate these kinds of outcomes, but I will focus instead on what replicability means in sociology.
That I had not heard of the decline effect may be a byproduct of a much-bemoaned, little-addressed feature of sociology: few attempts at replication, coupled with such diversity in the cases used to establish effects that even the possibility of comparison across cases seems wildly bold. There are two issues here. First, attempts to replicate results with the same or similar data are extremely rare (there have, of course, been calls for replication in quantitative sociology, notably the November 2007 issue of Sociological Methods & Research, mostly urging data sharing and standards for analysis). Second, replicating effects across domains, and reporting the success or failure of doing so, is afforded little attention. Notably, the Blackwell Encyclopedia of Sociology entry on replication devotes most of its space to methods for testing the internal replicability of datasets, while noting the importance, and paucity, of external replicability. Sociology may be particularly susceptible to the whims of randomness (a leading culprit in the decline effect, according to Lehrer), given how far removed most of us are from labs where dozens of factors can be controlled. More interesting still, beyond reproducing the same results from the same datasets, what would reproducing social facts, canonical findings, and effects look like across domains of inquiry? It seems worth considering whether sociologists should pursue replication in the way that those running mice through mazes do, or whether there is another way to think about the replicability of our results.