Comments on “Statistics and the replication crisis”


A team led by Brian Nosek

A team led by Brian Nosek repeated a hundred well-known psychology experiments, and found a statistically significant result in less than half. (“Estimating the reproducibility of psychological science,” Science, 28 August 2015, 349:6251.)

This sounds somewhat misleading; “a hundred well-known experiments” implies that they were particularly famous or important studies, but the selection criterion was “articles published in 2008 in three important psychology journals, for which we could find replication teams with relevant interests and expertise”. Most of the studies in question had been cited fewer than a hundred times; one had as few as 6 citations.

In fairness to psychology, it might also be worth noting that at least one project to specifically investigate famous and well-known findings ended up replicating all of them (… ).

Over 50% of claims made on the internet are false

“There are no right or wrong answers (so long as you stay within the formal domain of applicability)” … “Science can’t be reduced to any fixed method, nor evaluated by any fixed criterion.”

Colour me confused, but don’t you make lots of rather strong claims that most/many (> 50%) of research claims in (some/many/most?) sciences are false, with a right or wrong answer based on a fixed method and criterion?

“In large-scale replication efforts, the false positive rate typically comes out greater than 50%.”

Well, that is not really “true”. There is a growing number of large-scale replication studies, with different headline numbers attached to them. One, for example, as a previous commenter notes, shows 100% success in replication. “Typically” may be convenient for science-trashing polemical purposes, but it is not meaningful here, because the studies are quite different, with varying goals. And sadly we can’t calculate what the real false positive rate is (as opposed to relying on Ioannidis’s implausible and fairly well discredited models), because aside from some reasonably rare exceptions (e.g. ESP) we don’t normally know whether an underlying effect is “true” or not.

I think what you really mean to say is that when a study is replicated (say, done many years later with a different sample and non-exact methods) and the replication was not significant where the original was, then the original finding was “false”. This view is not really tenable, for a number of reasons. One is methodological: many studies (including some of those in replication projects) are not actually falsifiable, as the original studies are so vague and underpowered that they can’t be tested and verified (see “Why most of psychology is statistically unfalsifiable” by Lakens and Morey). So this would be a case of “not even false”, rather than false. These days it is more common to describe replications as consistent or inconsistent with a previous finding (though even that can be tricky).

Now obviously this is still a significant problem for some scientific disciplines, and it doesn’t detract massively from the argument you are making, but I find the framing interesting.

So you suggest the deep cause of the replication crises in many fields is incentives, which seems about right. But in terms of solutions we have a problem, because “form” or “certainty” is a powerful incentive, no matter how comfortable you are with uncertainty. We can see this in your post here (and probably in my response too), where you draw stronger conclusions than are warranted, presumably due to some incentive to put forward that view.

I am reminded of someone I get into arguments with (as much as I try to avoid them). They have some deeply held convictions on the state of the world, but when I offer my own view they play the nebulosity card, and I am told “well, we don’t really know that for sure, who really knows”. This can be quite annoying.


Thanks; I’ve changed that footnote to say “In several large-scale replication efforts, the false positive rate was found to be greater than 50%.”

Could you point me to papers discrediting Ioannidis’ methodology? I don’t know about this.

Plausibility isn't as sexy as bold

I may have over-egged “discredited” (because it is still widely cited, though when it comes to contrarian Covid claims, quite applicable:…), but I will stand by “implausible”.

Jager and Leek (2014) put their estimate at 14% (based on evidence) for biomedical research. That is from a special issue devoted to this, and I think the summary of it is that >50% is implausible (and 14% optimistic). But importantly, Ioannidis is a medical researcher, and the model was based on a hypothetical genomic study that didn’t take into account the corrections for multiple comparisons which actually take place in this research field, leading to an “unrealistic straw man” (Samsa, 2013).

Critical to the Ioannidis example was the prior probability of a hypothesis being true. As noted in the comment above, this is unknown (and not subject to cross-field generalisations), but I suspect that the number Ioannidis used was geared to get a > 50% result, so that he could get a sexy “most” paper title (and it sure has earned him citations).

This prior can’t apply to “most” fields as a general rule. Ioannidis uses an atheoretical example with a very low discovery rate, which likely isn’t very reflective of most scientific endeavour, such as social psychology (Stroebe, 2016). The prior probability will vary a lot across fields. Based on an empirically derived estimate, Schimmack puts the false discovery risk for some fields of psychology between 8.6% and 17.6% (see:… ).

Of course, even if it is 20% instead of > 50%, that is still pretty bad. But it doesn’t make quite as good a headline.
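
To make the dependence on the prior concrete, here is a minimal sketch using the standard textbook formula for the share of significant results that are false positives. It is not Ioannidis’s full model (which also includes a bias term), and the function name and the inputs (alpha of 0.05, power of 0.8, and the three priors) are illustrative assumptions, not figures taken from any of the papers mentioned above.

```python
def false_discovery_rate(prior: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Expected share of statistically significant results that are false positives.

    prior: assumed probability that a tested hypothesis is true (hypothetical input)
    alpha: significance threshold, i.e. the false positive rate under the null
    power: probability of detecting an effect that really exists
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be between 0 and 1")
    false_positives = alpha * (1.0 - prior)  # true nulls that still reach significance
    true_positives = power * prior           # real effects that reach significance
    total = false_positives + true_positives
    if total == 0.0:
        raise ValueError("no significant results are expected with these inputs")
    return false_positives / total


if __name__ == "__main__":
    # Illustrative priors only: a coin flip, one in ten, and one in a hundred.
    for prior in (0.5, 0.1, 0.01):
        rate = false_discovery_rate(prior)
        print(f"prior={prior:.2f}  false discovery rate={rate:.3f}")
```

With these illustrative inputs the rate stays under 50% at a prior of 0.1 and only exceeds it when the prior drops to 0.01, which is why the choice of prior does so much of the work in a “most findings are false” conclusion.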

This page is in the section Part One: Taking rationalism seriously,
      which is in In the Cells of the Eggplant.
