THC Science

Ioannidis (2005), "Why Most Published Research Findings Are False".^[1]

The replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which it has been found that the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method,^[2] such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.

The replication crisis is frequently discussed in relation to psychology and medicine, where considerable efforts have been undertaken to re-investigate classic results, to determine both their reliability and, if found unreliable, the reasons for the failure.^[3]^[4] Data strongly indicate that other natural and social sciences are affected as well.^[5]

The phrase replication crisis was coined in the early 2010s^[6] as part of a growing awareness of the problem. Considerations around causes and remedies have given rise to a new scientific discipline called metascience,^[7] which uses methods of empirical research to examine empirical research practice.

Since empirical research involves both obtaining and analyzing data, considerations about its reproducibility fall into two categories. The validation of the analysis and interpretation of the data obtained in a study runs under the term reproducibility in the narrow sense. The task of repeating the experiment or observational study to obtain new, independent data with the goal of reaching the same or similar conclusions as an original study is called replication.

Background

Replication has been referred to as "the cornerstone of science".^[8]^[9] Environmental health scientist Stefan Schmidt began a 2009 review with this description of replication:

Replication is one of the central issues in any empirical science. To confirm results or hypotheses by a repetition procedure is at the basis of any scientific conception. A replication experiment to demonstrate that the same findings can be obtained in any other place by any other researcher is conceived as an operationalization of objectivity. It is the proof that the experiment reflects knowledge that can be separated from the specific circumstances (such as time, place, or persons) under which it was gained.^[10]

However, there is limited consensus on how to define replication and potentially related concepts.^[11]^[12]^[10] A number of types of replication have been identified:

Direct or exact replication, where an experimental procedure is repeated as closely as possible.^[10]^[13]
Systematic replication, where an experimental procedure is largely repeated, with some intentional changes.^[13]
Conceptual replication, where a finding or hypothesis is tested using a different procedure.^[10]^[13] Conceptual replication allows testing for generalizability and veracity of a result or hypothesis.^[13]

Reproducibility can also be distinguished from replication, as referring to reproducing the same results using the same dataset. Reproducibility of this type is why many researchers make their data available to others for testing.^[14]

The replication crisis does not necessarily mean these fields are unscientific.^[15]^[16]^[17] Rather this process is part of the scientific process in which old ideas or those that cannot withstand careful scrutiny are pruned,^[18]^[19] although this pruning process is not always effective.^[20]^[21]

A hypothesis is generally considered to be supported when the results match the predicted pattern and that pattern of results is found to be statistically significant. Results are generally considered significant when statistical testing determines that there is a 5% (or less) probability of obtaining effects at least as large as those measured if no real effect exists.^[a] This is depicted as p < 0.05, where p (typically called the p-value) is the probability level. Under this threshold, about 5% of tests of hypotheses that are in fact incorrect should yield false positives (an incorrect hypothesis being erroneously found correct), assuming the studies meet all of the statistical assumptions. Some fields use smaller thresholds, such as p < 0.01 (1% chance of a false positive) or p < 0.001 (0.1% chance of a false positive). However, a smaller chance of a false positive often requires greater sample sizes or a greater chance of a false negative (a correct hypothesis being erroneously found incorrect). Although p-value testing is the most commonly used method, it is not the only method.
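The trade-off between the significance threshold, the sample size, and the chance of a false negative can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size of 0.5, none of which comes from the cited sources, and the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def z_test_power(effect_size: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (an assumed value).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    upper = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


def n_per_group_needed(effect_size: float, alpha: float, power: float) -> int:
    """Smallest per-group sample size that reaches the target power."""
    n = 2
    while z_test_power(effect_size, n, alpha) < power:
        n += 1
    return n


for alpha in (0.05, 0.01, 0.001):
    power = z_test_power(effect_size=0.5, n_per_group=64, alpha=alpha)
    needed = n_per_group_needed(effect_size=0.5, alpha=alpha, power=0.8)
    print(f"alpha={alpha}: power at n=64 is {power:.2f}; "
          f"n per group for 80% power is {needed}")
```

With the sample size held fixed, a stricter threshold lowers power (more false negatives); holding power fixed instead raises the required sample size.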

Prevalence

In psychology

Despite issues with replicability being pervasive across scientific fields, several factors have combined to put psychology at the center of the conversation.^[22] Some areas of psychology once considered solid, such as social priming, have come under increased scrutiny due to failed replications.^[23] Much of the focus has been on the area of social psychology,^[24] although other areas of psychology such as clinical psychology,^[25]^[26] developmental psychology,^[27] and educational research have also been implicated.^[28]^[29]

In August 2015, the first open empirical study of reproducibility in psychology was published, called The Reproducibility Project. Coordinated by psychologist Brian Nosek, researchers redid 100 studies in psychological science from three high-ranking psychology journals (Journal of Personality and Social Psychology, Journal of Experimental Psychology: Learning, Memory, and Cognition, and Psychological Science). Ninety-seven of the original studies had significant effects, but out of those 97, only 36% of the replications yielded significant findings (p value below 0.05).^[11] The mean effect size in the replications was approximately half the magnitude of the effects reported in the original studies. The same paper examined the reproducibility rates and effect sizes by journal and discipline. Study replication rates were 23% for the Journal of Personality and Social Psychology, 48% for Journal of Experimental Psychology: Learning, Memory, and Cognition, and 38% for Psychological Science. Studies in the field of cognitive psychology had a higher replication rate (50%) than studies in the field of social psychology (25%).^[30]

A study published in 2018 in Nature Human Behaviour replicated 21 social and behavioral science papers from Nature and Science, finding that only about 62% could successfully reproduce original results.^[31]^[32]

Similarly, in a study conducted under the auspices of the Center for Open Science, a team of 186 researchers from 60 different laboratories (representing 36 different nationalities from 6 different continents) conducted replications of 28 classic and contemporary findings in psychology.^[33]^[34] The focus of the study was not only on whether or not the findings from the original papers replicated but also on the extent to which findings varied as a function of variations in samples and contexts. Overall, 50% of the 28 findings failed to replicate despite massive sample sizes. However, if a finding replicated, then it replicated in most samples. If a finding was not replicated, then it failed to replicate with little variation across samples and contexts. This evidence is inconsistent with a proposed explanation that failures to replicate in psychology are likely due to changes in the sample between the original and replication study.^[34]

Early analysis of result-blind peer review, which is less affected by publication bias, has estimated that 61 percent of result-blind studies have led to null results, in contrast to an estimated 5 to 20 percent in earlier research.^[35]

In medicine

Results from The Reproducibility Project: Cancer Biology suggest that most studies in the cancer research sector may not be replicable.

Out of 49 medical studies from 1990 to 2003 with more than 1000 citations, 92% found that the studied therapies were effective. Out of these studies, 16% were contradicted by subsequent studies, 16% had found stronger effects than did subsequent studies, 44% were replicated, and 24% remained largely unchallenged.^[36] A 2011 analysis by researchers with pharmaceutical company Bayer found that, at most, a quarter of Bayer's in-house findings replicated the original results.^[37] However, the analysis of Bayer's results found that the results which did replicate could often be successfully used for clinical applications.^[38]

In cancer research

In a paper published in 2012, C. Glenn Begley, a biotech consultant working at Amgen, and Lee Ellis, a medical researcher at the University of Texas, found that only 11% of 53 pre-clinical cancer studies had replications that could confirm conclusions from the original studies.^[39]

In late 2021, The Reproducibility Project: Cancer Biology examined 53 top papers about cancer published between 2010 and 2012 and showed that among studies that provided sufficient information to be redone, the effect sizes were 85% smaller on average than the original findings.^[40]^[41] A survey of cancer researchers found that half of them had been unable to reproduce a published result.^[42]

In economics

Economics has lagged behind other social sciences and psychology in its attempts to assess replication rates and increase the number of studies which attempt replication.^[12] A 2016 study in the journal Science replicated 18 experimental studies published in two top-tier economics journals (The American Economic Review and the Quarterly Journal of Economics) between 2011 and 2014. It found that about 39% failed to reproduce the original results.^[43]^[44]^[45] About 20% of studies published in The American Economic Review are contradicted by other studies despite relying on the same or similar datasets.^[46] A study of empirical findings in the Strategic Management Journal found that about 30% of 27 retested articles showed statistically insignificant results for previously significant findings, whereas about 4 percent showed statistically significant results for previously insignificant findings.^[47]

In phenotype association studies

Results of a study published in 2022 suggest that many earlier brain–phenotype studies ("brain-wide association studies" (BWAS)) produced invalid conclusions as reproducibility of such studies requires samples from thousands of individuals due to small effect sizes.^[48]^[49]

In water resource management

A 2019 study in Scientific Data estimated with 95% confidence that out of 1,989 articles on water resources and management published in 2017, study results might be reproduced for only 0.6% to 6.8% of all articles, even if each of these articles were to provide sufficient information that allowed for replication.^[50]

Across fields

A 2016 survey by Nature of 1,576 researchers who took a brief online questionnaire on reproducibility found that more than 70% of researchers have tried and failed to reproduce another scientist's experiment results (including 87% of chemists, 77% of biologists, 69% of physicists and engineers, 67% of medical researchers, 64% of earth and environmental scientists, and 62% of all others), and more than half have failed to reproduce their own experiments. However, fewer than 20% had been contacted by another researcher unable to reproduce their work. The survey found that fewer than 31% of researchers believe that failure to reproduce results means that the original result is probably wrong, although 52% do agree that a significant replication crisis exists. Most researchers said that they still trust the published literature.^[5]^[51]

Causes

The replication crisis may be triggered by the "generation of new data and scientific publications at an unprecedented rate" that leads to the "desperation to publish or perish" and a failure to adhere to good scientific practice.^[52]

Historical and sociological roots

Predictions of an impending crisis in the quality control mechanism of science can be traced back several decades. Derek de Solla Price—considered the father of scientometrics, the quantitative study of science—predicted that science could reach "senility" as a result of its own exponential growth.^[53] Some present day literature seems to vindicate this "overflow" prophecy, lamenting the decay in both attention and quality.^[54]^[55]

Historian Philip Mirowski offers another reading of the crisis in his 2011 book Science-Mart: Privatizing American Science. In the title, the word Mart is used by Mirowski as a metaphor for the commodification of science. In Mirowski's analysis, the quality of science collapses when it becomes a commodity being traded in a market. Mirowski argues his case by tracing the decay of science to the decision of major corporations to close their in-house laboratories. They outsourced their work to universities in an effort to reduce costs and increase profits. The corporations subsequently moved their research away from universities to an even cheaper option – Contract Research Organizations.^[56]

Social systems theory, as expounded in the work of German sociologist Niklas Luhmann, inspires a similar diagnosis. This theory holds that each system, such as economy, science, religion or media, communicates using its own code: true and false for science, profit and loss for the economy, news and no-news for the media, and so on.^[57]^[58] According to some sociologists, science's mediatization,^[59] its commodification,^[56] and its politicization,^[59]^[60] as a result of the structural coupling among systems, have led to a confusion of the original system codes. If science's code of true and false is substituted with those of the other systems, such as profit and loss or news and no-news, science enters into an internal crisis.^[61]

Economist Noah Smith suggests that a factor in the crisis has been the overvaluing of research in academia and undervaluing of teaching ability, especially in fields with few major recent discoveries.^[62]

Publish or perish culture in academia

Philosopher and historian of science Jerome R. Ravetz predicted in his 1971 book Scientific Knowledge and Its Social Problems that science, in its progression from "little" science composed of isolated communities of researchers to "big" science or "techno-science", would suffer major problems in its internal system of quality control. Ravetz recognized that the incentive structure for modern scientists could become dysfunctional, a problem now known as the publish-or-perish challenge, creating perverse incentives to publish any findings, however dubious. According to Ravetz, quality in science is maintained only when there is a community of scholars, linked by a set of shared norms and standards, who are willing and able to hold each other accountable.

Philosopher Brian D. Earp and psychologist Jim A. C. Everett argue that, although replication is in the best interests of academics and researchers as a group, features of academic psychological culture discourage replication by individual researchers. They argue that performing replications can be time-consuming, and will take away resources from projects that reflect the researcher's original thinking. Replications are harder to publish, largely because they are unoriginal, and even when they can be published they are unlikely to be viewed as major contributions to the field. Ultimately, replications "bring less recognition and reward, including grant money, to their authors."^[63]

A major cause of low reproducibility is the publication bias stemming from the fact that statistically insignificant results and seemingly unoriginal replications are rarely published. Only a very small proportion of academic journals in psychology and neurosciences explicitly welcomed submissions of replication studies in their aim and scope or instructions to authors.^[64]^[65] This does not encourage reporting on, or even attempts to perform, replication studies. Among 1,576 researchers surveyed by Nature in 2016, only a minority had ever attempted to publish a replication, and several respondents who had published failed replications noted that editors and reviewers demanded that they play down comparisons with the original studies.^[5]^[51] An analysis of 4,270 empirical studies in 18 business journals from 1970 to 1991 reported that less than 10% of accounting, economics, and finance articles and 5% of management and marketing articles were replication studies.^[43]^[66] Publication bias is augmented by the pressure to publish and the author's own confirmation bias,^[b] and is an inherent hazard in the field, requiring a certain degree of skepticism on the part of readers.^[68]

Certain publishing practices also make it difficult to conduct replications and to monitor the severity of the reproducibility crisis, because articles often do not come with sufficient descriptions for other scholars to reproduce the study. The Reproducibility Project: Cancer Biology showed that of 193 experiments from 53 top papers about cancer published between 2010 and 2012, only 50 experiments from 23 papers had authors who provided enough information for researchers to redo the studies, sometimes with modifications. None of the 193 experiments examined had its experimental protocols fully described, and replicating 70% of experiments required asking for key reagents.^[40]^[41] The aforementioned study of empirical findings in the Strategic Management Journal found that 70% of 88 articles could not be replicated due to a lack of sufficient information for data or procedures.^[43]^[47] In water resources and management, the majority of 1,989 articles published in 2017 were not replicable because of a lack of available information shared online.^[50]

Questionable research practices and fraud

Questionable research practices (QRPs) are intentional behaviors which capitalize on the gray area of acceptable scientific behavior or exploit the researcher degrees of freedom (researcher DF), which can contribute to the irreproducibility of results.^[69]^[70] QRPs do not include explicit violations of scientific integrity, such as data falsification.^[69]^[70] Researcher DF are the often-arbitrary choices made throughout the experimental process which can be opportunistically exploited to result in false-positives and inflated effect size estimates, contributing to the irreproducibility of results.^[70] Researcher DF are seen in hypothesis formulation, design of experiments, data collection and analysis, and reporting of research.^[70] Some examples of QRPs are data dredging,^[70]^[71]^[72]^[c] selective reporting,^[69]^[70]^[71]^[72]^[d] and HARKing (hypothesising after results are known).^[70]^[71]^[72]^[e] In medicine, irreproducible studies had six features in common. These included investigators not being blinded to the experimental versus the control arms, a failure to repeat experiments, a lack of positive and negative controls, failing to report all the data, inappropriate use of statistical tests, and use of reagents that were not appropriately validated.^[74]

The US Food and Drug Administration estimated that 10–20% of medical studies in 1977–1990 involved questionable research practices.^[75] A 2012 survey of over 2,000 psychologists indicated that over 90% of respondents admitted to using at least one QRP,^[72]^: 527^[f] although the methodology of this survey and its results have been called into question.^[76] Psychology has also been at the center of several scandals involving outright fraudulent research, such as scientific fraud by social psychologist Diederik Stapel,^[77]^[13] cognitive psychologist Marc Hauser^[13] and social psychologist Lawrence Sanna.^[13] Despite these scandals, scientific fraud appears to be uncommon.^[13]

In 2009, a meta-analysis found that 2% of scientists across fields admitted to falsifying studies at least once and 14% admitted to personally knowing someone who did. Such misconduct was, according to one study, reported more frequently by medical researchers than by others.^[78]

Statistical issues

According to a 2018 analysis of 200 meta-analyses, "psychological research is, on average, afflicted with low statistical power",^[14] meaning that most studies do not have a high probability of accurately finding an effect when one exists.^[g] Findings from original studies which have low power will often fail to replicate, and replication studies with low power are susceptible to false negatives.^[14] Low statistical power is a substantial contributor to the replication crisis.^[14]

Within economics, the replication crisis may be exacerbated because econometric results are fragile:^[79] using different but plausible estimation procedures or data preprocessing techniques can lead to obtaining conflicting results.^[80]^[81]^[82]

Base rate of hypothesis accuracy

Philosopher Alexander Bird argues that high rates of failed replications can be consistent with quality science. He argues that this depends on the base rate of correct hypotheses: a field with a high rate of incorrect hypotheses would see a high rate of failed replications. Given the parameters of statistical testing, 5% of studies testing incorrect hypotheses would be significant (a false positive). If there are almost no correct hypotheses (true positives), then the false positive findings would outnumber the true positives. When trying to replicate these results, a further 95% of the false positives would then be identified, resulting in a high number of failed replications.^[83]
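The base-rate argument can be illustrated with a short calculation. The sketch below is not taken from Bird's paper: the function names, the base rates of 1% and 10%, and the assumed power of 80% are illustrative assumptions. It computes the share of significant findings that reflect a correct hypothesis and the share of significant findings expected to be significant again in a replication of the same design.

```python
def share_of_findings_that_are_true(base_rate: float, power: float,
                                    alpha: float = 0.05) -> float:
    """Fraction of significant results that come from correct hypotheses.

    base_rate: assumed fraction of tested hypotheses that are correct.
    power: assumed probability of detecting an effect when it exists.
    alpha: significance threshold (false positive rate per test).
    """
    true_positives = base_rate * power
    false_positives = (1 - base_rate) * alpha
    return true_positives / (true_positives + false_positives)


def expected_replication_rate(base_rate: float, power: float,
                              alpha: float = 0.05) -> float:
    """Expected fraction of significant findings that replicate.

    Assumes the replication has the same power and threshold as the original.
    """
    true_share = share_of_findings_that_are_true(base_rate, power, alpha)
    # True findings replicate with probability `power`;
    # false positives come out significant again with probability `alpha`.
    return true_share * power + (1 - true_share) * alpha


for base_rate in (0.01, 0.10):
    share = share_of_findings_that_are_true(base_rate, power=0.8)
    rate = expected_replication_rate(base_rate, power=0.8)
    print(f"base rate {base_rate:.0%}: {share:.0%} of findings true, "
          f"{rate:.0%} expected to replicate")
```

Under these assumptions, a field in which only 1% of tested hypotheses are correct produces more false positives than true positives even if every study is conducted properly.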

Consequences

When effects are wrongly stated as relevant in the literature, failure to detect this by replication will lead to the canonization of such false facts.^[84]

A 2021 study found that papers in leading general interest, psychology and economics journals with findings that could not be replicated tend to be cited more over time than reproducible research papers, likely because these results are surprising or interesting. The trend is not affected by publication of failed replications, after which only 12% of papers which cite the original research will mention the failed replication.^[85]^[86] Further, experts are able to predict which studies will be replicable, leading the authors of the 2021 study, Marta Serra-Garcia and Uri Gneezy, to conclude that experts apply lower standards to interesting results when deciding whether to publish them.^[86]

Political repercussions

The crisis of science's quality control system is affecting the use of science for policy. This is the thesis of a recent work by a group of science and technology studies scholars, who identify in "evidence based (or informed) policy" a point of present tension.^[87]^[88]^[89]^[90] In the US, science's reproducibility crisis has become a topic of political contention, linked to the attempt to diminish regulations – e.g. of emissions of pollutants, with the argument that these regulations are based on non-reproducible science.^[91]^[90] Previous attempts with the same aim accused studies used by regulators of being non-transparent.^[92]

Public awareness and perceptions

Concerns have been expressed within the scientific community that the general public may consider science less credible due to failed replications.^[93] Research supporting this concern is sparse, but a nationally representative survey in Germany showed that more than 75% of Germans have not heard of replication failures in science.^[94] The study also found that most Germans have positive perceptions of replication efforts: only 18% think that non-replicability shows that science cannot be trusted, while 65% think that replication research shows that science applies quality control, and 80% agree that errors and corrections are part of science.^[94]

Response in academia

With the replication crisis of psychology earning attention, Princeton University psychologist Susan Fiske drew controversy for speaking against critics of psychology for what she described as bullying and undermining the science.^[95]^[96]^[97]^[98] She labeled these unidentified "adversaries" with names such as "methodological terrorist" and "self-appointed data police", saying that criticism of psychology should only be expressed in private or through contacting the journals.^[95] Columbia University statistician and political scientist Andrew Gelman responded to Fiske, saying that she had found herself willing to tolerate the "dead paradigm" of faulty statistics and had refused to retract publications even when errors were pointed out.^[95] He added that her tenure as editor had been abysmal and that a number of published papers edited by her were found to be based on extremely weak statistics; one of Fiske's own published papers had a major statistical error and "impossible" conclusions.^[95]

Remedies

Focus on the replication crisis has led to renewed efforts in psychology to re-test important findings.^[68]^[99] A 2013 special edition of the journal Social Psychology focused on replication studies.^[12]

A 2016 article by John Ioannidis, Professor of Medicine and of Health Research and Policy at Stanford University School of Medicine and a Professor of Statistics at Stanford University School of Humanities and Sciences, elaborated on "Why Most Clinical Research Is Not Useful".^[100] Ioannidis described what he views as some of the problems and called for reform, characterizing certain points for medical research to be useful again; one example he gave was the need for medicine to be patient-centered (e.g. in the form of the Patient-Centered Outcomes Research Institute) instead of the current practice of mainly taking care of "the needs of physicians, investigators, or sponsors".

Reform in scientific publishing

Metascience

Metascience is the use of scientific methodology to study science itself. Metascience seeks to increase the quality of scientific research while reducing waste. It is also known as "research on research" and "the science of science", as it uses research methods to study how research is done and where improvements can be made. Metascience concerns itself with all fields of research and has been described as "a bird's eye view of science."^[101] In the words of Ioannidis, "Science is the best thing that has happened to human beings ... but we can do it better."^[102]

Meta-research continues to be conducted to identify the roots of the crisis and to address them. Methods of addressing the crisis include pre-registration of scientific studies and clinical trials as well as the founding of organizations such as CONSORT and the EQUATOR Network that issue guidelines for methodology and reporting. There are continuing efforts to reform the system of academic incentives, to improve the peer review process, to reduce the misuse of statistics, to combat bias in scientific literature, and to increase the overall quality and efficiency of the scientific process.

Presentation of methodology

Some authors have argued that the insufficient communication of experimental methods is a major contributor to the reproducibility crisis and that better reporting of experimental design and statistical analyses would help improve the situation. These authors tend to plead for both a broad cultural change in the scientific community of how statistics are considered and a more coercive push from scientific journals and funding bodies.^[103] However, concerns have been raised about the potential for standards for transparency and replication to be misapplied to qualitative as well as quantitative studies.^[104]

Business and management journals that have introduced editorial policies on data accessibility, replication, and transparency include the Strategic Management Journal, the Journal of International Business Studies, and the Management and Organization Review.^[43]

Result-blind peer review

In response to concerns in psychology about publication bias and data dredging, more than 140 psychology journals have adopted result-blind peer review. In this approach to peer review, studies are accepted before they are conducted, on the basis of the methodological rigor of their experimental designs and the theoretical justifications for their statistical analysis techniques, rather than after completion and on the basis of their findings.^[105] Early analysis of this procedure has estimated that 61 percent of result-blind studies have led to null results, in contrast to an estimated 5 to 20 percent in earlier research.^[35] In addition, large-scale collaborations between researchers working in multiple labs in different countries, which regularly make their data openly available for other researchers to assess, have become much more common in psychology.^[106]

Pre-registration of studies

Scientific publishing has begun using pre-registration reports to address the replication crisis.^[107]^[108] The registered report format requires authors to submit a description of the study methods and analyses prior to data collection. Once the method and analysis plan is vetted through peer-review, publication of the findings is provisionally guaranteed, based on whether the authors follow the proposed protocol. One goal of registered reports is to circumvent the publication bias toward significant findings that can lead to implementation of questionable research practices. Another is to encourage publication of studies with rigorous methods.

The journal Psychological Science has encouraged the preregistration of studies and the reporting of effect sizes and confidence intervals.^[109] The editor in chief also noted that the editorial staff will be asking for replication of studies with surprising findings from examinations using small sample sizes before allowing the manuscripts to be published.

Metadata and digital tools for tracking replications

It has been suggested that "a simple way to check how often studies have been repeated, and whether or not the original findings are confirmed" is needed.^[85] Categorizations and ratings of reproducibility at the study or results level, as well as addition of links to and rating of third-party confirmations, could be conducted by the peer-reviewers, the scientific journal, or by readers in combination with novel digital platforms or tools.

Statistical reform

Requiring smaller p-values

Many publications require a p-value of p < 0.05 to claim statistical significance. The paper "Redefine statistical significance",^[110] signed by a large number of scientists and mathematicians, proposes that in "fields where the threshold for defining statistical significance for new discoveries is p < 0.05, we propose a change to p < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields." Their rationale is that "a leading cause of non-reproducibility (is that the) statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating 'statistically significant' findings with p < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems."^[110]

This call was subsequently criticised by another large group, who argued that "redefining" the threshold would not fix current problems, would lead to some new ones, and that in the end, all thresholds needed to be justified case-by-case instead of following general conventions.^[111]

Addressing misinterpretation of p-values

Although statisticians are unanimous that the use of "p < 0.05" as a standard for significance provides weaker evidence than is generally appreciated, there is a lack of unanimity about what should be done about it. Some have advocated that Bayesian methods should replace p-values. This has not happened on a wide scale, partly because it is complicated and partly because many users distrust the specification of prior distributions in the absence of hard data. A simplified version of the Bayesian argument, based on testing a point null hypothesis, was suggested by pharmacologist David Colquhoun.^[112]^[113] The logical problems of inductive inference were discussed in "The Problem with p-values" (2016).^[114]

The hazards of reliance on p-values arise partly because even an observation of p = 0.001 is not necessarily strong evidence against the null hypothesis.^[113] Although the likelihood ratio in favor of the alternative hypothesis over the null is close to 100, if the hypothesis was implausible, with a prior probability of a real effect of 0.1, even an observation of p = 0.001 would have a false positive risk of 8 percent. It would still fail to reach the 5 percent level.
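The 8 percent figure follows from Bayes' theorem applied to the two numbers given in the text. The sketch below is one way to reproduce it; the function name is hypothetical, and the likelihood ratio is taken as exactly 100 although the text says only that it is close to 100.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Probability that an observed 'positive' result is a false positive.

    likelihood_ratio: how much more probable the observation is under the
        alternative hypothesis than under the null (about 100 for p = 0.001,
        according to the text)
    prior: prior probability that there is a real effect
    """
    weight_null = 1 - prior                    # prior weight on the null
    weight_real = prior * likelihood_ratio     # prior weight on a real effect, scaled by the evidence
    return weight_null / (weight_null + weight_real)


risk = false_positive_risk(likelihood_ratio=100, prior=0.1)
print(f"False positive risk: {risk:.1%}")
```

With a prior of 0.5 instead of 0.1, the same observation gives a false positive risk of about 1 percent, which shows how strongly the conclusion depends on the plausibility of the hypothesis.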

It was recommended that the terms "significant" and "non-significant" should not be used.^[113] p-values and confidence intervals should still be specified, but they should be accompanied by an indication of the false-positive risk. It was suggested that the best way to do this is to calculate the prior probability that one would need to believe in order to achieve a false positive risk of a certain level, such as 5%. The calculations can be done with various computer software.^[113]^[115] This reverse Bayesian approach, which was suggested by physicist Robert Matthews in 2001,^[116] is one way to avoid the problem that the prior probability is rarely known.
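The reverse Bayesian calculation can be sketched by solving the false-positive-risk relation for the prior instead of for the risk. This is a minimal illustration rather than the cited software: both function names are hypothetical, and the likelihood ratio of 100 is an assumed value corresponding to an observation of about p = 0.001.

```python
def false_positive_risk(likelihood_ratio, prior):
    """False positive risk for a given likelihood ratio and prior probability of a real effect."""
    return (1 - prior) / ((1 - prior) + prior * likelihood_ratio)


def prior_needed(likelihood_ratio, target_risk):
    """Prior probability of a real effect required to reach a target false positive risk.

    Obtained by solving false_positive_risk(...) == target_risk for the prior.
    """
    return (1 - target_risk) / ((1 - target_risk) + target_risk * likelihood_ratio)


required = prior_needed(likelihood_ratio=100, target_risk=0.05)
print(f"Prior needed for a 5% false positive risk: {required:.2f}")

# Round trip: plugging the required prior back in recovers the target risk.
print(f"Check: {false_positive_risk(100, required):.3f}")
```

A reader can then judge whether a prior of that size is believable for the hypothesis in question, without having to commit to a specific prior in advance.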

Encouraging larger sample sizes

To improve the quality of replications, larger sample sizes than those used in the original study are often needed.^[117] Larger sample sizes are needed because estimates of effect sizes in published work are often exaggerated due to publication bias and large sampling variability associated with small sample sizes in an original study.^[118]^[119]^[120] Further, using significance thresholds usually leads to inflated effects, because particularly with small sample sizes, only the largest effects will become significant.^[121]
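The inflation of significant effects at small sample sizes can be illustrated with a simple simulation. The sketch below rests on simplifying assumptions that are not from the cited studies: a true standardized effect of 0.2, a known standard deviation of 1 (so that the sample mean can be drawn directly from its sampling distribution), and a filter that keeps only estimates significant at p < 0.05 in the expected direction. The function name and all parameter values are hypothetical.

```python
import math
import random
import statistics


def mean_significant_estimate(true_effect, n, sims=20000, seed=0):
    """Average effect estimate among simulated studies that reach significance.

    Each simulated study estimates the effect as a sample mean of n
    observations with standard deviation 1 (a simplifying assumption).
    """
    rng = random.Random(seed)
    standard_error = 1 / math.sqrt(n)
    critical = 1.96 * standard_error  # smallest positive estimate with two-sided p < 0.05
    significant = [
        estimate
        for estimate in (rng.gauss(true_effect, standard_error) for _ in range(sims))
        if estimate > critical
    ]
    return statistics.mean(significant)


true_effect = 0.2
for n in (20, 200):
    average = mean_significant_estimate(true_effect, n)
    print(f"n = {n}: mean significant estimate = {average:.2f} (true effect = {true_effect})")
```

With 20 observations, no estimate below about 0.44 can reach significance, so every significant result overstates the true effect of 0.2 by more than a factor of two. With 200 observations the filter is far less severe and the surviving estimates sit much closer to the true value.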

Replication efforts

Funding

In July 2016 the Netherlands Organisation for Scientific Research made €3 million available for replication studies. The funding is for replication based on reanalysis of existing data and replication by collecting and analysing new data. Funding is available in the areas of social sciences, health research and healthcare innovation.^[122]

In 2013 the Laura and John Arnold Foundation funded the launch of the Center for Open Science with a $5.25 million grant and by 2017 had provided an additional $10 million in funding.^[123] It also funded the launch of the Meta-Research Innovation Center at Stanford, run at Stanford University by Ioannidis and medical scientist Steven Goodman, to study ways to improve scientific research.^[123] It also provided funding for the AllTrials initiative led in part by medical scientist Ben Goldacre.^[123]

Emphasis in post-secondary education

Based on coursework in experimental methods at MIT, Stanford, and the University of Washington, it has been suggested that methods courses in psychology and other fields should emphasize replication attempts rather than original studies.^[124]^[125]^[126] Such an approach would help students learn scientific methodology and provide numerous independent replications that would test the replicability of meaningful scientific findings. Some have recommended that graduate students should be required to publish a high-quality replication attempt on a topic related to their doctoral research prior to graduation.^[127]

Final year thesis

Some institutions require undergraduate students to submit a final year thesis that consists of an original piece of research. Daniel Quintana, a psychologist at the University of Oslo in Norway, has recommended that students should be encouraged to perform replication studies in thesis projects, as well as being taught about open science.^[128]

Involving original authors

Psychologist Daniel Kahneman argued that, in psychology, the original authors should be involved in the replication effort because the published methods are often too vague.^[129]^[130] Others, such as psychologist Andrew Wilson, disagree, arguing that the original authors should write down the methods in detail.^[129] A 2012 investigation of replication rates in psychology indicated higher success rates when replication studies had author overlap with the original study^[131] (91.7% successful replication rates in studies with author overlap compared with 64.6% without author overlap).

Broader changes to scientific approach

Emphasize triangulation, not just replication

Psychologist Marcus R. Munafò and epidemiologist George Davey Smith argue, in a piece published by Nature, that research should emphasize triangulation, not just replication, to protect against flawed ideas. They claim that:

replication alone will get us only so far (and) might actually make matters worse ... [Triangulation] is the strategic use of multiple approaches to address one question. Each approach has its own unrelated assumptions, strengths and weaknesses. Results that agree across different methodologies are less likely to be artefacts. ... Maybe one reason replication has captured so much interest is the often-repeated idea that falsification is at the heart of the scientific enterprise. This idea was popularized by Karl Popper's 1950s maxim that theories can never be proved, only falsified. Yet an overemphasis on repeating experiments could provide an unfounded sense of certainty about findings that rely on a single approach. ... philosophers of science have moved on since Popper. Better descriptions of how scientists actually work include what epistemologist Peter Lipton called in 1991 "inference to the best explanation".^[132]

Complex systems paradigm

The dominant scientific and statistical model of causation is the linear model.^[133] The linear model assumes that mental variables are stable properties which are independent of each other. In other words, these variables are not expected to influence each other. Instead, the model assumes that the variables will have an independent, linear effect on observable outcomes.^[133]

Social scientists Sebastian Wallot and Damian Kelty-Stephen argue that the linear model is not always appropriate.^[133] An alternative is the complex system model, which assumes that mental variables are interdependent. These variables are not assumed to be stable; rather, they interact and adapt to each specific context.^[133] They argue that the complex system model is often more appropriate in psychology, and that the use of the linear model when the complex system model is more appropriate will result in failed replications.^[133]

...psychology may be hoping for replications in the very measurements and under the very conditions where a growing body of psychological evidence explicitly discourages predicting replication. Failures to replicate may be plainly baked into the potentially incomplete, but broadly sweeping failure of human behavior to conform to the standard of independen[ce] ...^[133]

The linear causal assumptions underlying conventional statistics are being questioned across many scientific fields.^[134]

Replication should seek to revise theories

Replication that confirms original findings is fundamental to scientific progress. However, replication alone is not sufficient to resolve the replication crisis. Replication efforts should seek not just to support or question the original findings, but also to replace them with revised, stronger theories with greater explanatory power. This approach therefore involves pruning existing theories, comparing all the alternative theories, and making replication efforts more generative and engaged in theory-building.^[135]^[136]

Open science

Tenets of open science.

Open data, open source software and open source hardware are all critical to enabling reproducibility in the sense of validation of the original data analysis. The use of proprietary software, the lack of publication of analysis software and the lack of open data prevent the replication of studies. Unless software used in research is open source, reproducing results with different software and hardware configurations is impossible.^[137] CERN has both Open Data and CERN Analysis Preservation projects for storing data, all relevant information, and all software and tools needed to preserve an analysis at the large experiments of the LHC. Aside from all software and data, preserved analysis assets include metadata that enable understanding of the analysis workflow, related software, systematic uncertainties, statistics procedures and meaningful ways to search for the analysis, as well as references to publications and to backup material.^[138] CERN software is open source and available for use outside of particle physics, and there is some guidance provided to other fields on the broad approaches and strategies used for open science in contemporary particle physics.^[139]

Online repositories where data, protocols, and findings can be stored and evaluated by the public seek to improve the integrity and reproducibility of research. Examples of such repositories include the Open Science Framework, Registry of Research Data Repositories, and Psychfiledrawer.org. Sites like Open Science Framework offer badges for using open science practices in an effort to incentivize scientists. However, there have been concerns that those who are most likely to provide their data and code for analyses are the researchers that are likely the most sophisticated.^[140] Ioannidis suggested that "the paradox may arise that the most meticulous and sophisticated and method-savvy and careful researchers may become more susceptible to criticism and reputation attacks by reanalyzers who hunt for errors, no matter how negligible these errors are".^[140]

Notes

^ More accurately, the null hypothesis (the hypothesis that the results are not reflecting a true pattern) is rejected when the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true, is less than 5%. A rejection of the null hypothesis results in the alternative hypothesis (which corresponds to the hypothesis set by the researcher) being accepted.
^ According to the APA Dictionary of Psychology, confirmation bias is "the tendency to gather evidence that confirms preexisting expectations, typically by emphasizing or pursuing supporting evidence while dismissing or failing to seek contradictory evidence".^[67]
^ Data dredging, also known as p-hacking or p-fishing, is misuse of data, through myriad techniques, to find support for hypotheses that the data is inadequate for.^[73]
^ Selective reporting is also known as partial publication. Reporting is an opportunity to disclose all of the researcher degrees of freedom used or exploited. Selective reporting is a failure to report relevant details or choices, such as some independent and dependent variables, missing data, data exclusions, and outlier exclusions.^[70]
^ HARKing, also known as post-hoc storytelling, is when an exploratory analysis is framed as a confirmatory analysis. It involves changing a hypothesis after research has been done, so that the new hypothesis is able to be confirmed by the results of the experiment.^[70]
^ In this survey, falsifying data was included as a QRP.^[72]
^ In a more technical sense, statistical power is the probability that the null hypothesis will be correctly rejected when it is false. Adequate statistical power is widely accepted as 80%, meaning that the chance of a false negative (a Type II error) would be 20%.^[14]

References

^ Ioannidis, John P. A. (1 August 2005). "Why Most Published Research Findings Are False". PLOS Medicine. 2 (8): e124. doi:10.1371/journal.pmed.0020124. ISSN 1549-1277. PMC 1182327. PMID 16060722.
^ Staddon, John (8 December 2017). Scientific Method. New York, NY: Routledge. doi:10.4324/9781315100708. ISBN 978-1-315-10070-8. S2CID 201781341.
^ Lehrer, Jonah (13 December 2010). "The Truth Wears Off". The New Yorker. Retrieved 2020-01-30.
^ Marcus, Gary (1 May 2013). "The Crisis in Social Psychology That Isn't". The New Yorker. Retrieved 2020-01-30.
^ ^a ^b ^c Baker, Monya (25 May 2016). "1,500 scientists lift the lid on reproducibility". Nature (News Feature). Springer Nature. 533 (7604): 452–454. Bibcode:2016Natur.533..452B. doi:10.1038/533452a. PMID 27225100. S2CID 4460617. (Erratum: [1])
^ Pashler, Harold; Wagenmakers, Eric Jan (2012). "Editors' Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence?". Perspectives on Psychological Science (Editorial). 7 (6): 528–530. doi:10.1177/1745691612465253. PMID 26168108. S2CID 26361121.
^ Fidler, Fiona; Wilcox, John (2018). "Reproducibility of Scientific Results". The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University. Retrieved 2019-05-19.
^ Moonesinghe, Ramal; Khoury, Muin J.; Janssens, A. Cecile J. W. (27 February 2007). "Most Published Research Findings Are False—But a Little Replication Goes a Long Way". PLOS Med. 4 (2): e28. doi:10.1371/journal.pmed.0040028. PMC 1808082. PMID 17326704.
^ Simons, Daniel J. (1 January 2014). "The Value of Direct Replication". Perspectives on Psychological Science. 9 (1): 76–80. doi:10.1177/1745691613514755. ISSN 1745-6916. PMID 26173243. S2CID 1149441.
^ ^a ^b ^c ^d Schmidt, Stefan (2009). "Shall we Really do it Again? The Powerful Concept of Replication is Neglected in the Social Sciences". Review of General Psychology. SAGE Publications. 13 (2): 90–100. doi:10.1037/a0015108. ISSN 1089-2680.
^ ^a ^b Open Science Collaboration (28 August 2015). "Estimating the reproducibility of psychological science" (PDF). Science. 349 (6251): aac4716. doi:10.1126/science.aac4716. hdl:10722/230596. ISSN 0036-8075. PMID 26315443. S2CID 218065162.
^ ^a ^b ^c Duvendack, Maren; Palmer-Jones, Richard; Reed, W. Robert (May 2017). "What Is Meant by "Replication" and Why Does It Encounter Resistance in Economics?". American Economic Review. 107 (5): 46–51. doi:10.1257/aer.p20171031. ISSN 0002-8282.
^ ^a ^b ^c ^d ^e ^f ^g ^h Shrout, Patrick E.; Rodgers, Joseph L. (4 January 2018). "Psychology, Science, and Knowledge Construction: Broadening Perspectives from the Replication Crisis". Annual Review of Psychology. Annual Reviews. 69 (1): 487–510. doi:10.1146/annurev-psych-122216-011845. ISSN 0066-4308.
^ ^a ^b ^c ^d ^e Stanley, T. D.; Carter, Evan C.; Doucouliagos, Hristos (2018). "What meta-analyses reveal about the replicability of psychological research". Psychological Bulletin. 144 (12): 1325–1346. doi:10.1037/bul0000169. ISSN 1939-1455. PMID 30321017. S2CID 51951232.
^ Meyer, Michelle N.; Chabris, Christopher (31 July 2014). "Why Psychologists' Food Fight Matters". Slate.
^ Aschwanden, Christie (19 August 2015). "Science Isn't Broken". FiveThirtyEight. Retrieved 2020-01-30.
^ Aschwanden, Christie (27 August 2015). "Psychology Is Starting To Deal With Its Replication Problem". FiveThirtyEight. Retrieved 2020-01-30.
^ Etchells, Pete (28 May 2014). "Psychology's replication drive: it's not about you". The Guardian.
^ Wagenmakers, Eric-Jan; Wetzels, Ruud; Borsboom, Denny; Maas, Han L. J. van der; Kievit, Rogier A. (1 November 2012). "An Agenda for Purely Confirmatory Research". Perspectives on Psychological Science. 7 (6): 632–638. doi:10.1177/1745691612463078. ISSN 1745-6916. PMID 26168122. S2CID 5096417.
^ Ioannidis, John P. A. (1 November 2012). "Why Science Is Not Necessarily Self-Correcting". Perspectives on Psychological Science. 7 (6): 645–654. doi:10.1177/1745691612464056. ISSN 1745-6916. PMID 26168125. S2CID 11798785.
^ Pashler, Harold; Harris, Christine R. (1 November 2012). "Is the Replicability Crisis Overblown? Three Arguments Examined". Perspectives on Psychological Science. 7 (6): 531–536. doi:10.1177/1745691612463401. ISSN 1745-6916. PMID 26168109.
^ Achenbach, Joel. "No, science's reproducibility problem is not limited to psychology". The Washington Post. Retrieved 2015-09-10.
^ Bartlett, Tom (30 January 2013). "Power of Suggestion". The Chronicle of Higher Education.
^ Dominus, Susan (18 October 2017). "When the Revolution Came for Amy Cuddy". The New York Times. ISSN 0362-4331. Retrieved 2017-10-19.
^ Leichsenring, Falk; Abbass, Allan; Hilsenroth, Mark J.; Leweke, Frank; Luyten, Patrick; Keefe, Jack R.; Midgley, Nick; Rabung, Sven; Salzer, Simone; Steiner, Christiane (April 2017). "Biases in research: risk factors for non-replicability in psychotherapy and pharmacotherapy research". Psychological Medicine. 47 (6): 1000–1011. doi:10.1017/S003329171600324X. PMID 27955715. S2CID 1872762.
^ Hengartner, Michael P. (28 February 2018). "Raising Awareness for the Replication Crisis in Clinical Psychology by Focusing on Inconsistencies in Psychotherapy Research: How Much Can We Rely on Published Findings from Efficacy Trials?". Frontiers in Psychology. Frontiers Media. 9: 256. doi:10.3389/fpsyg.2018.00256. PMC 5835722. PMID 29541051.
^ Frank, Michael C.; Bergelson, Elika; Bergmann, Christina; Cristia, Alejandrina; Floccia, Caroline; Gervain, Judith; Hamlin, J. Kiley; Hannon, Erin E.; Kline, Melissa; Levelt, Claartje; Lew-Williams, Casey; Nazzi, Thierry; Panneton, Robin; Rabagliati, Hugh; Soderstrom, Melanie; Sullivan, Jessica; Waxman, Sandra; Yurovsky, Daniel (9 March 2017). "A Collaborative Approach to Infant Research: Promoting Reproducibility, Best Practices, and Theory‐Building". Infancy. 22 (4): 421–435. doi:10.1111/infa.12182. hdl:10026.1/9942. PMC 6879177. PMID 31772509.
^ Tyson, Charlie (14 August 2014). "Failure to Replicate". Inside Higher Ed. Retrieved 2018-12-19.
^ Makel, Matthew C.; Plucker, Jonathan A. (1 August 2014). "Facts Are More Important Than Novelty: Replication in the Education Sciences". Educational Researcher. 43 (6): 304–316. doi:10.3102/0013189X14545513. S2CID 145571836. Retrieved 2018-12-19.
^ "Summary of reproducibility rates and effect sizes for original and replication studies overall and by journal/discipline". Retrieved 2019-10-16.
^ Rogers, Adam (27 August 2018). "The Science Behind Social Science Gets Shaken Up—Again". Wired. Retrieved 2018-08-28.
^ Camerer, Colin F.; Dreber, Anna; et al. (27 August 2018). "Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015" (PDF). Nature Human Behaviour. 2 (9): 637–644. doi:10.1038/s41562-018-0399-z. PMID 31346273. S2CID 52098703.
^ Klein, R.A. (2018). "Many Labs 2: Investigating Variation in Replicability Across Samples and Settings". Advances in Methods and Practices in Psychological Science. 1 (4): 443–490. doi:10.1177/2515245918810225.
^ ^a ^b Witkowski, Tomasz (2019). "Is the glass half empty or half full? Latest results in the replication crisis in Psychology" (PDF). Skeptical Inquirer. Vol. 43, no. 2. pp. 5–6. Archived from the original (PDF) on 2020-01-30.
^ ^a ^b Allen, Christopher; Mehler, David M. A. (1 May 2019). "Open science challenges, benefits and tips in early career and beyond". PLOS Biology. Public Library of Science. 17 (5): e3000246. doi:10.1371/journal.pbio.3000246. ISSN 1545-7885. PMC 6513108. PMID 31042704.
^ Ioannidis JA (13 July 2005). "Contradicted and initially stronger effects in highly cited clinical research". JAMA. 294 (2): 218–228. doi:10.1001/jama.294.2.218. PMID 16014596.
^ Prinz, Florian (31 August 2011). "Believe it or not: how much can we rely on published data on potential drug targets". Nature Reviews Drug Discovery. 10 (712): 712. doi:10.1038/nrd3439-c1. PMID 21892149.
^ Wheeling, Kate (12 May 2016). "Big Pharma Reveals a Biomedical Replication Crisis". Pacific Standard. Retrieved 2020-01-30. Updated on 14 June 2017
^ Begley, C. G.; Ellis, L. M. (2012). "Drug Development: Raise Standards for Preclinical Cancer Research". Nature (Comment article). 483 (7391): 531–533. Bibcode:2012Natur.483..531B. doi:10.1038/483531a. PMID 22460880. S2CID 4326966. (Erratum: doi:10.1038/485041e)
^ ^a ^b Haelle, Tara (7 December 2021). "Dozens of major cancer studies can't be replicated". Science News. Retrieved 2022-01-19.
^ ^a ^b "Reproducibility Project: Cancer Biology". www.cos.io. Center for Open Science. Retrieved 2022-01-19.
^ Mobley, A.; Linder, S. K.; Braeuer, R.; Ellis, L. M.; Zwelling, L. (2013). Arakawa, Hirofumi (ed.). "A Survey on Data Reproducibility in Cancer Research Provides Insights into Our Limited Ability to Translate Findings from the Laboratory to the Clinic". PLOS ONE. 8 (5): e63221. Bibcode:2013PLoSO...863221M. doi:10.1371/journal.pone.0063221. PMC 3655010. PMID 23691000.
^ ^a ^b ^c ^d Tsui, Anne S. (21 January 2022). "From Traditional Research to Responsible Research: The Necessity of Scientific Freedom and Scientific Responsibility for Better Societies". Annual Review of Organizational Psychology and Organizational Behavior. 9 (1): 1–32. doi:10.1146/annurev-orgpsych-062021-021303. ISSN 2327-0608. Retrieved 2022-03-21.
^ Camerer, Colin F.; Dreber, Anna; Forsell, Eskil; Ho, Teck-Hua; Huber, Jürgen; Johannesson, Magnus; Kirchler, Michael; Almenberg, Johan; Altmejd, Adam (25 March 2016). "Evaluating replicability of laboratory experiments in economics". Science. 351 (6280): 1433–1436. Bibcode:2016Sci...351.1433C. doi:10.1126/science.aaf0918. ISSN 0036-8075. PMID 26940865.
^ Bohannon, John (3 March 2016). "About 40% of economics experiments fail replication survey". Science. Retrieved 2017-10-25.
^ Goldfarb, Robert S. (1 December 1997). "Now you see it, now you don't: emerging contrary results in economics". Journal of Economic Methodology. 4 (2): 221–244. doi:10.1080/13501789700000016. ISSN 1350-178X.
^ ^a ^b Bergh, Donald D; Sharp, Barton M; Aguinis, Herman; Li, Ming (6 April 2017). "Is there a credibility crisis in strategic management research? Evidence on the reproducibility of study findings". Strategic Organization. 15 (3): 423–436. doi:10.1177/1476127017701076. ISSN 1476-1270. Retrieved 2022-03-22.
^ Richtel, Matt (16 March 2022). "Brain-Imaging Studies Hampered by Small Data Sets, Study Finds". The New York Times.
^ Marek, Scott; Tervo-Clemmens, Brenden; Calabro, Finnegan J.; Montez, David F.; Kay, Benjamin P.; Hatoum, Alexander S.; Donohue, Meghan Rose; Foran, William; Miller, Ryland L.; Hendrickson, Timothy J.; Malone, Stephen M.; Kandala, Sridhar; Feczko, Eric; Miranda-Dominguez, Oscar; Graham, Alice M.; Earl, Eric A.; Perrone, Anders J.; Cordova, Michaela; Doyle, Olivia; Moore, Lucille A.; Conan, Gregory M.; Uriarte, Johnny; Snider, Kathy; Lynch, Benjamin J.; Wilgenbusch, James C.; Pengo, Thomas; Tam, Angela; Chen, Jianzhong; Newbold, Dillan J.; Zheng, Annie; Seider, Nicole A.; Van, Andrew N.; Metoki, Athanasia; Chauvin, Roselyne J.; Laumann, Timothy O.; Greene, Deanna J.; Petersen, Steven E.; Garavan, Hugh; Thompson, Wesley K.; Nichols, Thomas E.; Yeo, B. T. Thomas; Barch, Deanna M.; Luna, Beatriz; Fair, Damien A.; Dosenbach, Nico U. F. (March 2022). "Reproducible brain-wide association studies require thousands of individuals". Nature. 603 (7902): 654–660. doi:10.1038/s41586-022-04492-9. ISSN 1476-4687.
^ ^a ^b Stagge, James H.; Rosenberg, David E.; Abdallah, Adel M.; Akbar, Hadia; Attallah, Nour A.; James, Ryan (26 February 2019). "Assessing data availability and research reproducibility in hydrology and water resources". Scientific Data. 6: 190030. Bibcode:2019NatSD...690030S. doi:10.1038/sdata.2019.30. ISSN 2052-4463. PMC 6390703. PMID 30806638.
^ ^a ^b Nature Video (28 May 2016). "Is There a Reproducibility Crisis in Science?". Scientific American. Retrieved 2019-08-15.
^ Begley, C. Glenn; Ioannidis, John P. A. (2015). "Reproducibility in Science: Improving the Standard for Basic and Preclinical Research". Circulation Research. 116 (1): 116–126. doi:10.1161/CIRCRESAHA.114.303819. PMID 25552691. S2CID 3587510.
^ De Solla Price, Derek J. (1963). Little science big science. Columbia University Press. p. 32. ISBN 9780231085625.
^ Siebert, S.; Machesky, L. M. & Insall, R. H. (2015). "Overflow in science and its implications for trust". eLife. 4: e10825. doi:10.7554/eLife.10825. PMC 4563216. PMID 26365552.
^ Della Briotta Parolo, P.; Pan, R. K.; Ghosh, R.; Huberman, B. A.; Kaski, K.; Fortunato, S. (2015). "Attention decay in science". Journal of Informetrics. 9 (4): 734–745. arXiv:1503.01881. Bibcode:2015arXiv150301881D. doi:10.1016/j.joi.2015.07.006. S2CID 10949754.
^ ^a ^b Mirowski, P. (2011). Science-Mart. Harvard University Press. pp. 2, 24. ISBN 978-0-674-06113-2.
^ Moeller (2006). Luhmann explained: from souls to systems. Chicago: Open Court. p. 25. ISBN 0-8126-9598-4. OCLC 68694011.
^ Luhmann, Niklas (1995). Social systems. Stanford, CA: Stanford University Press. p. 288. ISBN 978-0-8047-2625-2. OCLC 31710315.
^ ^a ^b Scheufele, D. A. (15 September 2014). "Science communication as political communication". Proceedings of the National Academy of Sciences. 111 (Supplement 4): 13585–13592. Bibcode:2014PNAS..111S3585S. doi:10.1073/pnas.1317516111. ISSN 0027-8424. PMC 4183176. PMID 25225389.
^ Pielke, Roger (2007). The honest broker : making sense of science in policy and politics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511818110. ISBN 978-0-511-81811-0. OCLC 162145073.
^ Saltelli, Andrea; Boulanger, Paul-Marie (2020). "Technoscience, policy and the new media. Nexus or vortex?". Futures. Elsevier BV. 115: 102491. doi:10.1016/j.futures.2019.102491. ISSN 0016-3287. S2CID 211538470.
^ Smith, Noah (14 December 2016). "Academic signaling and the post-truth world". Noahpinion. Retrieved 2017-11-05.
^ Everett, Jim A. C.; Earp, Brian D. (1 January 2015). "A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers". Frontiers in Psychology. 6: 1152. doi:10.3389/fpsyg.2015.01152. PMC 4527093. PMID 26300832.
^ Martin, G. N.; Clarke, Richard M. (2017). "Are Psychology Journals Anti-replication? A Snapshot of Editorial Practices". Frontiers in Psychology. 8: 523. doi:10.3389/fpsyg.2017.00523. ISSN 1664-1078. PMC 5387793. PMID 28443044.
^ Yeung, Andy W. K. (2017). "Do Neuroscience Journals Accept Replications? A Survey of Literature". Frontiers in Human Neuroscience. 11: 468. doi:10.3389/fnhum.2017.00468. ISSN 1662-5161. PMC 5611708. PMID 28979201.
^ Hubbard, Raymond; Vetter, Daniel E. (1 February 1996). "An empirical comparison of published replication research in accounting, economics, finance, management, and marketing". Journal of Business Research. 35 (2): 153–164. doi:10.1016/0148-2963(95)00084-4. ISSN 0148-2963. Retrieved 2022-03-22.
^ "Confirmation bias". APA Dictionary of Psychology. Washington, DC: American Psychological Association. n.d. Retrieved 2022-02-02.
^ ^a ^b Simmons, Joseph; Nelson, Leif; Simonsohn, Uri (November 2011). "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant". Psychological Science. 22 (11): 1359–1366. doi:10.1177/0956797611417632. ISSN 0956-7976. PMID 22006061.
^ ^a ^b ^c "Research misconduct - The grey area of Questionable Research Practices". www.vib.be. Vlaams Instituut voor Biotechnologie. 30 September 2013. Archived from the original on 2014-10-31. Retrieved 2015-11-13.
^ ^a ^b ^c ^d ^e ^f ^g ^h ⁱ Wicherts, Jelte M.; Veldkamp, Coosje L. S.; Augusteijn, Hilde E. M.; Bakker, Marjan; van Aert, Robbie C. M.; van Assen, Marcel A. L. M. (2016). "Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking". Frontiers in Psychology. 7: 1832. doi:10.3389/fpsyg.2016.01832. ISSN 1664-1078. PMC 5122713. PMID 27933012.
^ ^a ^b ^c Neuroskeptic (1 November 2012). "The Nine Circles of Scientific Hell". Perspectives on Psychological Science (Opinion). 7 (6): 643–644. doi:10.1177/1745691612459519. ISSN 1745-6916. PMID 26168124. S2CID 45328962.
^ ^a ^b ^c ^d ^e John, Leslie K.; Loewenstein, George; Prelec, Drazen (1 May 2012). "Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling" (PDF). Psychological Science. 23 (5): 524–532. doi:10.1177/0956797611430953. ISSN 0956-7976. PMID 22508865. S2CID 8400625.
^ "Data dredging". APA Dictionary of Psychology. Washington, DC: American Psychological Association. n.d. Retrieved 2022-01-09. The inappropriate practice of searching through large files of information to try to confirm a preconceived hypothesis or belief without an adequate design that controls for possible confounds or alternate hypotheses. Data dredging may involve selecting which parts of a large data set to retain to get specific, desired results.
^ Begley, C. G. (2013). "Six red flags for suspect work". Nature (Comment article). 497 (7450): 433–434. Bibcode:2013Natur.497..433B. doi:10.1038/497433a. PMID 23698428. S2CID 4312732.
^ Glick, J. Leslie (1992). "Scientific data audit—A key management tool". Accountability in Research. 2 (3): 153–168. doi:10.1080/08989629208573811.
^ Fiedler, Klaus; Schwarz, Norbert (19 October 2015). "Questionable Research Practices Revisited". Social Psychological and Personality Science. 7: 45–52. doi:10.1177/1948550615612150. ISSN 1948-5506. S2CID 146717227.
^ Shea, Christopher (13 November 2011). "Fraud Scandal Fuels Debate Over Practices of Social Psychology". The Chronicle of Higher Education.
^ Fanelli, Daniele (29 May 2009). "How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data". PLOS ONE. 4 (5): e5738. Bibcode:2009PLoSO...4.5738F. doi:10.1371/journal.pone.0005738. PMC 2685008. PMID 19478950.
^ Moosa, Imad A. (2 October 2019). "The fragility of results and bias in empirical research: an exploratory exposition". Journal of Economic Methodology. 26 (4): 347–360. doi:10.1080/1350178X.2018.1556798. ISSN 1350-178X. S2CID 158504639.
^ Granger, Clive W. J. (30 September 1999). Empirical Modeling in Economics: Specification and Evaluation. Cambridge University Press. ISBN 978-0-521-77825-1.
^ Maziarz, Mariusz (1 December 2021). "Resolving empirical controversies with mechanistic evidence". Synthese. 199 (3): 9957–9978. doi:10.1007/s11229-021-03232-2. ISSN 1573-0964. S2CID 236249427.
^ Morgan, Mary S.; Magnus, Jan R. (September 1997). "The experiment in applied econometrics". Journal of Applied Econometrics. 12 (5): 459–661. ISSN 1099-1255.
^ Bird, Alexander (17 December 2020). "Understanding the Replication Crisis as a Base Rate Fallacy". The British Journal for the Philosophy of Science. 72 (4): 965–993. doi:10.1093/bjps/axy051. ISSN 0007-0882.
^ Nissen, Silas Boye; Magidson, Tali; Gross, Kevin; Bergstrom, Carl (20 December 2016). "Research: Publication bias and the canonization of false facts". eLife. 5: e21451. arXiv:1609.00494. doi:10.7554/eLife.21451. PMC 5173326. PMID 27995896.
^ ^a ^b "A new replication crisis: Research that is less likely to be true is cited more". phys.org. Retrieved 2021-06-14.
^ ^a ^b Serra-Garcia, Marta; Gneezy, Uri (1 May 2021). "Nonreplicable publications are cited more than replicable ones". Science Advances. 7 (21): eabd1705. Bibcode:2021SciA....7D1705S. doi:10.1126/sciadv.abd1705. ISSN 2375-2548. PMC 8139580. PMID 34020944.
^ Saltelli, A.; Funtowicz, S. (2017). "What is science's crisis really about?". Futures. 91: 5–11. doi:10.1016/j.futures.2017.05.010.
^ Benessia, A.; Funtowicz, S.; Giampietro, M.; Guimarães Pereira, A.; Ravetz, J.; Saltelli, A.; Strand, R.; van der Sluijs, J. (2016). The Rightful Place of Science: Science on the Verge. Consortium for Science, Policy and Outcomes at Arizona State University.
^ Saltelli, Andrea; Ravetz, Jerome R. & Funtowicz, Silvio (25 June 2016). "A new community for science". New Scientist. No. 3079. p. 52.
^ ^a ^b Saltelli, Andrea (December 2018). "Why science's crisis should not become a political battling ground". Futures. 104: 85–90. doi:10.1016/j.futures.2018.07.006.
^ Oreskes, N. (2018). "Beware: Transparency rule is a trojan horse". Nature. 557 (7706): 469. Bibcode:2018Natur.557..469O. doi:10.1038/d41586-018-05207-9. PMID 29789751. (Erratum: [2])
^ Michaels, D. (2008). Doubt is their product: How industry's assault on science threatens your health. Oxford University Press. ISBN 9780195300673.
^ Białek, Michał (2018). "Replications can cause distorted belief in scientific progress". Behavioral and Brain Sciences. 41: e122. doi:10.1017/S0140525X18000584. ISSN 0140-525X. PMID 31064528. S2CID 147705650.
^ ^a ^b Mede, Niels G.; Schäfer, Mike S.; Ziegler, Ricarda; Weißkopf, Markus (2020). "The 'replication crisis' in the public eye: Germans' awareness and perceptions of the (ir)reproducibility of scientific research". Public Understanding of Science. 30 (1): 91–102. doi:10.1177/0963662520954370. PMID 32924865. S2CID 221723269.
^ ^a ^b ^c ^d Letzter, Rafi (22 September 2016). "Scientists are furious after a famous psychologist accused her peers of 'methodological terrorism'". Business Insider. Retrieved 2020-01-30.
^ "Draft of Observer Column Sparks Strong Social Media Response". Association for Psychological Science. Retrieved 2017-10-04.
^ Fiske, Susan T. (31 October 2016). "A Call to Change Science's Culture of Shaming". APS Observer. 29 (9).
^ Singal, Jesse (12 October 2016). "Inside Psychology's 'Methodological Terrorism' Debate". NY Mag. Retrieved 2017-10-04.
^ Stroebe, Wolfgang; Strack, Fritz (2014). "The Alleged Crisis and the Illusion of Exact Replication" (PDF). Perspectives on Psychological Science. 9 (1): 59–71. doi:10.1177/1745691613514450. PMID 26173241. S2CID 31938129.
^ Ioannidis, JPA (2016). "Why Most Clinical Research Is Not Useful". PLOS Med. 13 (6): e1002049. doi:10.1371/journal.pmed.1002049. PMC 4915619. PMID 27328301.
^ Ioannidis, John P. A.; Fanelli, Daniele; Dunne, Debbie Drake; Goodman, Steven N. (2 October 2015). "Meta-research: Evaluation and Improvement of Research Methods and Practices". PLOS Biology. 13 (10): e1002264. doi:10.1371/journal.pbio.1002264. ISSN 1545-7885. PMC 4592065. PMID 26431313.
^ Bach, Becky (8 December 2015). "On communicating science and uncertainty: A podcast with John Ioannidis". Scope. Retrieved 2019-05-20.
^ Gosselin, Romain D. (2019). "Statistical Analysis Must Improve to Address the Reproducibility Crisis: The ACcess to Transparent Statistics (ACTS) Call to Action". BioEssays. 42 (1): 1900189. doi:10.1002/bies.201900189. PMID 31755115.
^ Pratt, Michael G.; Kaplan, Sarah; Whittington, Richard (6 November 2019). "Editorial Essay: The Tumult over Transparency: Decoupling Transparency from Replication in Establishing Trustworthy Qualitative Research". Administrative Science Quarterly. 65 (1): 1–19. doi:10.1177/0001839219887663. ISSN 0001-8392. Retrieved 2022-03-22.
^ Aschwanden, Christie (6 December 2018). "Psychology's Replication Crisis Has Made The Field Better". FiveThirtyEight. Retrieved 2018-12-19.
^ Chartier, Chris; Kline, Melissa; McCarthy, Randy; Nuijten, Michele; Dunleavy, Daniel J.; Ledgerwood, Alison (December 2018). "The Cooperative Revolution Is Making Psychological Science Better". Observer. 31 (10). Retrieved 2018-12-19.
^ "Registered Replication Reports". Association for Psychological Science. Retrieved 2015-11-13.
^ Chambers, Chris (20 May 2014). "Psychology's 'registration revolution'". The Guardian. Retrieved 2015-11-13.
^ Lindsay, D. Stephen (9 November 2015). "Replication in Psychological Science". Psychological Science. 26 (12): 1827–32. doi:10.1177/0956797615616374. ISSN 0956-7976. PMID 26553013.
^ ^a ^b Benjamin, Daniel J.; et al. (2018). "Redefine statistical significance". Nature Human Behaviour. 2 (1): 6–10. doi:10.1038/s41562-017-0189-z. PMID 30980045.
^ Lakens, Daniel; et al. (March 2018). "Justify your alpha". Nature Human Behaviour. 2 (3): 168–171. doi:10.1038/s41562-018-0311-x. hdl:21.11116/0000-0004-9413-F. ISSN 2397-3374. S2CID 3692182.
^ Colquhoun, David (2014). "An investigation of the false discovery rate and the misinterpretation of p-values". Royal Society Open Science. 1 (3): 140216. arXiv:1407.5296. Bibcode:2014RSOS....140216C. doi:10.1098/rsos.140216. PMC 4448847. PMID 26064558.
^ ^a ^b ^c ^d Colquhoun, David (2017). "The reproducibility of research and the misinterpretation of p-values". Royal Society Open Science. 4 (12): 171085. doi:10.1098/rsos.171085. PMC 5750014. PMID 29308247.
^ Colquhoun, David. "The problem with p-values". Aeon Magazine. Retrieved 2016-12-11.
^ Longstaff, Colin; Colquhoun, David. "Calculator for false positive risk (FPR)". UCL.
^ Matthews, R. A. J. (2001). "Why should clinicians care about Bayesian methods?". Journal of Statistical Planning and Inference. 94: 43–58. doi:10.1016/S0378-3758(00)00232-9.
^ Maxwell, Scott E.; Lau, Michael Y.; Howard, George S. (2015). "Is psychology suffering from a replication crisis? What does 'failure to replicate' really mean?". American Psychologist. 70 (6): 487–498. doi:10.1037/a0039400. PMID 26348332.
^ IntHout, Joanna; Ioannidis, John P. A.; Borm, George F.; Goeman, Jelle J. (2015). "Small studies are more heterogeneous than large ones: a meta-meta-analysis". Journal of Clinical Epidemiology. 68 (8): 860–869. doi:10.1016/j.jclinepi.2015.03.017. PMID 25959635.
^ Button, Katherine S.; Ioannidis, John P. A.; Mokrysz, Claire; Nosek, Brian A.; Flint, Jonathan; Robinson, Emma S. J.; Munafò, Marcus R. (1 May 2013). "Power failure: why small sample size undermines the reliability of neuroscience". Nature Reviews Neuroscience. 14 (5): 365–376. doi:10.1038/nrn3475. ISSN 1471-003X. PMID 23571845.
^ Greenwald, Anthony G. (1975). "Consequences of prejudice against the null hypothesis" (PDF). Psychological Bulletin. 82 (1): 1–20. doi:10.1037/h0076157.
^ Amrhein, Valentin; Korner-Nievergelt, Fränzi; Roth, Tobias (2017). "The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research". PeerJ. 5: e3544. doi:10.7717/peerj.3544. PMC 5502092. PMID 28698825.
^ "NWO makes 3 million available for Replication Studies pilot". NWO. Retrieved 2016-08-02.
^ ^a ^b ^c Apple, Sam (22 January 2017). "The Young Billionaire Behind the War on Bad Science". Wired.
^ Frank, Michael C.; Saxe, Rebecca (1 November 2012). "Teaching Replication". Perspectives on Psychological Science. 7 (6): 600–604. doi:10.1177/1745691612460686. ISSN 1745-6916. PMID 26168118. S2CID 33661604.
^ Grahe, Jon E.; Reifman, Alan; Hermann, Anthony D.; Walker, Marie; Oleson, Kathryn C.; Nario-Redmond, Michelle; Wiebe, Richard P. (1 November 2012). "Harnessing the Undiscovered Resource of Student Research Projects". Perspectives on Psychological Science. 7 (6): 605–607. doi:10.1177/1745691612459057. ISSN 1745-6916. PMID 26168119.
^ Marwick, Ben; Wang, Li-Ying; Robinson, Ryan; Loiselle, Hope (22 October 2019). "How to Use Replication Assignments for Teaching Integrity in Empirical Archaeology". Advances in Archaeological Practice. 8: 78–86. doi:10.1017/aap.2019.38.
^ Everett, Jim Albert Charlton; Earp, Brian D. (1 January 2015). "A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers". Frontiers in Psychology. 6: 1152. doi:10.3389/fpsyg.2015.01152. PMC 4527093. PMID 26300832.
^ Quintana, Daniel S. (September 2021). "Replication studies for undergraduate theses to improve science and education". Nature Human Behaviour (World View article). 5 (9): 1117–1118. doi:10.1038/s41562-021-01192-8. ISSN 2397-3374. PMID 34493847. S2CID 237439956.
^ ^a ^b Chambers, Chris (10 June 2014). "Physics envy: Do 'hard' sciences hold the solution to the replication crisis in psychology?". The Guardian.
^ Kahneman, Daniel (2014). "A New Etiquette for Replication". Social Psychology (Commentary). 45 (4): 310–311. doi:10.1027/1864-9335/a000202.
^ Makel, Matthew C.; Plucker, Jonathan A.; Hegarty, Boyd (1 November 2012). "Replications in Psychology Research: How Often Do They Really Occur?". Perspectives on Psychological Science. 7 (6): 537–542. doi:10.1177/1745691612460688. ISSN 1745-6916. PMID 26168110.
^ Munafò, Marcus R.; Smith, George Davey (23 January 2018). "Robust research needs many lines of evidence". Nature. 553 (7689): 399–401. Bibcode:2018Natur.553..399M. doi:10.1038/d41586-018-01023-3. PMID 29368721.
^ ^a ^b ^c ^d ^e ^f Wallot, Sebastian; Kelty-Stephen, Damian G. (1 June 2018). "Interaction-Dominant Causation in Mind and Brain, and Its Implication for Questions of Generalization and Replication". Minds and Machines. 28 (2): 353–374. doi:10.1007/s11023-017-9455-0. ISSN 1572-8641.
^ Siegenfeld, Alexander F.; Bar-Yam, Yaneer (2020). "An Introduction to Complex Systems Science and Its Applications". Complexity. 2020: 1–16. arXiv:1912.05088. doi:10.1155/2020/6105872.
^ Tierney, Warren; Hardy, Jay H.; Ebersole, Charles R.; Leavitt, Keith; Viganola, Domenico; Clemente, Elena Giulia; Gordon, Michael; Dreber, Anna; Johannesson, Magnus; Pfeiffer, Thomas; Uhlmann, Eric Luis (1 November 2020). "Creative destruction in science". Organizational Behavior and Human Decision Processes. 161: 291–309. doi:10.1016/j.obhdp.2020.07.002. ISSN 0749-5978. S2CID 224979451.
^ Tierney, Warren; Hardy, Jay; Ebersole, Charles R.; Viganola, Domenico; Clemente, Elena Giulia; Gordon, Michael; Hoogeveen, Suzanne; Haaf, Julia; Dreber, Anna; Johannesson, Magnus; Pfeiffer, Thomas (1 March 2021). "A creative destruction approach to replication: Implicit work and sex morality across cultures". Journal of Experimental Social Psychology. 93: 104060. doi:10.1016/j.jesp.2020.104060. ISSN 0022-1031. S2CID 229028797.
^ Ince, Darrel C.; Hatton, Leslie; Graham-Cumming, John (22 February 2012). "The case for open computer programs". Nature. 482 (7386): 485–488. Bibcode:2012Natur.482..485I. doi:10.1038/nature10836. PMID 22358837.
^ Vuong, Q.-H. (2018). "The (ir)rational consideration of the cost of science in transition economies". Nature Human Behaviour. 2 (1): 5. doi:10.1038/s41562-017-0281-4.
^ Junk, Thomas R.; Lyons, Louis (21 December 2020). "Reproducibility and Replication of Experimental Particle Physics Results". Harvard Data Science Review. 2 (4). arXiv:2009.06864. doi:10.1162/99608f92.250f995b. S2CID 221703733.
^ ^a ^b Ioannidis, John P. A. (2016). "Anticipating consequences of sharing raw data and code and of awarding badges for sharing". Journal of Clinical Epidemiology (Commentary). 70: 258–260. doi:10.1016/j.jclinepi.2015.04.015. PMID 26163123.