Peer Review and Daubert: The Uncertain Science of Evaluating Scientific Certainty

Peer review is the bedrock of scientific publication, and courts use it to help determine the evidentiary reliability of proffered expert testimony. Should peer review play such a role, and if so, how much weight should it be given?

I. Daubert: Judges as Gatekeepers

Daubert v. Merrell Dow Pharmaceuticals[1] established the role of federal judges as “gatekeepers” of scientific evidence, preventing “junk science” from being presented to a jury. Rather than the Frye test, a single-factor “general acceptance in the community” standard, the Court identified a non-dispositive and non-exhaustive list of elements affecting relevance and reliability. These elements include whether the data or methods had been tested, whether the research had been published and peer reviewed, whether there was a known error rate for the research results, and whether the research was generally accepted by the relevant scientific community.[2]

Daubert directs the judge to scrutinize the relevance and reliability of an expert’s testimony.[3] The question of relevance requires the court to make a factual examination, about which peer review can say nothing.[4] However, the reliability of proffered evidence depends on the soundness of the methods underpinning it. Despite repeated references to the “scientific method,” there is no single method employed by all scientific disciplines.[5] Individual practitioners of the science in question are in the best position to know which methods best address a given scientific question.[6] When a judge looks to peer review as an indicator of scientific reliability, the judge is in essence deputizing the scientists who reviewed the paper, asking, “Were the methods of this paper sound? Are the conclusions warranted based on the evidence?” The question remains: how much weight should be granted to the deputy’s response?

II. What is Peer Review?

The modern system of peer review is neither as old nor as universal as commonly thought.[7] In 1936, Albert Einstein, shocked that his manuscript had been sent for evaluation before publication, withdrew it and vowed never to submit to that journal again.[8] Peer review was neither formalized nor universally adopted until the 1960s.[9] Since then, however, peer review has become the standard practice for publishing scientific findings.

Once a team of scientists completes a research project, they prepare a manuscript describing their findings and submit it to a journal.[10] After initial review by an editor, the majority of manuscripts are submitted to one or more subject-matter experts (referees) who generally review the paper both for scientific validity and importance.[11] Referees typically examine methodology, originality, amount of detail, writing quality, and (rarely) evidence of fraud.[12] Once the referees finish, they submit a brief report summarizing their findings to the editor of the scientific journal, who must then decide whether or not to accept the manuscript for publication.

III. Efficacy and Prevalence of Peer Review

Peer review is generally seen as a mark of article quality.[13] However, there is surprisingly little empirical data evaluating its effectiveness at this task. The authors of one systematic review on the efficacy of peer review[14] concluded, “the practice of peer review is based on faith in its effects, rather than on facts.”[15] It is generally accepted that peer review is ineffective at detecting outright fraud.[16] It may not be particularly good at detecting errors, either.[17] Finally, it may be biased,[18] susceptible to abuse,[19] and unduly burdensome on all parties involved.[20]

Despite its flaws, peer review is considered to be the “gold standard” of scientific communication.[21] Scientific publications are generally not considered to be part of the “body of scientific knowledge” until they have passed the gatekeepers of peer review.[22] Occasionally, however, valid scientific research is not peer-reviewed. The peer review process takes time, and a meritorious study may not yet have had time to be completely vetted.[23] Such may be the case for research conducted specifically in support of litigation.[24] In addition, many corporations conduct scientific research which they choose not to publish.[25] National security considerations might also prevent publication, especially in cryptography and microbiology/virology.[26]

IV. Conclusions for Judges

Journal referees and judges both act as gatekeepers, sifting the scientific wheat from the nonscientific chaff. The criteria they employ are even roughly analogous: the referee’s search for scientific validity is analogous to the judge’s determination of evidentiary reliability,[27] while scientific importance is roughly analogous to relevance (in that it relies on a fact-specific inquiry of the context in which the research is published).

Peer review “increases the chances that substantive flaws in the methodology will be detected.”[28] How often flaws are detected is a subject of intense debate, however, and it is undisputed that many errors escape detection. Because statistical errors are especially likely to be overlooked, judges should direct extra scrutiny to research involving sophisticated statistical manipulation. Judges should also consider post-publication indications of research quality. Although citations alone do not guarantee article quality, highly cited papers are more likely to have been scrutinized by experts, and there are more chances for methodological errors to be discovered.

The absence of peer review is not dispositive.[29] Testimony based primarily on unpublished research may be admissible. However, the absence of peer review should (and usually does) raise red flags and prompt further inquiry because of what it signals. Driving on the left side of the road is not inherently dangerous (the British seem to do fine), but when everybody drives on the right, a driver on the left may signal a problem. In a scientific culture where peer-reviewed publication is the overwhelming norm, unreviewed results are suspect unless justified.

Before admitting testimony premised on unpublished research, the judge should first require a justification for the deviation. The judge should also attempt to subject the research to a “comparable vetting process” in lieu of peer review.[30] Additional experts, preferably those without a stake in the outcome of the research, might be asked to submit reports analogous to those requested of referees. Such reports should not be restricted to the expert’s conclusions but should evaluate the strengths and limitations of the research.

V. Emerging Trends in Peer Review

Within the past decade, new trends in scientific publishing have emerged to address the perceived deficiencies of the peer review system. Some of these trends have gained a modicum of acceptance, but few journals have significantly deviated from the general structure outlined above and the vast majority of published scientific papers are still peer reviewed as they were before. However, as these alternate forms of scientific publishing gain in prominence, judges may need to alter the way they think about peer review in the Daubert context.

V.A. Peer Review “Light”

Some journals have modified the criteria by which articles are evaluated by referees. The Public Library of Science (PLoS) is a nonprofit scientific publisher.[31] Referees for PLoS journals still evaluate the technical soundness of the articles, but pass no judgment on the importance or novelty of the research, since readers “are most qualified to determine what is of interest to them.”[32] PLoS recognizes the need to evaluate research significance, so it is developing article-level metrics (ALMs) to assess the impact of a paper after it is published.[33] Since the inquiry into scientific reliability does not directly probe the innovativeness or novelty of a scientific discovery, the PLoS-style “peer review light,” if widely adopted, would not directly affect a Daubert analysis. However, scientists themselves need ways of evaluating article significance. Widespread adoption of peer review light would necessitate more ALM development, and judges might be able to use these metrics to gain a deeper insight into other relevant Daubert factors, such as general acceptance in the community.

V.B. Preprints

For most of the last century, a precondition of scientific publication was that the submitted manuscript not have been disclosed elsewhere.[34] However, within the last 20 years, the physics community has developed a system for pre-publication of scientific manuscripts, primarily because of the extreme delays in the peer-review process.[35] arXiv (pronounced “archive”) is a scientific preprint service owned and operated by Cornell University,[36] where scientists upload manuscripts or other scientific documents prior to (or at the same time as) submission to a journal for peer review.[37] Recently, some life scientists have also begun uploading preprints.[38]

Preprint services are intended to supplement, not replace, traditional peer review.[39] Most scientists regard preprint results as preliminary, not definitive, and courts should adopt the same attitude. However, a small minority of scientists envision a more radical preprint service. Since authors can receive public feedback on preprint manuscripts, some see a diminished need for formal peer review and emphasize the increased importance of post-publication evaluation.[40] If publication and peer review ever become decoupled, lack of peer review will cease to be a red flag, and judges may be required to dive more deeply into the merits of a published study before declaring it admissible.

V.C. Post-Publication Review

At least one journal, F1000Research, has attempted to invert the traditional publication timeline by subjecting papers to peer review after they have been accepted for publication.[41] This switch was intended to reduce publication delays and increase transparency. Manuscripts are published an average of 7 days after submission, during which time they are checked for basic intelligibility and conformance to the scope of the journal.[42] After publication, referees evaluate the paper for scientific merit and submit their recommendations, which are published alongside the article. All article revisions are available to the general public for comparison.

In theory, the order of publication events should not affect the usefulness of peer review in the Daubert analysis. In fact, this process may provide several distinct advantages. The open publication of referee reports may enable a judge to investigate, not simply the conclusions of the referees, but the specific strengths and weaknesses of the research. This may actually aid in evaluating the relevance of a research finding as well, as reviewers often identify limits to the conclusions that can be drawn.

VI. Conclusions

Although peer review is the cornerstone of scientific communication, it was not always so and may not always be so. Many have questioned its effectiveness, and some have compared the process to Churchill’s conception of democracy: it’s the worst system possible, except for everything else we’ve tried.[43]

Even so, peer review can serve as a useful signal for evidentiary reliability. Because of the current pervasiveness of peer review, scientific results which have not been peer reviewed should be subjected to careful scrutiny; a valid reason should be provided as to why they were not, and additional measures, such as informal peer review, would be helpful in identifying any methodological errors. If a result has been peer reviewed, the inquiry does not end, because peer review is at best a minimum standard. The exact nature of the peer-review process should be discovered, and post-publication indicators, such as citations, should be examined to further reduce the risk of error.

The landscape of scientific publication is changing, and judges should be aware of how those changes affect the Daubert analysis. If peer review ever becomes truly decoupled from publication, additional questions will be warranted to ensure evidentiary reliability.

 

[1] Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993).

[2] Id. at 593-94.

[3] Id. at 597 (“The Rules of Evidence . . . do assign to the trial judge the task of ensuring that an expert’s testimony both rests on a reliable foundation and is relevant to the task at hand.”)

[4] Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 591 (1993) (quoting United States v. Downing, 753 F.2d 1224, 1242 (3d Cir. 1985)) (“An additional . . . aspect of relevancy . . . is whether expert testimony proffered in the case is sufficiently tied to the facts of the case . . . .”).

[5] See generally Alan F. Chalmers, What is This Thing Called Science 148-160 (4th ed. 2014) (describing the adaptations of scientific methods that accompanied scientific progress)

[6] See generally Thomas Nickles, The Problem of Demarcation, in Philosophy of Pseudoscience: Reconsidering the Demarcation Problem 110 (Massimo Pigliucci, Maarten Boudry, eds., 2013) (Agreements between scientists about what is “good” and “bad” science derives primarily from implicit, practical knowledge, also referred to as “tacit” knowledge); see also Chalmers, supra note 5 at 7-9, 103-104 (Acquisition of tacit knowledge alters a scientist’s perception; such perceptions cannot be easily conveyed to a non-expert)

[7] Alex Csiszar, Troubled from the Start, 532 Nature 306, 306 (2016).

[8] Daniel Kennefick, Einstein Versus the Physical Review, 58, 9 Physics Today 43, 43-48 (2005). The manuscript was later published (without peer review) in another journal, but with conclusions radically different from those of the original paper. Einstein may have been responsive to the criticism he received, even if offended by its manner of delivery.

[9] See Melinda Baldwin, In Referees We Trust, 70, 10 Physics Today 44, 46-47 (2017); Mark Ware, Peer Review: Benefits, Perceptions, and Alternatives at 6 (PRC Summary Papers 4, 2008)

[10] Peer review is somewhat like pornography: it can be difficult to define, but those who are familiar with it know it when they see it. See Jacobellis v. Ohio, 378 U.S. 184, 197 (1964) (Stewart, J., concurring). What follows is a high-level generalization of the peer review process, drawn from the author’s brief experience in a research laboratory. Most scientific journals adopt some version of this process.

[11] Ware, supra note 9 at 10-11

[12] Id.

[13] Scrutinizing Science: Peer Review, Understanding Science, https://undsci.berkeley.edu/article/howscienceworks_16 (last visited Dec. 8, 2017) (comparing peer review to the “Inspected by #7” sticker on garments).

[14] Tom Jefferson et al., Effects of Editorial Peer Review: A Systematic Review, 287 JAMA 2784 (2002) (a systematic review of experiments on the effects of peer review in biomedical publications).

[15] Caroline White, Little Evidence for Effectiveness of Scientific Peer Review, 326 BMJ 241 (2003).

[16] Richard Smith, Peer Review: A Flawed Process at the Heart of Science and Journals, 99 J. Royal Soc. Med. 178, 178 (2006) (raising the questions, “who is a peer?” and “what is review?”).

[17] Id. at 179, 182 (The author, an editor of a scientific journal, deliberately inserted errors into papers before sending them for review. The majority of the errors were never discovered by the referees).

[18] Kyle Siler, Kirby Lee, & Lisa Bero, Measuring the Effectiveness of Scientific Gatekeeping, 112 Proc. Nat’l. Acad. Sci. 360, 364 (2015) (Peer reviewers show a bias against novel or groundbreaking research); Robert K. Merton, The Matthew Effect in Science, 159 Science 56, 57 (1968) (A scientist with a career of high achievements is likely to receive more recognition for an individual paper than a lower-achieving scientist submitting the same results); Smith, supra note 16, at 180 (There is a demonstrable bias against reporting negative results).

[19] See generally Smith, supra note 16, at 180 (describing reviewers who plagiarize from articles they are asked to review); Rafael D’Andrea & James P. O’Dwyer, Can Editors Save Peer Review from Peer Reviewers?, 12 PLoS One 1, 2 (2017) (highlighting the dangers of lazy reviewers erroneously endorsing low-quality papers).

[20] Alicia Newton, The Sustainability of Peer Review, in SpotOn Report: What Might Peer Review Look Like in 2030? at 14 (2017) (In 2015, referees spent 13 – 20 billion unpaid hours participating in the peer review process.)

[21] Jonathan P. Tennant et al., A Multi-Disciplinary Perspective on Emergent and Future Innovations in Peer Review [version 2; referees: 2 approved], 6 F1000Research at 4 (2017).

[22] Id. at 8.

[23] Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 593 (1993) (“Some propositions, moreover, are too particular, too new, or of too limited interest to be published.”)

[24] See generally National Research Council, The National Academies, Review of the Scientific Approaches Used During the FBI’s Investigation of the 2001 Anthrax Letters (2011).

[25] Committee on Science, Engineering and Public Policy, The National Academies, On Being a Scientist: A Guide to Responsible Conduct in Research, 34 (3rd ed., 2009)

[26] Id.

[27] 509 U.S. at 590 n.9 (“In a case involving scientific evidence, evidentiary reliability will be based upon scientific validity.”) (emphasis in original).

[28] 509 U.S. at 593.

[29] Id. at 594 (“The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration . . . .”).

[30] Committee on Science, Engineering and Public Policy, supra note 25, at 33.

[31] Who We Are, PLoS.org https://www.plos.org/who-we-are (last visited Dec. 10, 2017).

[32] Journal Information, PLoS.org: PLoS One, http://journals.plos.org/plosone/s/journal-information (last visited Dec. 10, 2017).

[33] A Comprehensive Assessment of Impact with Article-Level Metrics (ALMs), PLoS.org, https://www.plos.org/article-level-metrics (last visited Dec 10, 2017).

[34] Committee on Science, Engineering and Public Policy, supra note 25, at 34.

[35] Claire Fiala & Eleftherios P. Diamandis, The Emerging Landscape of Scientific Publishing, 50 Clinical Biochemistry 651, 653-654 (2017).

[36] General Information About arXiv, arXiv.org, https://arxiv.org/help/general (last visited Dec. 10, 2017)

[37] Id.

[38] Fiala & Diamandis, supra note 35 at 653-654. Physicists primarily use a single preprint depository, arXiv. In contrast, there are multiple competing biological preprint depositories. Id.

[39] Id. See also Thomas Annesley et al., Biomedical Journals and Preprint Services: Friends or Foes?, 63 Clinical Chem. 453 (2017) (most studies agree that the vast majority of preprints, not including conference proceedings or similar documents, correspond to a published journal article).

[40] See Dalmeet Singh Chawla, When a Preprint Becomes the Final Paper, Nature Research Highlights: Social Selection (Jan. 20, 2017), https://www.nature.com/news/when-a-preprint-becomes-the-final-paper-1.21333 (describing an evolutionary geneticist who announced via Twitter that he regarded a preprint uploaded to bioRxiv as his “final version”).

[41] How it Works, F1000Research.com, https://f1000research.com/about (last visited Dec. 10, 2017)

[42] Id.

[43] See, e.g., Karim Khan, Is Open Peer Review the Fairest System? No., 341 BMJ 1 (2010), http://www.bmj.com/content/341/bmj.c6425 (last visited Feb. 13, 2018).
