Predictive Coding is Coming. Let It.

“Predictive coding,” named the 2011 buzzword in legal technology on Above the Law, had an even bigger year in 2012. Though the technology’s benefits are many and have been made clear, many litigants and attorneys remain skeptical. I argue that attorneys (and judges) ought to understand predictive coding better and more quickly, so that they may embrace its inevitable proliferation more warmly and smoothly.

“Predictive coding,” also referred to as “technology-assisted review” or “content-based advanced analytics,” is a technique for sorting through enormous collections of Electronically Stored Information (“ESI”) in order to produce only those documents that are responsive to an opposing litigant’s request and that are not subject to a privilege exempting them from discovery. There are multiple approaches to predictive coding, but the basic process begins when a computer creates a set of rules derived from the review and subsequent indexing (or “coding”) of a sample set of documents by the party responding to the discovery request. Those rules are then applied to the entire set of documents, producing a set of only those documents that conform to the rules established in the sample. Samples of the excluded documents are then evaluated and recoded, the rules are adjusted accordingly, and the process is repeated until the parties are sufficiently confident that the documents found by this process are only those necessary to fulfill the obligations of discovery. Usually, those documents are then reviewed by human eyes.
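For readers who think in code, the loop described above can be sketched roughly as follows. This is only a toy illustration under loud assumptions, not any vendor’s actual method: the “rules” here are simply words seen in documents coded responsive and never in documents coded non-responsive, the reviewer function stands in for a human attorney, the documents are invented, and the “sample” of excluded documents is taken from the front of the list rather than drawn at random so that the result is reproducible.

```python
from typing import Callable, Dict, List, Set


def learn_terms(coded: Dict[str, bool]) -> Set[str]:
    """Derive toy 'rules': words found in responsive documents but in no non-responsive one."""
    responsive_words: Set[str] = set()
    other_words: Set[str] = set()
    for text, is_responsive in coded.items():
        target = responsive_words if is_responsive else other_words
        target.update(text.lower().split())
    return responsive_words - other_words


def predict(text: str, terms: Set[str]) -> bool:
    """Apply the rules: predict 'responsive' if the document shares any learned word."""
    return bool(terms & set(text.lower().split()))


def predictive_review(
    corpus: List[str],
    seed_docs: List[str],
    reviewer: Callable[[str], bool],
    sample_size: int = 2,
    max_rounds: int = 10,
) -> List[str]:
    # Step 1: the responding party codes a sample set by hand.
    coded = {doc: reviewer(doc) for doc in seed_docs}
    for _ in range(max_rounds):
        # Step 2: derive rules from the coded documents and apply them to everything.
        terms = learn_terms(coded)
        excluded = [d for d in corpus if d not in coded and not predict(d, terms)]
        # Step 3: check a sample of the excluded documents (a real workflow
        # would sample randomly; slicing keeps this sketch deterministic).
        results = {d: reviewer(d) for d in excluded[:sample_size]}
        coded.update(results)
        # Step 4: stop once a sample of excluded documents turns up nothing responsive.
        if not any(results.values()):
            break
    terms = learn_terms(coded)
    # Human coding wins where it exists; otherwise fall back on the prediction.
    return [d for d in corpus if coded.get(d, predict(d, terms))]


# Hypothetical corpus and a stand-in for the human reviewer.
docs = [
    "merger pricing memo for board",
    "lunch menu for friday",
    "pricing agreement draft with supplier",
    "holiday party schedule",
    "supplier rebate terms and agreement",
    "parking garage notice",
]


def reviewer(text: str) -> bool:
    return bool({"pricing", "supplier", "rebate"} & set(text.split()))


produced = predictive_review(docs, seed_docs=docs[:2], reviewer=reviewer)
```

In this toy run, the first pass misses the rebate document because none of its words appeared in the seed set. Sampling the excluded pile catches it, the rules are adjusted, and the second pass comes back clean.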

Predictive coding is an alternative to the more traditional “keyword search.” Keyword searches rely on human review of documents that contain any of a set of keywords, as defined through discussion among litigants. A keyword search reviews only a limited set of results; unlike a predictive coding search, it is not a review of the entire body of ESI. Though they remain relevant, and even preferable in certain situations, keyword searches have frequently been compared to a game of “Go Fish.” Stated perhaps less derisively, “[s]earching for an answer on Google (or Westlaw or Lexis) is very different from searching for all responsive documents in the FOIA or e-discovery context.” Nat’l Day Laborer Org. Network v. U.S. Immigration & Customs Enforcement Agency.
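The “Go Fish” problem is easy to see in a minimal sketch. The documents and keywords below are invented for illustration, and whether the second document would actually be responsive is an assumption on my part.

```python
from typing import Iterable, List


def keyword_search(corpus: List[str], keywords: Iterable[str]) -> List[str]:
    """Return only the documents containing at least one agreed keyword."""
    wanted = {k.lower() for k in keywords}
    return [doc for doc in corpus if wanted & set(doc.lower().split())]


docs = [
    "pricing agreement draft with supplier",
    "call me about what we charge the distributor",  # same topic, different words
    "lunch menu for friday",
]

hits = keyword_search(docs, ["pricing", "supplier"])
```

Only the first document is returned. The second, which discusses the same subject in different words, never reaches a human reviewer unless someone happens to guess the right keyword.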

2012 marked the first time a court compelled the use of predictive coding in discovery over the objection of a party. See Global Aerospace v. Landow Aviation. Similarly, in EORHB, Inc. v. HOA Holdings, LLC, Delaware’s Chancery Court, sua sponte, requested that both parties not only use predictive coding but also use a common provider.

The rulings of two other courts on predictive coding, however, reveal precisely where the battle for the future of e-discovery will be fought: in the details of the process (a point confirmed by academic proponents of the technology). The year’s marquee predictive coding case, Moore v. Publicis Groupe SA, like Global Aerospace, compelled the use of predictive coding over the objections of a party. Unlike in Global Aerospace, however, the objections in Moore concerned the procedure for executing predictive coding, not the use of the technology itself. Similarly, an Illinois court refused to step in to demand the use of predictive coding in Kleen Products LLC v. Packaging Corp. of Am., for fear that the litigants would not be able to agree on a mutually acceptable plan for implementing the technology.

In many of these cases, the courts rely on the recommendations of the Sedona Conference, a forum that recommends best practices for e-discovery. The Sedona Conference’s proposals are published and submitted to the judiciary for its perusal and, sometimes, endorsement. The Sedona Conference envisions a cooperative discovery process in which the litigants agree on the standard for confidence in the system’s accuracy (e.g., is it sufficient to re-sample the excluded set three times, or must the responding party test the set four times?).
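To make the idea of an agreed confidence standard concrete, here is one hypothetical way the parties might phrase it: the last N re-samples of the excluded set must each contain no more than an agreed share of responsive documents. The function names, the threshold, and the sample data are all my own assumptions for illustration; the Sedona Conference does not prescribe this formula.

```python
from typing import List


def miss_rate(round_codes: List[bool]) -> float:
    """Share of a sample from the excluded set that reviewers coded as responsive."""
    if not round_codes:
        return 0.0
    return sum(round_codes) / len(round_codes)


def meets_standard(
    rounds: List[List[bool]], required_rounds: int, max_miss_rate: float
) -> bool:
    """True if each of the most recent `required_rounds` samples is at or under the agreed rate."""
    if required_rounds < 1:
        raise ValueError("required_rounds must be at least 1")
    if len(rounds) < required_rounds:
        return False
    recent = rounds[-required_rounds:]
    return all(miss_rate(r) <= max_miss_rate for r in recent)


# Four hypothetical re-samples of ten excluded documents each
# (True means the reviewer found the document responsive after all).
rounds = [
    [True, False, False, False, True, False, False, False, False, False],
    [False] * 9 + [True],
    [False] * 10,
    [False] * 10,
]

three_is_enough = meets_standard(rounds, required_rounds=3, max_miss_rate=0.10)
four_is_enough = meets_standard(rounds, required_rounds=4, max_miss_rate=0.10)
```

With these invented numbers, a three-round standard is satisfied but a four-round standard is not, because the earliest sample still contained too many missed documents. That is exactly the kind of detail opposing parties would need to settle in advance.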

The current dialogue on predictive coding demonstrates two important points: (1) its proliferation is all but certain, and (2) opposing litigants will need to cooperate in order to ensure its effectiveness. As this article (also linked above) demonstrates, the technology, and particularly its shortcomings, is misunderstood. With such misinformation, and the fear it induces, the cooperation necessary for the beneficial use of predictive coding will be elusive. Legal practitioners should quickly become better acquainted with this technology, its benefits and criticisms, and the growing body of law surrounding it, so that their first reaction to the words “predictive coding” is not one of fear, but of hope for a cheaper, faster, and more accurate discovery process.


About Bill Toth

Bill Toth is a 3L at Columbia Law School, Class of 2014. He is the Editor-in-Chief of the Columbia Science and Technology Law Review Volume XV (2013-2014). Bill will be clerking for Judge William H. Alsup of the U.S. District Court for the Northern District of California.
