The recent emergence of “predictive” coding tools, particularly in the field of e-discovery, represents a sea change in the market, offering another opportunity to use technology to improve document review: significantly greater accuracy and speed at lower cost. In an effort to obtain buy-in from law firms and their clients, many vendors have intentionally oversimplified their predictive coding solutions, in both development and marketing. However, predictive coding is not trivial; the challenge of accurately and defensibly classifying huge volumes of unstructured data requires a sophisticated mathematical and scientific solution and experts to attest to its validity. Rational Retention’s intelligent coding offering, Rational Intelligence, is a uniquely powerful technology and service that allows us to codify the decision processes of an expert reviewer into a statistical model, and then automatically apply that model across a broad set of data. Each model is completely customized for the issue and the dataset at hand. In this way, Rational Intelligence is designed to deliver cost-efficient, speedy, accurate, transparent, and defensible output.
Challenges of the Current Review Model
Today’s complex litigations involve overwhelming volumes of data. Growth in the amount of data is driven not only by the ease of communicating in writing today, but also by the inability of corporations to enforce retention policies and discard documents that no longer need to be maintained. It should be no surprise, then, that review efforts are equally large and inefficient: during first-phase review, hundreds of often poorly trained staff and contract lawyers rush to review documents before discovery deadlines. Even if quality could be controlled, the number of lawyers required to review millions of documents and the grueling nature of the task inevitably result in inconsistent and inaccurate coding decisions. High costs, long lead times, and human error have all become accepted byproducts of today’s typical review process.
RR’s Intelligent Coding Technology
Rational Intelligence (“RI”) enables customers to model characteristics unique to a small sample of documents and to automatically apply that model to code any number of documents. Since the quality of the output depends heavily on the input, subject-matter experts are needed to create a training set of documents for each issue by reviewing documents in a straightforward and natural manner. Typical training sets comprise only a fraction of the total document population, allowing clients to leverage the knowledge of the best reviewers across the entire dataset. RI’s technology experts work directly with clients to design and advise on creating representative and compact training populations, specifically tailored to the document population and the substantive needs of the matter. RI is not limited to coding for responsiveness and privilege; subject-matter experts can train the system for any and every relevant issue. Once a model is run against the entire corpus of available documents, the population is winnowed down to a manageable size for more senior lawyers to review. The result is a more accurate, more consistent, faster document review, conducted at a fraction of the expense of manual review.
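The train-on-a-sample, apply-to-the-corpus workflow described above can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes classifier with add-one smoothing, chosen only because it fits in a few lines of standard-library Python; it is not RI's method. All function names, labels, and sample documents below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Build per-label word counts from expert-coded (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["labels"].items():
        score = math.log(doc_count / total_docs)
        denom = sum(model["words"][label].values()) + vocab_size
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                score += math.log((model["words"][label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set coded by a subject-matter expert.
training_set = [
    ("merger pricing agreement draft", "responsive"),
    ("pricing terms for the merger", "responsive"),
    ("lunch menu for friday", "not_responsive"),
    ("office party friday lunch", "not_responsive"),
]
model = train(training_set)

# Apply the trained model to the (much larger) uncoded population.
corpus = ["revised merger agreement attached", "friday lunch order"]
coded = {doc: predict(model, doc) for doc in corpus}
for doc, label in coded.items():
    print(f"{label:15s} {doc}")
```

In practice the training set would be far larger and drawn to be representative of the population, and the predicted labels would be validated before any document is withheld or produced.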
Rational Intelligence accepts data from a variety of sources, including all of the primary e-discovery review platforms. RI clients will receive access to a custom, dedicated review environment specifically designed to allow for model training and validation during engagements. Clients will also have access to our robust, industry-leading data processing technologies should they be required.
The process by which our technology is applied in litigation is critical to increasing accuracy and, most importantly, to ensuring the defensibility of the coding decisions. At each checkpoint, RI experts review the results to guide the classification process. At the end of the process, the client is provided with a comprehensive report detailing the information gathered throughout.
The Team Behind Rational Intelligence
To ensure that our document classification technology was built on the most cutting-edge methodology, Rational Retention (“RR”) partnered with a team of leading bioinformatics researchers from the New York University Center for Health Informatics and Bioinformatics, led by Dr. Constantin Aliferis. Working with Rational Retention's leadership team, including Chief Architect, Dr. Konstantin Mertsalov, the Aliferis team used its experience in biomedical applications of classification technology to develop Rational Intelligence. The NYU Health Informatics and Bioinformatics Center, under the leadership of Dr. Aliferis, has made major breakthroughs over the span of the last decade in the development of software, algorithms, and theory in the interpretation of super-high-dimensional data, toward unraveling mechanisms of disease for robust predictive modeling of complex biological systems. Several of the algorithms, protocols, and software developed by the NYU group enjoy thousands of research- and industrial-registered users, including major universities, pharmaceutical companies, and IT companies. Much of their research and innovation involves the classification of unstructured text. They have created and patented Causal Graph and Markov Blanket discovery algorithms to increase accuracy, decrease runtime, reduce the size of training sets, and increase defensibility.
RI's Accuracy and Defensibility Versus the Competition
The Rational Intelligence coding toolset is based on our own patented Markov Boundary, Causal Graph, Support Vector Machine, and other cutting-edge high-dimensional data classification methods. Our team has established these methods over decades of research and through rigorous testing across thousands of datasets and classification iterations. The team has also subjected the technology and approach to extensive academic peer review through 120 publications, including nine patents, four books, software systems, and academic papers. In text classification, one size does not fit all. Unlike our competitors, RI is able to use different classification methods based upon the unique characteristics of the data at hand. We recently completed the most comprehensive comparison of classification methods conducted to date: 30 of the most widely applied classification engines were tested against 20 feature extractors across 240 unique datasets. We created and tested over 100,000 unique state-of-the-art protocols to gather empirical evidence of how the technologies perform on various datasets.
The results led to some important conclusions:
- There were many classification techniques and algorithms common in the marketplace that consistently underperformed;
- All approaches, even underperforming ones, performed well on at least a limited number of datasets, which allows almost any provider to point to a successful result;
- Certain approaches were consistently extremely high-performing, though neither universally nor absolutely so;
- Feature compression techniques leading to explainable and transparent models and faster execution of models can be applied while maintaining the performance of the classifiers; and
- Due to the variability in accuracy of the various approaches, having the skills to efficiently deploy multiple approaches and accurately measure the results is paramount to a successful outcome.
While one method may be best for a particular issue or dataset, it is necessary to be able to adjust classification techniques on a case-by-case basis. Thus, it is critical to have a suite of leading technologies available and the expertise necessary to properly deploy and evaluate the right approach for each case. This strategy is at the core of the Rational Intelligence offering.
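The idea of choosing a method per dataset by measurement rather than by default can be sketched as follows. This is an illustrative assumption, not RI's protocol: it scores two toy candidates (a majority-class baseline and a keyword-overlap rule, both hypothetical stand-ins for real classification engines) with k-fold cross-validation and keeps the one with the higher held-out accuracy.

```python
from collections import Counter


def majority_trainer(train_set):
    """Baseline: always predict the most common training label."""
    top = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda text: top


def overlap_trainer(train_set):
    """Predict the label whose training vocabulary overlaps the text most."""
    words = {}
    for text, label in train_set:
        words.setdefault(label, set()).update(text.lower().split())

    def predict(text):
        tokens = set(text.lower().split())
        # sorted() makes tie-breaking deterministic
        return max(sorted(words), key=lambda lab: len(tokens & words[lab]))

    return predict


def cross_val_accuracy(trainer, docs, k):
    """Average held-out accuracy over k interleaved folds."""
    correct = 0
    for i in range(k):
        held_out = docs[i::k]
        train_set = [d for j, d in enumerate(docs) if j % k != i]
        predict = trainer(train_set)
        correct += sum(predict(text) == label for text, label in held_out)
    return correct / len(docs)


def select_best(candidates, docs, k=4):
    """Score every candidate on the same folds and return the winner."""
    scores = {name: cross_val_accuracy(t, docs, k) for name, t in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical expert-coded documents (R = responsive, N = not responsive).
docs = [
    ("merger pricing agreement", "R"), ("friday lunch menu", "N"),
    ("pricing terms merger", "R"), ("office lunch friday", "N"),
    ("lunch menu office", "N"), ("merger agreement terms", "R"),
    ("friday office menu", "N"), ("agreement pricing terms", "R"),
]
candidates = {"majority": majority_trainer, "keyword_overlap": overlap_trainer}
best_name, scores = select_best(candidates, docs)
print(best_name, scores)
```

Because every candidate is evaluated on the same held-out folds, the comparison reflects performance on this particular dataset; a different matter could produce a different winner, which is the point the comparison study makes.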
To further enhance the defensibility of our classification technology, RI employs rigorous quality control and validation processes to ensure that models’ confidence levels are acceptable and accurate, without over-fitting the model to the training set. In addition, the Rational Retention solution transparently displays the unique characteristics of the documents that lead to coding decisions, giving law firms, their clients, and the courts confidence in the quality of our product.
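One common way to put a number on such quality control, offered here as an assumption rather than a description of RI's validation process, is to have an expert re-review a random sample of the model's coding decisions and report a confidence interval for the agreement rate. The sketch below uses the Wilson score interval; the function name and the sample figures are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical QC sample: an expert agrees with 380 of 400 sampled decisions.
low, high = wilson_interval(380, 400)
print(f"observed agreement: {380 / 400:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

Drawing the sample from documents that were not in the training set matters here: agreement measured on the training documents themselves would overstate accuracy if the model has over-fit.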
Legal Considerations for Using Rational Intelligence
Increasingly, companies and law firms are embracing the use of predictive coding technology to manage data involved in litigation, and the courts have taken notice. In an article published in October 2011, US Magistrate Judge Andrew Peck embraced the use of “computer-assisted coding … in those cases where it will help ‘secure the just, speedy, and inexpensive’ (Fed. R. Civ. P. 1) determination of cases.” In Da Silva Moore v. Publicis Groupe et al. (February 2012), Judge Peck ruled on protocols surrounding the use of predictive coding for document production:
Computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review. Counsel no longer have to worry about being the "first" or "guinea pig" for judicial acceptance of computer-assisted review. As with keywords or any other technological solution to ediscovery, counsel must design an appropriate process, including use of available technology, with appropriate quality control testing, to review and produce relevant ESI while adhering to Rule 1 and Rule 26(b)(2)(C) proportionality.
Unfortunately, it is abundantly clear that most legal professionals do not possess even a cursory knowledge of the technology or process, leading many to question its legitimacy entirely. Although the underlying theory is scientifically sound and provable, the market is quickly becoming crowded with inferior coding tools and oversimplified approaches. Therefore, to ensure defensibility, it is crucial that the legal team be supported by real experts prepared to guide the process and testify as to the validity of the results. For courts that remain skeptical, predictive coding may be applied more strategically: RI can be used to analyze opposing party documents internally, requiring only client buy-in. Indeed, in cases where it is strategically advantageous to produce vast amounts of data, the use of predictive coding may not even be desirable. As best practices are developed around this technology, it is foreseeable that predictive coding will become the sole basis on which productions are determined.
Integration with Rational Retention and Rational eDiscovery
Although RI can be used as a standalone product, it also integrates directly with Rational eDiscovery (“ReD”), RR’s hosted litigation repository. ReD is the most powerful and cost-effective product in the marketplace for storing and organizing documents involved in litigation.
Along with intelligent coding, ReD incorporates other powerful functionality, including: (1) advanced search and analytics tools, which allow RR to identify and refine a set of relevant custodians and documents; (2) an integrated workflow engine that tracks and automates the handling of all loading, culling, search, review, and production activities; and (3) self-provisioning of all essential site activities, such as loading and producing documents, setting security, and organizing reviews. Rational Retention also employs the advanced classification software in our corporate products in the context of information lifecycle management and discovery response. Here, the intelligent coding technology is able to automatically classify documents into retention policies, enabling clients to organize (and pare down) their massive volumes of stored data without burdening end-users. Moreover, RR’s agents provide an open port to data stores, allowing additional data to be collected without interfering with an enterprise’s normal operations. Also, by enforcing retention policy, RR ensures that the only available data is that defined by the enterprise’s retention program.