Technology Assisted Review: Or, How I Stopped Worrying and Learned to Love a Computer Program (PART TWO)

Russell Beets and Amy Catton, May 22nd, 2018

PART TWO: This is part two of a series on my journey to appreciating TAR. Part one defined TAR, described how it works and provided tips on when you should consider using it. Part two addresses the TAR process, advantages and disadvantages of TAR, and my current thoughts on using the technology.

As I wrote in part one, I (Russ Beets) recently began work on a complex litigation case with millions of documents to review, many moving parts and quick deadlines. As an alternative to running targeted searches, we decided to utilize technology assisted review or “TAR” (also known in the industry as predictive coding). Amy Catton and Clara Skorstad, two Senior Project Managers on my team, are seasoned experts in TAR and helped with the drafting of this blog. Amy managed my most recent TAR project and Clara is a member of Duke University’s EDRM Technology Assisted Review Project Team, working to develop best practices. If anything below sounds like it came from a technical expert, it likely came from one or both of them.

The TAR Process

Once you have made the decision to utilize TAR in your case, it is imperative that you set up a clearly delineated protocol to ensure that the process goes smoothly and efficiently.

Proper Training

The maxim “Garbage In, Garbage Out” certainly applies in the case of TAR. If you train the system erroneously, you are wasting your time (and money!), as you are going to get back a set of documents that are not going to be helpful. Careful consideration needs to be given to selecting the proper people or “experts” to assist with teaching the computer. These persons need to be knowledgeable about the case and have the ability to make confident decisions about what documents are relevant and not relevant, as such decisions are going to have lasting impact down the road. When possible, it may be beneficial to have two or more attorneys review the initial documents collaboratively, reaching a consensus on the relevance designation for each document, in order to make sure there is a more uniform system of properly coding documents.

It is important to note that the TAR system focuses on a document’s content rather than any metadata, such as date or custodian. As a result, the To, From and Subject fields of an email are irrelevant for the purposes of TAR. Additionally, the system learns nothing from photos or numbers (so if the case involves a particular patent number, the number itself could not be used to train the system). When teaching the system, it is therefore critically important to consider whether the document is a good example from which the system can learn.
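To make the “good example” question concrete, here is a minimal sketch of a pre-check a reviewer might run on a document’s body text. The function name and the 50-word threshold are hypothetical, chosen only for illustration; they are not taken from any particular TAR platform, which will apply its own rules.

```python
import re

def is_good_training_example(body_text, min_words=50):
    """Rough heuristic: does the body contain enough prose to learn from?

    Only alphabetic words of two or more letters are counted, so digits
    (a patent number, for instance) add nothing. Metadata such as To,
    From and Subject is assumed to have been left out of body_text.
    The 50-word threshold is an assumption, not a platform setting.
    """
    words = re.findall(r"[A-Za-z]{2,}", body_text)
    return len(words) >= min_words

# A memo-style body with plenty of text, and a body that is mostly numbers.
memo = "The licensing agreement was discussed at length by both parties. " * 10
numbers_only = "US 7,654,321 1042 2218 9981 5530"

memo_ok = is_good_training_example(memo)
numbers_ok = is_good_training_example(numbers_only)
```

A check like this only flags candidates. The expert still decides whether the document’s content actually illustrates relevance.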

Creating a Seed Set

It helps if there is already a set of documents coded that can be deployed into a TAR workflow. If not, as an alternative, the computer can randomly select a set of documents for review. In this situation, it may take several rounds before a proper seed set (alternately called the “training set”) of documents can be developed, as the computer must learn which documents are relevant and which documents are to be rejected. The decisions made on a seed set create the data used to teach the computer how to recognize patterns of relevance in the greater universe of documents, thus facilitating better categorization.
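As a rough illustration of the random-selection alternative, the sketch below draws a reproducible random sample of document IDs for the experts to code. The function name, the ID format and the sample size of 500 are all hypothetical; in practice the TAR software handles this step for you.

```python
import random

def draw_random_seed_set(document_ids, sample_size, seed=0):
    """Return a random sample of document IDs for expert coding.

    A fixed seed makes the draw repeatable, which helps if the
    selection ever has to be explained or reproduced later.
    """
    rng = random.Random(seed)
    ids = list(document_ids)
    size = min(sample_size, len(ids))  # never ask for more than exist
    return rng.sample(ids, size)       # sampling without replacement

# Hypothetical population of 10,000 documents.
population = [f"DOC-{n:06d}" for n in range(1, 10001)]
seed_set = draw_random_seed_set(population, sample_size=500)
```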

Building a Control Set

The control set is also very important during this process. This is a random, representative set of documents from the entire population of documents that the reviewer codes as responsive or not responsive. This set then acts as the standard (the control) against which the results of the TAR analysis are tested. The control documents measure how well the system has been trained and will ultimately help with determining when training is completed.
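One common way to express “how well the system has been trained” is to compare the system’s calls on the control set with the reviewer’s calls and compute recall and precision. The sketch below assumes both sets of calls are available as simple lists of True (responsive) and False (not responsive). The function name and the sample data are made up for illustration, and your TAR software may report different measures.

```python
def control_set_metrics(expert_calls, system_calls):
    """Compare system predictions with expert coding on the control set.

    Both arguments are lists of booleans in the same document order,
    where True means responsive.
    """
    if len(expert_calls) != len(system_calls):
        raise ValueError("Both lists must cover the same documents.")

    pairs = list(zip(expert_calls, system_calls))
    true_pos = sum(1 for e, s in pairs if e and s)       # both say responsive
    false_pos = sum(1 for e, s in pairs if not e and s)  # system over-calls
    false_neg = sum(1 for e, s in pairs if e and not s)  # system misses

    # Recall: share of the truly responsive documents the system found.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # Precision: share of the system's responsive calls that were right.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    return {"recall": recall, "precision": precision}

# Tiny made-up control set of eight documents.
expert = [True, True, True, False, False, False, False, True]
system = [True, True, False, False, True, False, False, True]
metrics = control_set_metrics(expert, system)
```

In this made-up example the system finds three of the four responsive documents and makes one wrong responsive call, so recall and precision both come out at 0.75.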

Training

There are two types of training. (1) Automatic training utilizes a set of documents that have already been reviewed by a review team (not the designated experts) and are fed into the system. The system then uses the relevance tags from what has already been reviewed to determine relevance for the remaining documents. (2) Manual training is generally the preferred approach, as it involves an intensive, dedicated effort by an expert, resulting in more consistency than input from a large review team.

Quality Control Review

Once the training set is complete, the reviewers can then start performing quality control reviews of the sets. During this process, the reviewer is checking for several factors, including whether a document itself is relevant to the review, or specific to an issue that you are researching, as well as determining whether the document itself is a good example for TAR (as described above). There is no hard and fast rule for how many quality control sets you will need to review to reach stabilization. Stabilization occurs when additional training will not affect the computer’s ability to determine whether a document is relevant or not. For large projects, this can be as few as 5 rounds of quality control or as many as 10. Even after stabilization is reached, the emerging best practice is to “test the rest,” which means to test documents below the cut-off line to ensure that they in fact are not relevant.
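Stabilization and “test the rest” can both be reduced to simple arithmetic. The sketch below is one possible way to do it, under stated assumptions: that a recall figure is recorded after each quality control round, that “stable” means the last three rounds differ by no more than one percentage point, and that the sample drawn from below the cut-off has been coded by a reviewer. The function names, window, tolerance and figures are all hypothetical.

```python
def has_stabilized(recall_by_round, window=3, tolerance=0.01):
    """True when recall has barely moved over the last `window` rounds."""
    if len(recall_by_round) < window:
        return False
    recent = recall_by_round[-window:]
    return max(recent) - min(recent) <= tolerance

def elusion_rate(sample_calls):
    """Share of sampled below-cut-off documents found relevant.

    sample_calls is a list of booleans from human review of the sample,
    where True means the document turned out to be relevant.
    """
    if not sample_calls:
        raise ValueError("The sample must contain at least one document.")
    return sum(sample_calls) / len(sample_calls)

# Made-up recall figures after six quality control rounds.
recall_by_round = [0.52, 0.64, 0.71, 0.745, 0.75, 0.752]
stable = has_stabilized(recall_by_round)

# Made-up "test the rest" sample: 200 documents, 4 found relevant.
sample = [True] * 4 + [False] * 196
rate = elusion_rate(sample)
```

With these numbers the last three rounds differ by less than one point, so the check reports stabilization, and 4 relevant documents out of 200 sampled gives a rate of 2%. Whether that rate is acceptable is a judgment call for the case team, not something the arithmetic decides.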

Final Review

Once stabilization is reached, a determination will be made regarding which documents will need document-by-document review. For example, you may review the top 50% to 75% of TAR-generated relevant documents.
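To show what reviewing “the top 50% to 75%” might look like in practice, the sketch below ranks documents by a relevance score and keeps the top fraction for document-by-document review. It assumes the TAR system produces a numeric score per document. The function name, document IDs and scores are invented for illustration.

```python
import math

def select_for_review(scores, fraction):
    """Return the top `fraction` of document IDs, highest score first.

    scores maps a document ID to the system's relevance score.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1.")
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = math.ceil(len(ranked) * fraction)  # round up so nothing is lost
    return ranked[:cutoff]

# Made-up scores for eight documents.
scores = {
    "DOC-1": 0.91, "DOC-2": 0.15, "DOC-3": 0.78, "DOC-4": 0.42,
    "DOC-5": 0.66, "DOC-6": 0.08, "DOC-7": 0.83, "DOC-8": 0.30,
}
top_half = select_for_review(scores, 0.50)
top_three_quarters = select_for_review(scores, 0.75)
```

Here the top half is the four highest-scoring documents, and the 75% setting adds the next two. Everything below the chosen line is what the “test the rest” sampling is meant to check.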

In the end, I discovered that I was needed in all stages of this process – seed sets, control rounds, training rounds, quality control and document-by-document review. My position was not even close to becoming obsolete – and because I became a subject matter expert, I had increased my value to the case team!

Advantages of TAR

There are considerable advantages to TAR, including the following:

TAR can take a massive number of documents and reduce them to a far more manageable set by excluding from review documents that are very likely not relevant, saving time and money.

TAR allows for a more consistent review, minimizing the human error that results from less uniform application of relevance standards.

Properly designed TAR processes uncover more relevant documents than a traditional human review and at a lower cost.

TAR allows case teams to more quickly assess facts and issues by focusing on the most relevant documents without comprehensive review of a large data set, saving time and further reducing costs.

Courts are beginning to understand the benefits of TAR and accept (or even promote) its use to streamline the discovery process, increase efficiencies and decrease costs.

Disadvantages of TAR

While TAR has proven to be incredibly helpful in some cases, it is not always the best option.

There is no industry standard for TAR software, so not all software is going to be equally effective. It may take some trial and error before finding the right software to suit your case’s needs.

TAR software is effective only with certain types of documents. It relies heavily on documents with rich text information to analyze. It is not able to evaluate documents like spreadsheets (numbers), blueprints, schematics, or any documents that do not contain adequate searchable text. Furthermore, certain file types like video and audio files are not easily analyzed. This type of information may be critical in some cases, and thus a more traditional human review would be appropriate (at least for these file types).

TAR is only effective where experienced attorneys (or experts) have spent significant time sufficiently training the computer. If a seed set is not properly developed, this can lead to a flawed learning process and can create huge problems throughout the life of the production.

Note About Privilege Review

TAR may not be the best option for a privilege-only review. Unlike relevancy, the complexities associated with a privilege review may not be predictable by an automated process. For example:

Whether a document is privileged may only be evident from fields that are not considered by TAR algorithms, such as the To and From fields.

The identification of privileged information may require a subjective judgment call regarding whether legal advice was sought and/or provided.

Whether a document falls under the protections of marital privilege, common interest privilege and/or joint defense agreements may be even more nuanced.

Privilege may vary from document to document even if the content is similar. For example, content may be privileged in one document but no longer be privileged if it is forwarded to a third party in another document.

Waiver of privilege may extend to the subject matter of a document, even if the text of the documents differs.

TAR algorithms are not able to consider the events surrounding the creation of a document, so a document that is privileged only by virtue of its reference in or to another document may not be properly categorized as privileged.

In some instances, a privilege call affects only part of the document and redactions are needed, while in other instances, the entire document is excluded from the production.

In-house counsel may serve multiple roles, including a business role, which may render a communication not privileged.

For the above reasons, employing TAR to identify privileged information presents several risks and the cost savings associated with fewer hours spent combing through documents may not justify these risks. Utilizing TAR to filter out documents that are clearly irrelevant, combined with human review for privilege, will likely yield the best results.

Conclusion

While it can seem frightening at first to put your faith in a computer program, the biggest takeaway is that when TAR is done correctly, attorneys are always going to be involved in the review and sampling processes, from the formative stages of a case to preparation of the final production. There is no blind reliance on a computer to do an attorney’s work. Instead, TAR cuts through the chaff to get to the wheat, eliminating the need to sift through a myriad of extraneous documents. The most important documents are then put in place for attorneys to review for early case assessment and/or litigation assistance. TAR is not meant to replace standard review processes and protocols, but instead to help streamline those processes so that review can be more targeted, fruitful and efficient.

DISCLAIMER: The information contained in this blog is not intended as legal advice or as an opinion on specific facts. For more information about these issues, please contact the author(s) of this blog or your existing LitSmart contact. The invitation to contact the author is not to be construed as a solicitation for legal work. Any new attorney/client relationship will be confirmed in writing.

Russell Beets

Senior E-Discovery Attorney

Amy Catton

Senior Project Manager
