Technology Assisted Review: Or, How I Stopped Worrying and Learned to Love a Computer Program (PART TWO)

By Russell Beets and Amy Catton, May 22nd, 2018

PART TWO: This is part two of a series on my journey to appreciating TAR. Part one defined TAR, described how it works and provided tips on when you should consider using it. Part two addresses the TAR process, the advantages and disadvantages of TAR, and my current thoughts on using the technology.

As I wrote in part one, I (Russ Beets) recently began work on a complex litigation case with millions of documents to review, many moving parts and quick deadlines. As an alternative to running targeted searches, we decided to utilize technology assisted review or “TAR” (also known in the industry as predictive coding). Amy Catton and Clara Skorstad, two Senior Project Managers on my team, are seasoned experts in TAR and helped with the drafting of this blog. Amy managed my most recent TAR project and Clara is a member of Duke University’s EDRM Technology Assisted Review Project Team, working to develop best practices. If anything below sounds like it came from a technical expert, it likely came from one or both of them.

The TAR Process

Once you have made the decision to utilize TAR in your case, it is imperative that you set up a clearly delineated protocol to ensure that the process goes smoothly and efficiently.

Proper Training

The maxim “Garbage In, Garbage Out” certainly applies in the case of TAR. If you train the system erroneously, you are wasting your time (and money!), as you are going to get back a set of documents that are not going to be helpful. Careful consideration needs to be given to selecting the proper people or “experts” to assist with teaching the computer. These persons need to be knowledgeable about the case and have the ability to make confident decisions about what documents are relevant and not relevant, as such decisions are going to have lasting impact down the road. When possible, it may be beneficial to have two or more attorneys review the initial documents collaboratively, reaching a consensus on the relevance designation for each document, in order to make sure there is a more uniform system of properly coding documents.

It is important to note that the TAR system focuses on a document’s content rather than any metadata, such as date or custodian. As a result, the To, From and Subject fields of an email are irrelevant for the purposes of TAR. Additionally, the system learns nothing from photos or numbers (so if the case involves a particular patent number, the number itself could not be used to train the system). When teaching the system, it is therefore critically important to consider whether the document is a good example from which the system can learn.
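For readers who like to see the idea in concrete terms, here is a minimal sketch of what "content only" can mean in practice. It is an illustration under stated assumptions, not how any particular TAR product works: the `content_tokens` function and the field names (`from`, `to`, `subject`, `body`) are hypothetical.

```python
import re

def content_tokens(document):
    """Return lowercase word tokens taken from the body text only.

    Assumption for illustration: metadata fields are ignored entirely and
    purely numeric strings (such as a patent number) are dropped.
    """
    body = document.get("body", "")
    # Keep runs of letters only, so digits never become training features.
    return re.findall(r"[a-z]+", body.lower())

# Hypothetical email record; only the "body" field is read.
email = {
    "from": "alice@example.com",
    "to": "bob@example.com",
    "subject": "Patent 1234567",
    "body": "The licensing terms for patent 1234567 are attached.",
}

tokens = content_tokens(email)
print(tokens)
```

In this sketch the word "patent" survives as a training signal while the number and the sender do not, which is why a document whose relevance depends on a number or a name in the header makes a poor training example.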

Creating a Seed Set

It helps if there is already a set of documents coded that can be deployed into a TAR workflow. If not, as an alternative, the computer can randomly select a set of documents for review. In this situation, it may take several rounds before a proper seed set (alternately called the “training set”) of documents can be developed, as the computer must learn which documents are relevant and which documents are to be rejected. The decisions made on a seed set create the data used to teach the computer how to recognize patterns of relevance in the greater universe of documents, thus facilitating better categorization.
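The random selection described above can be pictured with a short sketch. This is only an illustration: the `draw_random_sample` helper, the document IDs and the sample size of 500 are all made up, and real review platforms handle this step for you.

```python
import random

def draw_random_sample(doc_ids, sample_size, seed=0):
    """Pick documents at random, without replacement, for expert coding.

    The fixed seed is an assumption made here so the draw can be reproduced
    and documented later; it is not a requirement of any TAR tool.
    """
    ids = list(doc_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))

# Hypothetical population of 10,000 document IDs.
population = [f"DOC-{i:06d}" for i in range(1, 10001)]
seed_set = draw_random_sample(population, 500)
print(len(seed_set), "documents selected for the first training round")
```

Because the draw is random, a round may contain very few relevant documents, which is one reason several rounds can be needed before a useful seed set emerges.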

Building a Control Set

The control set is also very important during this process. This is a random, representative set of documents from the entire population of documents that the reviewer codes as responsive or not responsive. This set then acts as the standard (the control) against which the results of the TAR analysis are tested. The control documents measure how well the system has been trained and will ultimately help with determining when training is completed.
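To make "the standard against which the results are tested" concrete, here is a minimal sketch that compares the system's calls to the reviewer's coding on the control set. Recall and precision are one common way to make that comparison; which measures a given TAR product reports is an assumption here, as are the function name and the sample data.

```python
def control_set_metrics(control_codes, predictions):
    """Compare system predictions to reviewer coding on the control set.

    control_codes: doc ID -> True if the reviewer coded it responsive.
    predictions:   doc ID -> True if the system predicts it responsive.
    """
    tp = sum(1 for d, rel in control_codes.items() if rel and predictions.get(d, False))
    fn = sum(1 for d, rel in control_codes.items() if rel and not predictions.get(d, False))
    fp = sum(1 for d, rel in control_codes.items() if not rel and predictions.get(d, False))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"recall": recall, "precision": precision}

# Hypothetical five-document control set.
control = {"A": True, "B": True, "C": False, "D": False, "E": True}
system = {"A": True, "B": False, "C": True, "D": False, "E": True}

metrics = control_set_metrics(control, system)
print(metrics)
```

Recall answers "of the documents the reviewer called responsive, how many did the system find?" and precision answers "of the documents the system flagged, how many were actually responsive?"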

Training

There are two types of training. (1) Automatic training feeds the system a set of documents that have already been coded by a review team (not the designated experts). The system then uses those relevance tags to determine relevance for the remaining documents. (2) Manual training is generally the preferred approach, as it involves an intensive, dedicated effort by an expert, resulting in more consistency than input from a large review team.

Quality Control Review

Once the training set is complete, the reviewers can then start performing quality control reviews of the sets. During this process, the reviewer is checking for several factors, including whether a document itself is relevant to the review, or specific to an issue that you are researching, as well as determining whether the document itself is a good example for TAR (as described above). There is no hard and fast rule for how many quality control sets you will need to review to reach stabilization. Stabilization occurs when additional training will not affect the computer’s ability to determine whether a document is relevant or not. For large projects, this can be as few as 5 rounds of quality control or as many as 10. Even after stabilization is reached, the emerging best practice is to “test the rest,” which means to test documents below the cut-off line to ensure that they in fact are not relevant.
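One way to picture stabilization is as a score that stops moving from round to round. The sketch below is a simplified illustration, not a rule from any TAR vendor: the `has_stabilized` helper, the tolerance of 0.01, the two-round window and the per-round scores are all assumptions.

```python
def has_stabilized(scores, tolerance=0.01, window=2):
    """Return True when the last `window` round-over-round changes are small.

    scores: one control-set score per training round, oldest first.
    """
    if len(scores) < window + 1:
        return False  # not enough rounds yet to judge
    recent = scores[-(window + 1):]
    return all(abs(later - earlier) <= tolerance
               for earlier, later in zip(recent, recent[1:]))

# Hypothetical scores after each of six quality control rounds.
round_scores = [0.52, 0.64, 0.71, 0.745, 0.75, 0.752]

for n in range(1, len(round_scores) + 1):
    print(f"after round {n}: stabilized = {has_stabilized(round_scores[:n])}")
```

With these made-up numbers the check first passes after the sixth round, when two consecutive rounds each moved the score by less than a point.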

Final Review

Once stabilization is reached, a determination will be made regarding which documents will need document-by-document review. For example, you may review the top 50% to 75% of TAR-generated relevant documents.
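The cut-off decision, together with the "test the rest" check mentioned earlier, can be sketched as follows. This assumes the system assigns each document a relevance score between 0 and 1, which the discussion above does not specify; the scores, function names and sample sizes are hypothetical.

```python
import random

def split_at_cutoff(scores, review_fraction):
    """Rank documents by score and split them at the chosen fraction."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = int(len(ranked) * review_fraction)
    return ranked[:cutoff], ranked[cutoff:]

def sample_below_cutoff(below, sample_size, seed=0):
    """Draw a random sample of below-the-line documents for a human check."""
    rng = random.Random(seed)
    return rng.sample(below, min(sample_size, len(below)))

# Hypothetical relevance scores for four documents.
scores = {"DOC-1": 0.95, "DOC-2": 0.80, "DOC-3": 0.40, "DOC-4": 0.10}

to_review, below = split_at_cutoff(scores, 0.5)
spot_check = sample_below_cutoff(below, 1)
print("document-by-document review:", to_review)
print("test the rest:", spot_check)
```

If the spot check turns up relevant documents below the line, that is a signal to revisit the cut-off or the training before relying on the split.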

In the end, I discovered that I was needed in all stages of this process – seed sets, control rounds, training rounds, quality control and document-by-document review. My position was not even close to becoming obsolete – and because I became a subject matter expert, I had increased my value to the case team!

Advantages of TAR

There are considerable advantages to TAR, including the following:

TAR can take a massive number of documents and reduce them to a far more manageable set by excluding from review documents that are very likely not relevant, saving time and money.

TAR allows for a more consistent review, minimizing the human error that results from less uniform application of relevance standards.

Properly designed TAR processes uncover more relevant documents than a traditional human review and at a lower cost.

TAR allows case teams to assess facts and issues more quickly by focusing on the most relevant documents without comprehensive review of a large data set, saving time and further reducing costs.

Courts are beginning to understand the benefits of TAR and accept (or even promote) its use to streamline the discovery process, increase efficiencies and decrease costs.

Disadvantages of TAR

While TAR has proven to be incredibly helpful in some cases, it is not always the best option.

There is no industry standard for TAR software, so not all software is going to be equally effective. It may take some trial and error before finding the right software to suit your case’s needs.

TAR software is effective only with certain types of documents. It relies heavily on documents with rich text information to analyze. It is not able to evaluate documents like spreadsheets (numbers), blueprints, schematics, or any documents that do not contain adequate searchable text. Furthermore, certain file types like video and audio files are not easily analyzed. This type of information may be critical in some cases, and thus a more traditional human review would be appropriate (at least for these file types).

TAR is only effective where experienced attorneys (or experts) have spent significant time sufficiently training the computer. If a seed set is not properly developed, this can lead to a flawed learning process and can create huge problems throughout the life of the production.

Note About Privilege Review

TAR may not be the best option for a privilege-only review. Unlike relevancy, the complexities associated with a privilege review may not be predictable by an automated process. For example:

Whether a document is privileged may only be evident from fields that are not considered by TAR algorithms, such as the To and From fields.

The identification of privileged information may require a subjective judgment call regarding whether legal advice was sought and/or provided.

Whether a document falls under the protections of marital privilege, common interest privilege and/or joint defense agreements may be even more nuanced.

Privilege may vary from document to document even if the content is similar. For example, content may be privileged in one document but no longer be privileged if it is forwarded to a third party in another document.

Waiver of privilege may extend to the subject matter of a document, even if the text of the documents differs.

TAR algorithms are not able to consider the events surrounding the creation of a document, so a document that is privileged only by virtue of its reference in or to another document may not be properly categorized as privileged.

In some instances, a privilege call affects only part of the document and redactions are needed, while in other instances, the entire document is excluded from the production.

In-house counsel may serve multiple roles, including a business role, which may render a communication not privileged.

For the above reasons, employing TAR to identify privileged information presents several risks and the cost savings associated with fewer hours spent combing through documents may not justify these risks. Utilizing TAR to filter out documents that are clearly irrelevant, combined with human review for privilege, will likely yield the best results.

Conclusion

While it can seem frightening at first to put your faith in a computer program, the biggest takeaway is that when TAR is done correctly, attorneys are always going to be involved in the review and sampling processes, from the formative stages of a case to preparation of the final production. There is no blind reliance on a computer to do an attorney’s work. Instead, TAR cuts through the chaff to get to the wheat, eliminating the need to sift through a myriad of extraneous documents. The most important documents are then put in place for attorneys to review for early case assessment and/or litigation assistance. TAR is not meant to replace standard review processes and protocols, but instead to help streamline those processes so that review can be more targeted, fruitful and efficient.

DISCLAIMER: The information contained in this blog is not intended as legal advice or as an opinion on specific facts. For more information about these issues, please contact the author(s) of this blog or your existing LitSmart contact. The invitation to contact the author is not to be construed as a solicitation for legal work. Any new attorney/client relationship will be confirmed in writing.

Russell Beets

Senior E-Discovery Attorney

Amy Catton

Senior Project Manager
