Technology Advantage

Insight into Name Normalization: What Is Your Name?


Assigning a single unique name to identify an individual was a problem long before e-discovery existed. Think of Santa Claus, St. Nicholas, St. Nick, Noel or simply Santa. All of these names refer to the same individual, and yet our society has used them interchangeably since before we were born. In e-discovery, identifying the individuals who hold information relevant to a dispute is one of the first and most important steps at the inception of a case. Identifying every name those individuals go by is therefore critical to ensuring we do not miss potentially relevant files.

In addition to variations in the names themselves, the digital nature of information adds variations in format. A name or email address may be rendered differently in email headers than in other communications (for example, "Doe, Jane" in one header and "Jane Doe" in another), and those differences affect how the name is indexed and captured. Therefore, it is also important to identify and consolidate these format variations.
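To make the format problem concrete, here is a minimal Python sketch (our own illustration, not any platform's implementation) that uses the standard library's `email.utils.getaddresses` to split a raw header into display names and addresses, then reduces each to a lowercase matching key. The function name `parse_header` is hypothetical, and the rule that reorders "Last, First" display names is an assumption that fits many corporate directories but not every naming convention.

```python
from email.utils import getaddresses


def parse_header(value: str) -> list[tuple[str, str]]:
    """Split a raw To/From/Cc header into (name_key, address_key) pairs.

    Both parts are lowercased and whitespace-collapsed so that format
    variants of the same person compare equal.
    """
    pairs = []
    for name, addr in getaddresses([value]):
        # Collapse repeated whitespace inside the display name.
        key = " ".join(name.split()).lower()
        # Assumption: a comma means "Last, First"; reorder to "first last".
        if "," in key:
            last, _, first = key.partition(",")
            key = f"{first.strip()} {last.strip()}"
        pairs.append((key, addr.strip().lower()))
    return pairs


print(parse_header('"Doe, Jane" <Jane.Doe@Example.com>, jdoe@example.com'))
# [('jane doe', 'jane.doe@example.com'), ('', 'jdoe@example.com')]
```

Note that the second address carries no display name at all, which is exactly the kind of gap a later association step has to fill.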

Where should we look for variations of names?

Different sources of ESI may display names differently. If you plan to collect email, ask for all relevant email addresses, both business and personal (if the employee uses personal email to conduct business). If you plan to collect social media, you will need the usernames or aliases for each platform. You will also need the employee's network username and any other identifiers the company uses for the employee, including chat platform usernames and the phone numbers of mobile devices. Finally, find out whether the employee's name has changed over time and whether the employee goes by any nicknames.
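One way to keep this checklist organized is a per-custodian record that holds every identifier gathered during interviews and collection. The Python sketch below is hypothetical (the `CustodianProfile` class and its fields are our own names, not a platform feature) and simply flattens the record into one lowercase, de-duplicated set that can seed searches.

```python
from dataclasses import dataclass, field


@dataclass
class CustodianProfile:
    """Hypothetical record of every identifier gathered for one custodian."""

    legal_name: str
    former_names: list[str] = field(default_factory=list)  # e.g., pre-marriage name
    nicknames: list[str] = field(default_factory=list)
    emails: list[str] = field(default_factory=list)  # business and personal
    usernames: dict[str, str] = field(default_factory=dict)  # platform -> handle
    phone_numbers: list[str] = field(default_factory=list)

    def identifiers(self) -> set[str]:
        """Flatten all identifiers into one lowercase, de-duplicated set."""
        values = [
            self.legal_name,
            *self.former_names,
            *self.nicknames,
            *self.emails,
            *self.usernames.values(),
            *self.phone_numbers,
        ]
        return {v.strip().lower() for v in values if v.strip()}


profile = CustodianProfile(
    "Elizabeth Smith",
    former_names=["Elizabeth Jones"],
    nicknames=["Beth"],
    emails=["ESmith@corp.example", "beth.smith@mail.example"],
    usernames={"network": "esmith", "chat": "bethsmith"},
)
print(sorted(profile.identifiers()))
```

Keeping the platform name as the key in `usernames` preserves where each handle came from, which matters when a search has to be scoped to one data source.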

Can we cross-reference names in an automated way?

E-discovery platforms provide name normalization tools that isolate and consolidate the names and addresses found in email headers and other metadata. This automated process scans the data set and associates the name variants, aliases and email addresses of each individual referenced in it. These associations are captured, and the output should be available during document review.
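Platforms do not publish exactly how they associate aliases, so the following is only a sketch of one common technique, union-find clustering, applied to raw header values. Every display name is linked to the address it appears with, so two spellings that share an address, or one name used with two addresses, collapse into a single cluster. The function names are hypothetical, and the matching rule is deliberately naive: two different people who share a name would be merged, so results like these still need human review.

```python
from email.utils import getaddresses


def normalize_name(name: str) -> str:
    """Lowercase, collapse whitespace, and reorder "Last, First"."""
    name = " ".join(name.split()).lower()
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first.strip()} {last.strip()}"
    return name


def cluster_aliases(header_values: list[str]) -> list[set[str]]:
    """Group display names and addresses that appear together in headers."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for value in header_values:
        for name, addr in getaddresses([value]):
            addr = addr.strip().lower()
            if not addr:
                continue  # skip entries the parser could not read
            find(addr)  # register addresses even when no name is present
            if name.strip():
                union(normalize_name(name), addr)

    clusters: dict[str, set[str]] = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


headers = [
    '"Doe, Jane" <jane.doe@corp.example>',
    "Jane Doe <jdoe@gmail.example>",
    "Bob Roe <bob@corp.example>",
]
print(cluster_aliases(headers))
```

Here "Doe, Jane" and "Jane Doe" normalize to the same key, which in turn ties her work and personal addresses together.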

Is this a “data cleansing” step?

Data cleansing is the process of making data consistent across all records and fields by (1) detecting and correcting corrupt or inaccurate records in a data set; and/or (2) bringing together data with varying file formats, naming conventions and columns and transforming it into one cohesive data set (such as expanding the abbreviations “st.” and “rd.” to “Street” and “Road” throughout). Name normalization can certainly be a data cleansing step, but I recommend tackling names at the inception of a case so that your data identification, collection and review searches include all possible name variations for relevant individuals (e.g., last-name changes after marriage or divorce, nicknames and app-specific usernames).
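The abbreviation example above can be expressed as a small rule table. This Python sketch expands only the two abbreviations named in the text, matching them case-insensitively as whole tokens; the table and the function name `expand_abbreviations` are illustrative, not part of any tool.

```python
import re

# Rule table built from the two abbreviations named in the text.
ABBREVIATIONS = {"st.": "Street", "rd.": "Road"}

# Match an abbreviation only when it is not glued to other word characters.
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")(?!\w)",
    re.IGNORECASE,
)


def expand_abbreviations(text: str) -> str:
    """Replace each listed abbreviation with its full form, ignoring case."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)


print(expand_abbreviations("12 Main St."))  # 12 Main Street
```

A blind rule like this also shows why cleansing needs review: applied to one of the names in our opening example, it would turn "St. Nick" into "Street Nick".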

Is the “normalization” process only for names?

The algorithm that evaluates email headers for names and aliases also evaluates the entities each person belongs to, such as their organization, and standardizes the entity names as well. Identifying entities is extremely helpful when analyzing communications and using data visualization tools.
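Grouping addresses by email domain is one simple way to standardize entity names, though we do not claim it is what any particular platform does. In this hypothetical Python sketch, a hand-maintained table (`DOMAIN_TO_ENTITY`, with made-up domains and organization names) maps each domain, including alternate domains, to one organization name. Subdomains fall back to their parent domain, and unknown domains are returned unchanged so someone can review them.

```python
# Hypothetical mapping from email domains to one standardized entity name.
DOMAIN_TO_ENTITY = {
    "corp.example": "Example Corp.",
    "corp-example.com": "Example Corp.",  # e.g., a legacy domain
    "lawfirm.example": "Outside Counsel LLP",
}


def entity_for(address: str) -> str:
    """Return the standardized entity for an email address.

    Subdomains fall back to their parent domain (mail.corp.example ->
    corp.example); unknown domains are returned unchanged for review.
    """
    domain = address.rsplit("@", 1)[-1].strip().lower()
    labels = domain.split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_TO_ENTITY:
            return DOMAIN_TO_ENTITY[candidate]
    return domain


print(entity_for("Jane.Doe@CORP.example"))  # Example Corp.
```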

What are the benefits of name normalization?

Name normalization:

  • Builds an inclusive set of names and identifiers for key custodians.
  • Allows visualization tools to display information in an organized way.
  • Is the basis for communications analysis and the identification of additional custodians, search terms and relevant time periods.
  • Optimizes privilege and second-level reviews.
  • Can be used as a tool to prioritize review based on custodians or potentially privileged data sets.
  • Optimizes search execution and results.
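As a rough illustration of the communications-analysis benefit, the sketch below counts sender-to-recipient pairs after mapping every known alias to one canonical name. The `ALIASES` table is hypothetical, standing in for the output of a normalization step; without it, Jane Doe's work and personal addresses would be counted as two separate people and her exchanges with Bob Roe would be split across rows.

```python
from collections import Counter

# Hypothetical alias table produced by a normalization step:
# every known address variant maps to one canonical custodian name.
ALIASES = {
    "jane.doe@corp.example": "Jane Doe",
    "jdoe@gmail.example": "Jane Doe",
    "bob@corp.example": "Bob Roe",
    "broe@corp.example": "Bob Roe",
}


def communication_counts(messages):
    """Count (sender, recipient) pairs, using canonical names where known."""
    counts = Counter()
    for sender, recipients in messages:
        s = ALIASES.get(sender.lower(), sender.lower())
        for r in recipients:
            counts[(s, ALIASES.get(r.lower(), r.lower()))] += 1
    return counts


messages = [
    ("jane.doe@corp.example", ["bob@corp.example"]),
    ("JDOE@gmail.example", ["broe@corp.example"]),
    ("bob@corp.example", ["jane.doe@corp.example", "carol@other.example"]),
]
print(communication_counts(messages).most_common(1))
# [(('Jane Doe', 'Bob Roe'), 2)]
```

Unmapped addresses (here `carol@other.example`) pass through unchanged, which makes them easy to spot as candidates for additional custodians.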

What are the drawbacks of name normalization?

The main drawback is that the normalization process may not capture every variation, because not all names share the same structure. In the US, for example, some people have middle names, hyphenated surnames or identical first and last names, and some go by initials instead of names. Variations should also account for different forms of the first name (e.g., Elizabeth, Beth and Betty) and for nicknames. Additionally, not all countries or cultures follow the same naming rules. For example, in some countries marriage or divorce may change a person's last name in a variety of ways, while in other countries it may not.
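To show how quickly variations multiply, here is a Python sketch that expands one US-style name into candidate search strings covering nicknames, a hyphenated surname, a former surname and a middle initial. The `NICKNAMES` table is a tiny hypothetical sample, the first-initial-plus-surname pattern is only a guess at a username convention, and the function assumes given-name-first order, so it would not suit naming cultures that work differently.

```python
# Hypothetical nickname table; a real matter would use a vetted, larger list.
NICKNAMES = {
    "elizabeth": ["beth", "betty", "liz"],
    "robert": ["bob", "rob"],
}


def name_variants(first, last, middle="", former_last=()):
    """Return lowercase search strings for one person's name.

    Assumes given-name-first order (first, middle, last).
    """
    firsts = [first.lower(), *NICKNAMES.get(first.lower(), [])]
    lasts = [last.lower(), *(name.lower() for name in former_last)]
    # A hyphenated surname may appear whole or as either part.
    if "-" in last:
        lasts += last.lower().split("-")
    variants = set()
    for f in firsts:
        for ln in lasts:
            variants.add(f"{f} {ln}")    # "beth smith"
            variants.add(f"{ln}, {f}")   # "smith, beth"
            variants.add(f"{f[0]}{ln}")  # "bsmith": guessed username pattern
    if middle:
        variants.add(f"{first.lower()} {middle[0].lower()}. {last.lower()}")
    return variants


print(sorted(name_variants("Elizabeth", "Smith-Jones", former_last=["Taylor"])))
```

Even this small example produces dozens of strings for one custodian, which is why the generated list should be reviewed before it is used as a search term set.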

Conclusion – Use it or Not?

Name normalization is a process that should be implemented in your e-discovery databases as it reduces redundancy, improves the quality of the metadata and optimizes document review.


DISCLAIMER: The information contained in this blog is not intended as legal advice or as an opinion on specific facts. For more information about these issues, please contact the author(s) of this blog or your existing LitSmart contact. The invitation to contact the author is not to be construed as a solicitation for legal work. Any new attorney/client relationship will be confirmed in writing.

