The last part in this series focused on processing: why you process documents, what advantages processing offers in document review and what challenges arise during the processing phase. I also touched briefly on native files. This time, I will focus specifically on native files in the real world and the issues you may encounter during processing, production and review.
What Is A Native File?
A native file is a file saved in the default format that an application uses to create or save its work. It is the original document, such as the original Word document, Excel spreadsheet or Outlook email, taken directly from the source. It can be created and opened by the software application to which it is native. For example, a file with a .docx extension can be created and opened in Microsoft Word.
Native File Processing
When a large set of native files from most any company is processed, there will invariably be files that are considered exceptions, meaning they can’t be processed (i.e., made useable and reviewable) using the tools available. Exceptions can vary from tool to tool and they can be a huge pain. With that in mind, I’ve included below some tips for managing native files from the outset.
1. Proprietary software or file types
When developing a data map with the IT department or conducting custodian interviews, ask about proprietary software or file types that might not process in an e-discovery tool. This data can often be siloed during the collection process so that, if it can’t be processed, it can be handled another way.
2. Exception list
Talk to your vendor or law firm about what files can be processed using their tools and what are considered exceptions. They will usually have a list they can provide for your review.
3. NIST list
If your matter does not involve software systems and system files (meaning those file types are not critical to the case), de-NIST your data during processing. The NIST list is maintained by the National Institute of Standards and Technology and identifies known system and non-document files that can be filtered from a document collection. Also, find out whether your vendor or law firm has its own filtering mechanisms, which can help remove some unprocessable file types from the collection.
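In practice, de-NISTing is typically done by comparing a hash of each collected file against a list of known hashes. The following is a minimal sketch of that idea, not any particular vendor's implementation. It assumes the known hashes have already been loaded into a set of uppercase SHA-1 hex strings; the function names `sha1_of` and `de_nist` are hypothetical.

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Return the uppercase SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def de_nist(paths, known_hashes):
    """Split paths into (kept, filtered) using a set of known file hashes.

    known_hashes is an assumption: a set of uppercase SHA-1 hex strings
    loaded from whatever reference list your tool or vendor provides.
    """
    kept, filtered = [], []
    for path in paths:
        if sha1_of(path) in known_hashes:
            filtered.append(path)  # known system/application file
        else:
            kept.append(path)      # potentially relevant document
    return kept, filtered
```

Keeping the filtered list, rather than simply discarding those files, makes it easier to report what was removed and why.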
4. File extension analysis
After processing, run a tally on the file extension field to see what is now in the database. Some things to look for:
- XML files that appear to be part of a family. Sometimes the processing tool will process parts of a set of files that should really be kept together in order for them to work. We see this most often with GIS and GPS files as well as in IP cases regarding software code.
- BIN files, which are system files. They can be created for a variety of reasons, but when a file fails to download (such as a link in an email to an archive or other location), a BIN file will be created. You should research these to see whether they are part of a piece of software that was processed or whether your email has issues with archived attachments. There are other situations in which they are created, but these are the most common that I’ve found.
- DBX and DBF files, which are often seen in data that also contains CAD drawings or 3D modeling files. Some of these programs store the drawing separately from the data that describes it.
- MAC OSX and AD_ files, which are created when Apple Mac files are exported and converted for Windows. These are “shadow files” of the true files; the true files also reside in a folder in the set, alongside the system files created by the export. The shadow files do not have to be processed, but you will find them inside zipped containers, and even if you de-NIST you can still find them in your processed data.
- PNG files that are very small can end up being processed as text files or, if they have a low graphics resolution, can look enormous in the document review tool. It is always beneficial to check those and make sure they were processed correctly.
- ATT000001.htm files, which some email servers create when emails are sent in Rich Text format or when attachments are copied and pasted into the body of the email rather than attached at the end. These are usually blank files that can either be removed or left and produced with an image placeholder, which leads us to a discussion about document production and trial...
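The tally described in this tip can be sketched in a few lines. This is an illustration only: it assumes you have exported the file names from your database into a plain list, and the watch list of extensions and the function names are hypothetical choices based on the file types discussed above.

```python
import os
from collections import Counter

# Hypothetical watch list drawn from the file types discussed above.
REVIEW_EXTENSIONS = {".xml", ".bin", ".dbx", ".dbf", ".png", ".htm"}


def tally_extensions(file_names):
    """Count files by lowercase extension; files with none go under '(none)'."""
    counts = Counter()
    for name in file_names:
        extension = os.path.splitext(name)[1].lower()
        counts[extension or "(none)"] += 1
    return counts


def flag_for_review(counts, watch=REVIEW_EXTENSIONS):
    """Return only the tallied extensions that appear on the watch list."""
    return {ext: n for ext, n in counts.items() if ext in watch}
```

For example, `flag_for_review(tally_extensions(names))` returns a small dictionary of the extensions worth a closer look, with how many of each are in the set.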
Native File Production and Review
I have reviewed and advised on many ESI draft agreements regarding the production of native files. The most common argument for producing files only in native is that it is expensive to tiff (i.e., image) everything, particularly if the documents are oversized or in color. (Note that this also comes back to which vendor or law firm you use and what they charge for image creation.)
In discussions with opposing parties regarding production format and specifications, be aware of the following:
1. Native file production can be limited
In most cases, native productions are limited to documents that cannot be imaged (i.e., tiffed), such as Excel files, photos, movies and drawings. In these cases, the native files are produced with an image placeholder and some agreed-upon metadata. For documents that can be imaged, we generally try to limit native-only production. I’ve included some reasons why below.
2. How native files should be produced
In an ideal world, all native files should be produced with (1) an image (tiff) placeholder that contains the beginning Bates number and confidentiality designation; (2) accompanying metadata in a load file (when feasible); and (3) a field that states the file is native, so that it can be imaged or printed to pdf when the time comes to use it. When native files are produced with an image placeholder, the wording on the placeholder cannot be searched within your database, because the searchable text provided will be the actual text of the document. As a result, it can be difficult and time-consuming to locate these documents when they are not properly designated. For example, in a recent case, in order to identify files that were produced natively, I had to OCR all of the documents containing no more than one image, populate a separate text field and then search for the phrase “file produced natively.” As usual, this issue arose at the least opportune time: when exhibits needed to be created for a deposition the following day.
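To make the difference concrete, here is a rough sketch of how natively produced files might be picked out of load-file records. It is an illustration under stated assumptions, not a description of any review tool: the records are assumed to be dictionaries, and the field names `BegBates`, `NativeFile`, `PageCount` and `OcrText` are hypothetical. When a native field is populated the check is trivial; otherwise the sketch falls back to the single-image-plus-OCR-phrase approach described above.

```python
PLACEHOLDER_PHRASE = "file produced natively"


def find_native_productions(records):
    """Return the beginning Bates numbers of records that look natively produced.

    All field names here are hypothetical; adjust them to your load file.
    """
    natives = []
    for record in records:
        # Easy case: the producing party populated a native-file field.
        has_native_field = bool((record.get("NativeFile") or "").strip())
        # Fallback: a one-image document whose OCR text is the placeholder wording.
        single_image = int(record.get("PageCount") or 0) == 1
        ocr_text = (record.get("OcrText") or "").lower()
        looks_like_placeholder = single_image and PLACEHOLDER_PHRASE in ocr_text
        if has_native_field or looks_like_placeholder:
            natives.append(record["BegBates"])
    return natives
```

The fallback branch is the expensive one, since it depends on OCR having been run first, which is why a dedicated native field in the load file saves so much time.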
3. Complete native and image productions
In some cases, all documents (even those that can be produced in tiff) are produced in both native format and image format with metadata. Generally this isn’t necessary but it does allow you to have a fully searchable database and pull up the native if needed. It also provides the receiving party with native color documents for documents that are imaged in black and white.
4. Native only productions
I have also received productions that contain just the native files with no load files, images, metadata or Bates numbers whatsoever. Fortunately, these instances are usually limited to third-party productions where the third party may not have the access or funds to process documents and the other parties allow it. If the set is small, and because we have the capability to process in-house, we may, as a courtesy, process these productions, assign Bates numbers and distribute the set to all parties. Taking these extra steps imposes no burden on our client and saves significant attorney time during review.
5. Document review issues
When opposing parties refuse to produce documents that can be imaged in any format other than native, it can cause all sorts of problems for document review.
- Reviewing native files can be time-consuming. If you have to review the native file, you may have to download it to your desktop and then open it in the native application.
- You may not have applications for every file you need to review, such as drawings, 3D modeling or some accounting files. (Note that this is why inquiring about native file applications in your own collection is so helpful.) Sometimes we will ask the opposing party to provide a viewer if we cannot identify a free tool to download. There may be a strict policy about allowing new software or web downloads to be used in your environment, so getting ahead of that process is important.
- On a related note, for email alone, native files may come from a variety of email applications. For example, the file extensions .ost, .pst, .msg, .mht, .eml and mbox are all email formats (and even within an application, there might be a variety of versions, such as Outlook 2010, 2016, etc.). When emails are imaged they become uniform regardless of their native application.
- Some review tools do not render all file types equally well in the viewer. A native file may render incompletely, be difficult to review, or not render at all, and this can vary among review tools.
- In the case of emails with attachments, reviewing native-only files can be challenging because attachments are not extracted separately, meaning parent emails and attachments cannot be coded or tracked individually.
There are other issues with reviewing native-only productions, but these are a few of the most common.
6. Printing issues
One major drawback with native productions is printing. Yes, attorneys still print and some judges and arbitration boards still want hard copy binders, not to mention the regular use of hard copy exhibits during depositions. Depending on the configuration of the printer, the page count can differ for each party, which can make the consistency of exhibits challenging (for example, printer settings may differ, so a two-page email for one party may end up being a three-page email for another party). Printing to pdf can help, but there are other considerations, like compression and formatting, that can create inconsistencies in the page count. One way to get around this problem is to convert to tiff the documents that you will need to print during depo prep or exhibit creation.
7. Deposition and Trial
Bates numbering of native documents will be at the document level, not the page level, meaning the Bates number can be included in the file name but will not be branded on the document. What if you’re in a deposition and want to focus on a small part of a 10-page email string? The page on which that small part appears will not have a Bates number, which may make referring back to it confusing.
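A short sketch shows the difference between the two numbering schemes. The prefix, starting number and zero-padding width below are made up for illustration, and the function names are hypothetical.

```python
def page_level_bates(prefix, start, page_count, width=7):
    """Return one Bates number per page, as branded on an imaged document."""
    return [f"{prefix}{number:0{width}d}" for number in range(start, start + page_count)]


def document_level_bates(prefix, number, width=7):
    """Return the single Bates number assigned to a natively produced file."""
    return f"{prefix}{number:0{width}d}"


# An imaged 10-page email string: every page can be cited on its own.
imaged_pages = page_level_bates("ABC", 101, 10)
page_seven = imaged_pages[6]  # "ABC0000107"

# The same email produced natively: one number for the whole file.
native_number = document_level_bates("ABC", 101)  # "ABC0000101"
```

With the imaged version you can point a witness to a specific numbered page; with the native version, every page shares the one number that appears in the file name.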
There are many challenges in the ESI landscape, and managing native files is one of them. It helps to get ahead of the process, know your client’s data and communicate effectively with all parties so that problems can be avoided and everyone is on the same page.
DISCLAIMER: The information contained in this blog is not intended as legal advice or as an opinion on specific facts. For more information about these issues, please contact the author(s) of this blog or your existing LitSmart contact. The invitation to contact the author is not to be construed as a solicitation for legal work. Any new attorney/client relationship will be confirmed in writing.