The Uses of Different File Formats
by Greg Marliave
Last week Joshua Gilliland wrote a great post about thinking through the form of production. I thought it would be a good time to share our thoughts about different file formats, and how they affect document review.
If you have reviewed multiple cases on one review platform, you may have noticed significant differences in the review experience. While the software presenting the documents is the same, the format of the input can vary quite a bit from case to case. One of the biggest factors is which file types are included. If you are reviewing on the Everlaw platform, you will have up to three different viewers available to you: image, text, and native.
Image Format
Most review platforms default to an image viewer. An image viewer simply loads an image for each page of a document. Generally, those images will be TIFF, PDF, or PNG files. Image files have a number of advantages:
They load quickly! Reviewing thousands to millions of pages takes long enough; you don’t want to be waiting for documents to load.
Standardizing all documents into one format creates consistency in the review experience.
You can redact material from an image, which you can’t do for native files.
Bates stamping on the images simplifies referencing documents and preparing for trial.
Text Format
Every production protocol should include text whenever possible. The most important reason to produce text is that it allows you to search for keywords in your documents en masse. Having text files also means that you can load them into a text viewer. Text is not a true visualization of the document (it leaves out images, for example), but it has some key benefits:
Keyword highlighting! In our software, search keywords are highlighted: you can set keywords to automatically highlight across the entire case, or you can set personal highlights during a review session. All of this makes finding your key phrases in long documents a whole lot easier.
It still loads quickly!
In our tool, you can use machine translation to automatically translate foreign language text.
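To make the keyword search and highlighting ideas above concrete, here is a minimal sketch in Python. It is not how Everlaw implements either feature; the function names, the `**` marker, and the document IDs are all hypothetical, and it assumes each document's extracted text is already available as a plain string.

```python
import re


def _keyword_pattern(keywords):
    # Longest keywords first, so "merger agreement" wins over "merger".
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # Whole-word, case-insensitive matching (an assumption of this sketch).
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def search_documents(docs, keywords):
    """Return the sorted IDs of documents whose text contains any keyword.

    `docs` is a hypothetical mapping of document ID -> extracted text.
    """
    if not keywords:
        return []
    pattern = _keyword_pattern(keywords)
    return sorted(doc_id for doc_id, text in docs.items() if pattern.search(text))


def highlight_keywords(text, keywords, marker="**"):
    """Wrap every keyword hit in `marker` so it stands out in a text view."""
    if not keywords:
        return text
    pattern = _keyword_pattern(keywords)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)


if __name__ == "__main__":
    docs = {
        "DOC-001": "The merger closed in May.",
        "DOC-002": "Lunch on Friday?",
    }
    print(search_documents(docs, ["merger"]))
    print(highlight_keywords(docs["DOC-001"], ["merger"]))
```

Neither function is possible against page images alone, which is why text belongs in the protocol even when images are the primary review format.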
Native Format
Last, but not least, we have the native viewer, which loads the actual native file directly into your browser. A native is the original file in the file type of the program it is associated with, like .doc, .xls, or .msg. This is the most technically complex viewer, as there is a theoretically limitless number of different file types. Any review platform will pick some number of the most common types and build support for those; our platform supports over 300 different file types. The downside of a native viewer is that it is slower, but it has its benefits as well:
You are viewing the actual original file, with no alterations to conform to another format.
You can highlight and translate text in most native formats.
If you have the native file, you can still retrieve missing metadata.
Many file formats don’t make much sense broken out into images or text. Excel files are the best example. That’s why we have built our own custom spreadsheet viewer in the native view!
Our review platform can support pretty much any production protocol you can come up with. However, that doesn’t mean you won’t feel the effects of the protocol you choose. Getting the right protocol can save you review time and money. All of the different formats have their advantages, but here are a few common combinations to consider:
The kitchen sink: Images, text, and natives. One of the easiest solutions is to produce everything. It gives the reviewer the choice of using whatever tool is most appropriate for each document. For example, you can view your PDFs in the image viewer, emails in the text viewer, and spreadsheets in your native viewer. The downside is that hosting all the formats means an increase in hosting costs.
Image and text: Most of the time native files won’t be necessary, and because the native viewer loads more slowly, you won’t want to use it unless you have to. That being said, sometimes you have to. If you are leaning towards image and text, you should make exceptions for certain files, like spreadsheets or videos. Note that this approach rarely saves much, as images tend to make up the bulk of the costs.
Native and text: Producing responsive documents only in native is always an option and is most useful as a cost-saving measure. Even with a native production, it is best to produce text and metadata as well. A good use for native productions is for initial review before producing to the final protocol. This cuts down on the cost of hosting non-responsive documents; you can just image the documents when you need to produce them.
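The three combinations above can be written down as a small rule table. The Python sketch below is only an illustration: the protocol names, the list of exception extensions, and the `formats_to_produce` function are hypothetical, and it assumes that an exception under image and text means adding the native file alongside the image and text rather than replacing them.

```python
from pathlib import PurePath

# Formats produced for every document under each protocol described above.
PROTOCOLS = {
    "kitchen_sink": {"image", "text", "native"},
    "image_and_text": {"image", "text"},
    "native_and_text": {"native", "text"},
}

# Hypothetical examples of spreadsheet and video types that do not
# translate well into page images or plain text.
NATIVE_EXCEPTIONS = {".xls", ".xlsx", ".csv", ".mp4", ".mov", ".avi"}


def formats_to_produce(filename, protocol):
    """Return the set of formats to produce for one file under a protocol."""
    formats = set(PROTOCOLS[protocol])
    extension = PurePath(filename).suffix.lower()
    # Image-and-text productions still carve out natives for file types
    # like spreadsheets and videos (assumption: native is added, not swapped in).
    if protocol == "image_and_text" and extension in NATIVE_EXCEPTIONS:
        formats.add("native")
    return formats


if __name__ == "__main__":
    for name in ("memo.pdf", "budget.xlsx", "deposition.mp4"):
        print(name, sorted(formats_to_produce(name, "image_and_text")))
```

Writing the exceptions down as an explicit list, whatever form it takes, makes it easier for both sides to agree on the protocol before production starts.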
If you want to see how Everlaw deals with different file types, don’t hesitate to contact us!
Greg Marliave is Vice President of Product at Everlaw. Having previously worked at Facebook and Black Bag Advertising, his experience ranges from data analysis to online advertising.