Which File Format Is Best For eDiscovery? (PDF, TIFF, Native, etc.)

Takeaway: If you can, insist on working with native load-file eDiscovery productions. If not, a PDF load-file production is fine. And, yes, you have other options. But they’re not ideal.

Your file format for eDiscovery matters. So what are the options, and which is the best?

In today’s complicated digital world, eDiscovery is a critical process. And since it’s all about electronically stored information (ESI), the format of this information matters. It can be the difference between an efficient and effective data review and one where you miss key evidence.

Option 1: Native files (i.e., the original format)

Native files are those that are in their original format – Word documents, Excel spreadsheets, PowerPoint presentations, etc. The main advantage of using native files in eDiscovery is that they preserve all metadata and file attributes, like embedded hyperlinks or formulae. However, the potential drawback is that these files usually need the original software for full access, viewing, or effective searching. Also, since these files keep their original format, they might appear differently across devices or software versions that render the file differently.
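
To make the metadata point concrete, here is a minimal sketch of reading a few built-in properties from a native Word, Excel, or PowerPoint file. It assumes a modern Office Open XML file (.docx, .xlsx, .pptx), which is a ZIP package that usually carries a docProps/core.xml part. The function name read_core_properties is our own for illustration, and older binary formats like .doc won't work with this approach.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(path):
    """Return selected core properties from a .docx/.xlsx/.pptx file.

    Hypothetical helper for illustration. Raises KeyError if the package
    has no docProps/core.xml part.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text_of(tag):
        # Missing properties come back as None rather than raising.
        node = root.find(tag, NS)
        return node.text if node is not None else None

    return {
        "creator": text_of("dc:creator"),
        "last_modified_by": text_of("cp:lastModifiedBy"),
        "created": text_of("dcterms:created"),
        "modified": text_of("dcterms:modified"),
    }
```

Details like these live inside the native file itself, which is why they tend to disappear once a document is flattened into an image.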

Option 2: TIFF files (i.e., a static image)

TIFF, or Tagged Image File Format, is an image format that's often adopted in eDiscovery. Unlike the flexible nature of native files, TIFF provides a fixed image of a document. The main advantage here is that it’s easy to annotate TIFFs, which makes them appealing to law firms. Also, TIFFs ensure documents’ appearance stays consistent across devices and software versions. But they have cons, too. For instance, they don’t have the original file metadata. And they aren’t ‘searchable’ with your eDiscovery search engine unless they’re paired with an accompanying text file.
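
Since TIFFs only become searchable with a paired text file, it can help to check a production for images that arrived without one. The sketch below assumes a simple layout where each text file shares its base name with the image (for example, ABC001.tif and ABC001.txt). Real productions vary, and the folder names and the function name tiffs_missing_text are hypothetical.

```python
from pathlib import Path


def tiffs_missing_text(images_dir, text_dir):
    """Return the names of TIFF images that have no same-named .txt file.

    Assumes (for illustration) that text files share the image's base name.
    """
    text_stems = {p.stem.lower() for p in Path(text_dir).glob("*.txt")}
    missing = []
    for image in sorted(Path(images_dir).iterdir()):
        is_tiff = image.suffix.lower() in {".tif", ".tiff"}
        if is_tiff and image.stem.lower() not in text_stems:
            missing.append(image.name)
    return missing
```

Any file names this returns are pages your search engine won't be able to find by keyword until they're OCR'd or the text files are supplied.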

Option 3: PDF files (i.e., a universal file format)

The Portable Document Format, or PDF, is a universal file format known for preserving a document’s original ‘look.’ That is, it keeps the fonts, images, graphics, and layout of the original source. And it’s searchable as long as it carries a text layer (scanned, image-only PDFs need OCR first), making it especially convenient in eDiscovery. However, you might lose some metadata when converting documents to PDF. And PDFs sometimes take up more storage compared to regular text files.

Option 4: PST & OST files (i.e., email archive formats)

PST (Personal Storage Table) and OST (Offline Storage Table) are file formats used by Microsoft Outlook to archive email data. They’re vital reservoirs for email data, preserving threads, attachments, and even folder structures. (It’s this preservation of an email’s original structure that makes PSTs and OSTs so popular.) However, they get corrupted relatively easily, and larger PSTs and OSTs are so unwieldy that they slow down data review noticeably.

So, how do you choose from these formats? Well, it depends on what you’re prioritizing.

Here are some things to consider when choosing a file format.

1. Searchability

Your file review becomes much more efficient if you can swiftly and effectively search through content and associated metadata. And this is especially true for data-heavy sectors like eDiscovery or digital archiving. So, you’ll want to prioritize a file format that allows you to extract text, retain metadata, and index data reliably. (Note: Most file formats are searchable, but you might need additional tools or software for some.)
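
To show why extractable text matters, here is a toy sketch of the kind of index a search engine builds. It is not how any particular eDiscovery platform works. The document IDs and the functions build_index and search are made up for illustration, and the index only sees whatever text could be extracted from each file.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical extracted text; DOC003 stands in for an image with no text file.
documents = {
    "DOC001": "Quarterly revenue forecast",
    "DOC002": "Revenue recognition memo",
    "DOC003": "",
}
index = build_index(documents)
hits = search(index, "revenue memo")
```

Notice that DOC003 can never appear in any result: with no extracted text, there is nothing to index.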

2. Metadata preservation

Metadata, often termed 'data about data,' provides context to all your case information: when a file was created or modified, who created it, where it’s stored, and more. So, if things like timestamps and geolocation data are important for your case, you’ll want a format that preserves metadata well.

3. Accessibility

A file that can't be accessed easily or requires specialized software will slow you down, especially if you collaborate with others or work in a team. So, sometimes, you might want to prioritize a format's accessibility over everything else. This will ensure it's compatible with the maximum number of applications and is easy to get tech support for. Ideally, it'll be an 'open' format that isn't tied to specific software or proprietary standards.

4. Size and storage

As organizations store more digital data, storage costs and efficiencies become important. Bulky files can slow you down, increase costs, and mess with data transfer. So, if this is an issue, you’ll want to explore formats with file compression capabilities, data redundancy, and storage optimization features.

5. Security

The more we hear about data breaches and unauthorized access, the more a file format’s security begins to matter. So, when dealing with privileged information, you’ll want to choose a format with encryption capabilities, malware/tampering protection, and compatibility with digital rights management (DRM) tools. For instance, PDFs offer robust encryption and security features, making them ideal for sensitive documents.

6. Durability

In the context of archiving and preserving data, file formats must stand the test of time without becoming obsolete. So, things like the adoption rate of the format, backward compatibility (i.e., how compatible it is with legacy software), and openness of the format standards will affect how long the format will last. A widely adopted, open standard is less likely to become obsolete than a proprietary, niche format.

7. Appearance

Often, we want a format that preserves a document's original appearance – especially when we're dealing with areas like design, publishing, and content creation. In these cases, you'll want a format that locks a file's pagewise layout. This way, you'll get the same look regardless of which system you access the file on.

So, which format is the best for eDiscovery?

Well, here’s what we prefer, starting with the best and ending with the worst.

The best option is a native load-file production.

Native productions are way better at preserving file metadata than any other format. So, if you have a choice with incoming productions, ask for native ones. Ideally, you’ll want these natives with an accompanying load file to direct your software in slotting the file’s data into a behind-the-scenes database.
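
If you’re curious what a load file looks like to software, here is a minimal sketch of reading one. It assumes a Concordance-style delimited file, which is one common convention: fields separated by the ASCII 20 control character and wrapped in þ (thorn) characters. Your production’s delimiters may differ, and the field names (BEGDOC, CUSTODIAN, NATIVEPATH) and sample values below are hypothetical.

```python
import csv
import io

FIELD_DELIMITER = "\x14"   # assumed column separator (ASCII 20)
TEXT_QUALIFIER = "\u00fe"  # assumed quote character (thorn, þ)


def read_load_file(stream):
    """Return one dict per document, keyed by the load file's header row."""
    reader = csv.DictReader(
        stream, delimiter=FIELD_DELIMITER, quotechar=TEXT_QUALIFIER
    )
    return list(reader)


# A made-up two-line load file: a header row plus one document record.
sample = (
    "þBEGDOCþ\x14þCUSTODIANþ\x14þNATIVEPATHþ\n"
    "þABC0001þ\x14þJ. Smithþ\x14þNATIVES\\ABC0001.xlsxþ\n"
)
records = read_load_file(io.StringIO(sample))
```

Each record tells the review platform which native file belongs to which document ID and what metadata to store alongside it. For a real file, you’d pass an opened file object instead of the in-memory sample, using whatever text encoding the producing party specifies.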

If not native, ask for a PDF load-file production.

If you can’t get a native production, ask for a PDF one instead. Ideally, one with a load file. These are becoming an industry standard, so you’ll likely get no pushback when requesting them. Just ensure the PDFs come as separate documents. It’s significantly more complicated to review them if all your files come stuffed into one gigantic PDF.

If not PDF, ask for a TIFF load-file production.

The TIFF format hasn’t been updated since 1992, so TIFF productions are less secure than PDFs, tend to be lower resolution, and offer minimal add-on features. Still, they’re workable, even if they’ll take more effort.

Your last resort: Bulk PDFs, loose collections of TIFFs, or paper files.

If there’s no way of getting any of the earlier formats, you can still manage with (1) PDFs lumped into one massive document, (2) TIFF collections without load files, or (3) paper documents. It’ll take a lot more effort to prep using these formats, though. And even here, you’ll want standards. For instance, scan the paper documents at a resolution of 300 PPI (i.e., pixels per inch) and scan each document as a separate PDF. Also, use optical character recognition (OCR) to convert the scanned image into machine-readable text.
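
As a quick check on the 300 PPI guideline, here is a small sketch that works out a scan’s resolution from its pixel dimensions. It defaults to a US Letter page (8.5 x 11 inches), which is our assumption, so pass in other dimensions for A4 or legal-size paper. The function names are made up for illustration.

```python
def scan_ppi(pixel_width, pixel_height, page_width_in, page_height_in):
    """Return the (horizontal, vertical) pixels per inch of a scanned page."""
    return pixel_width / page_width_in, pixel_height / page_height_in


def meets_target(pixel_width, pixel_height,
                 page_width_in=8.5, page_height_in=11.0, target_ppi=300):
    """Check whether a scan reaches the target PPI in both directions.

    The default page size is US Letter, an assumption for this sketch.
    """
    horizontal, vertical = scan_ppi(
        pixel_width, pixel_height, page_width_in, page_height_in
    )
    return min(horizontal, vertical) >= target_ppi


# A Letter page at 300 PPI is 8.5 * 300 = 2550 by 11 * 300 = 3300 pixels.
full_resolution = meets_target(2550, 3300)
# The same page scanned at 1700 by 2200 pixels works out to 200 PPI.
low_resolution = meets_target(1700, 2200)
```

So if your scanner’s output for a Letter page is much smaller than 2550 by 3300 pixels, it’s worth rescanning before running OCR.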

Importantly, remember that all this gets easier with practice.

Selecting the right file format can feel like navigating a maze. But if you focus on your priorities and match these to your file format options, navigating that maze will get significantly easier. And just as importantly, you’ll be prepared to evaluate and adapt to newer file formats when they come along.
