What is deduplication, and what is its role in eDiscovery? Find out with GoldFynch!

GoldFynch eDiscovery

If you’re looking for an eDiscovery solution that’s fast, cost-effective and easy to use, GoldFynch is the answer.

Published: May 8, 2024

In eDiscovery, where vast quantities of data need to be analyzed, being able to cull unnecessary information accurately and efficiently matters: it saves both time and money.

Deduplication is a key tool for doing this. It lets legal professionals bring large volumes of data down to a more manageable size and, in the process, makes their review more effective. Here’s a quick look at how deduplication fits into the eDiscovery workflow and how GoldFynch handles the process.

If you’re interested in a deeper look at the specifics of deduplication using GoldFynch, check out our support article on the topic.

What is deduplication?

Deduplication is the process of identifying and eliminating duplicate data within a dataset. In eDiscovery, where data sets can be massive and include a lot of redundant information, deduplication plays a crucial role in streamlining the review process. By using software to remove (or “cull”) duplicate files, legal teams can:

Reduce costs: Deduplication reduces the volume of data that needs to be processed and reviewed, leading to significant cost savings in storage and review expenses.
Save time: Removing duplicates from your review accelerates the review process, allowing legal teams to focus on relevant information promptly.
Increase accuracy: By eliminating duplicate documents from your review, deduplication ensures that you can focus on unique content and avoid having multiple partially-reviewed versions of documents. This helps reduce the risk of inconsistencies, contradictions, and review metadata being lost.
Enhance review efficiency: With a unique dataset, reviewers can efficiently identify key documents and make informed decisions.

There are many “strategies” that can be used for deduplication. In simpler datasets where all the information lives in a single table (e.g. marketing databases), each row can be compared directly against the other rows. But in eDiscovery, where a dataset frequently holds thousands, or even millions, of files, and where a single file can run to hundreds of pages, it would be extremely wasteful to compare all the data in each file. So, instead, files are compared using unique identifiers associated with them.

Deduplication in GoldFynch

Deduplication in GoldFynch can be performed in two ways: by comparing either the MD5 hash values of your files, or Message-IDs in the case of emails. An MD5 file hash serves as a digital "signature" for a file, and even the slightest change to a file's data (whether to its visible content or to the underlying binary data) produces a different hash value.
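
To make the hash-based strategy concrete, here is a minimal Python sketch that groups files by MD5 hash. It is an illustration only, not GoldFynch's implementation: the function names (`md5_of_bytes`, `group_by_hash`) and the sample file names and contents are made up for the example.

```python
import hashlib
from collections import defaultdict


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 hash of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def group_by_hash(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each hash shared by two or more files to those files' names."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[md5_of_bytes(data)].append(name)
    # Keep only the hashes that occur more than once.
    return {h: names for h, names in groups.items() if len(names) > 1}


# Hypothetical sample data: two identical files and one that differs
# by a single trailing period.
files = {
    "a/contract.txt": b"Signed on May 8",
    "b/contract_copy.txt": b"Signed on May 8",
    "c/contract_v2.txt": b"Signed on May 8.",
}

duplicates = group_by_hash(files)
for file_hash, names in duplicates.items():
    print(file_hash, names)
```

Only the first two files end up in the same group. The third differs by one character, so its hash is different and it is not treated as a duplicate.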

It's also worth noting that deduplication is carried out on a root family level since excluding or removing duplicate attachments belonging to non-duplicate parent files is not typically desirable. So GoldFynch will not mark attachment files as duplicates even if the file hashes are the same, unless the parent files are also duplicates.

This means that if you run a deduplication session using the hash-based strategy and no duplicates are detected, you can be confident that the duplicate-looking files are either attachments to non-duplicate parent files or aren't, in fact, exact duplicates.
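
The family-level rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not GoldFynch's actual logic: the `Doc` class and `find_duplicates` function are hypothetical, families are assumed to be only one level deep (every attachment points directly at its root file), and a family is assumed to be a duplicate when its root file's hash matches that of an earlier root file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: int
    parent_id: Optional[int]  # None for a root (top-level) file
    md5: str


def find_duplicates(docs: list[Doc]) -> set[int]:
    """Return IDs of files in families whose root duplicates an earlier root."""
    seen_root_hashes: set[str] = set()
    duplicate_roots: set[int] = set()
    for doc in docs:
        if doc.parent_id is None:
            if doc.md5 in seen_root_hashes:
                duplicate_roots.add(doc.doc_id)
            else:
                seen_root_hashes.add(doc.md5)
    # An attachment is flagged only when its parent (root) file is flagged.
    flagged: set[int] = set()
    for doc in docs:
        root_id = doc.doc_id if doc.parent_id is None else doc.parent_id
        if root_id in duplicate_roots:
            flagged.add(doc.doc_id)
    return flagged


# Hypothetical data: three emails, each with the same attachment (hash "fff").
docs = [
    Doc(1, None, "aaa"), Doc(2, 1, "fff"),  # first email and its attachment
    Doc(3, None, "bbb"), Doc(4, 3, "fff"),  # a different email, same attachment
    Doc(5, None, "aaa"), Doc(6, 5, "fff"),  # exact copy of the first email
]

flagged = find_duplicates(docs)
print(sorted(flagged))
```

In this sample, files 5 and 6 are flagged because email 5 duplicates email 1. File 4 shares its hash with file 2 but is not flagged, because its parent (email 3) is not a duplicate.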

Reviewing GoldFynch's deduplication results

When you run a deduplication operation in GoldFynch, you can generate a report of the files detected as duplicates before deciding what to do with them.

The GoldFynch duplicate report contains the following information:

APP Link - This is a direct link to the document in your GoldFynch case (only accessible if you are logged into an account that has access to your case)
APP ID - GoldFynch's internal ID, which is used to track each individual file that is uploaded
APP Parent ID - This is the ID of the parent document. If there is no parent, it is the same as the APP ID
Keep? - “TRUE” indicates that the file is the primary copy, and “FALSE” indicates that the file is a duplicate
File Name - File name of the document
Pathname - Path of the document in GoldFynch
Tags - All tags attached to the document will be listed

In case the files are emails, the following fields are populated with the available metadata:

Subject
From
To
Cc
Bcc
Sent
Message ID

Note: If the source files do not have this metadata, these fields will be blank, even if the files are emails.
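
If you want to work with the report outside GoldFynch, a short script can separate primary files from duplicates using the "Keep?" column. The Python sketch below assumes the report is available as CSV text with the column names listed above. The file format, the `split_report` function, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample in an assumed CSV layout; only some columns are shown.
SAMPLE_REPORT = """APP ID,APP Parent ID,Keep?,File Name
101,101,TRUE,contract.pdf
205,205,FALSE,contract.pdf
206,205,FALSE,exhibit.docx
"""


def split_report(report_text: str) -> tuple[list[str], list[str]]:
    """Split file names into (primaries, duplicates) based on 'Keep?'."""
    primaries: list[str] = []
    duplicates: list[str] = []
    for row in csv.DictReader(io.StringIO(report_text)):
        is_primary = row["Keep?"].strip().upper() == "TRUE"
        (primaries if is_primary else duplicates).append(row["File Name"])
    return primaries, duplicates


primaries, duplicates = split_report(SAMPLE_REPORT)
print("Primary:", primaries)
print("Duplicates:", duplicates)
```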

What's next?

Once the deduplication session is run, the system will mark (or “tag”) the duplicates with a special “DUPE” tag and give you the option to delete them if you wish. Whether you delete the duplicate files or not, we recommend creating a review set for your case, since review sets automatically exclude any system-marked duplicates. So you’ll be covered even if you decide to hold onto your duplicate files (for example, if you want to show that a specific file was present in a particular folder, even though it was a duplicate). You can learn about the other benefits of reviewing your files using review sets here.

All in all, deduplication is a powerful tool in your eDiscovery arsenal. It can help significantly reduce costs, save time, increase accuracy, and enhance review efficiency. Because when it comes down to it, you want to be able to focus on your case, not on managing your files!
