Data Graveyard: The Hunt for Hidden ROT


Finding and minimizing redundant, old, and trivial (ROT) data is anything but trivial.


Let’s consider these three data categories separately for a moment. First, redundant data. It’s technically easy to find redundant data by comparing file bytes or sizes, but when you are handed a list of 5,726 duplicate documents (name, location, size, etc., only), it’s nearly impossible to determine what to do with each one without a file-by-file review. Does this really require 5,726 reviews and decisions? If so, ouch! Count me out. “Duplicate file” status is not actionable intelligence without further review.
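To be clear, the detection half really is easy. As a rough sketch (standard library only, with a hypothetical find_duplicates helper; this illustrates the general technique, not any particular product), grouping files by size and then by content hash surfaces every byte-for-byte duplicate:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under `root` by size, then by SHA-256 of their contents.
    Any hash that maps to more than one path is a byte-for-byte duplicate."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            # Reads the whole file into memory; fine for a sketch.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)

    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}
```

The hard part, as noted above, is deciding what to do with the 5,726 paths this returns.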


Similarly, finding old files is as easy as shooting fish in a barrel, but just because a file is old does not mean it is immediately due for some sort of remediation. More manual reviews are typically required. If making a decision about these duplicate and old files means a manual review of each one, then no one is going to want to deal with this issue at all. It’s cost prohibitive. This is one of the reasons that most organizations and individuals hoard data and watch it grow and grow and grow.
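Age, likewise, is a simple query against filesystem metadata. A minimal sketch (the files_older_than name and the years-to-seconds conversion are illustrative assumptions):

```python
import time
from pathlib import Path

def files_older_than(root, years):
    """Yield files whose last-modified timestamp is more than `years` ago."""
    cutoff = time.time() - years * 365.25 * 24 * 3600
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            yield path
```

Again, producing the list is cheap; the per-file decision is what costs money.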


Conversely, discovering trivial data is tougher than a $3 steak. Is there a regular expression for this? Is there a dictionary of terms we can use during a search that will indicate whether or not a document is trivial? Sorry, no. Everyone wants to rid the world of ROT data, but no one has come up with a solution that can detect this kind of data and act on it in bulk. Until now.


BigID has developed a technique that we call data cluster analysis. This machine learning algorithm identifies and presents our customers with virtual collections of similar documents. In our metadata registry, we place similar documents in virtual “stacks,” or sets, making them easy to validate and work with in bulk or batch fashion. For example, our customers open these collections to find that one contains nondisclosure agreements, another RFPs, and a third purchase orders. This is great: we are able to show our customers numerous categories of high-value data. But wait. Our customers open other clusters to find that one is composed of 12,384 machine-generated backup logs, written 11 years ago by a backup solution that has long since been replaced. Another collection contains old data sheets for discontinued products, while a third holds a hoard of travel expense reports from a pre-merger business unit that no longer exists.
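BigID’s algorithm itself is proprietary, but the underlying idea, vectorizing documents and grouping the similar ones, can be sketched with off-the-shelf tools. Here is a generic illustration using TF-IDF and k-means from scikit-learn; the sample documents are invented, and this is not a description of BigID’s actual method:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "This Nondisclosure Agreement is entered into by and between the parties",
    "Request for Proposal: enterprise storage refresh, responses due Friday",
    "Purchase Order 4471: 200 units, net 30 payment terms, ship to warehouse",
    "Backup job completed 2009-03-14 02:00, 0 errors, 14312 files processed",
]

# Turn each document into a TF-IDF vector, then group similar vectors
# into "stacks" that a human can validate as a unit.
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

for label, doc in zip(labels, documents):
    print(f"stack {label}: {doc[:50]}...")
```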


At this moment of clarity, our customers are able, in many cases, to take bulk action on an entire collection of documents. One collection may be best handled by an n-year data retention policy (we have an app for that), while another may be clearly destined for the data graveyard, all facilitated by our data remediation app. As a result of this ML-based data categorization, BigID is shining a light on a problem that the market has bemoaned for years and all but given up on.
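In spirit, once a human has validated a cluster, remediation collapses from a per-file decision to a per-cluster rule. A hypothetical sketch (the labels, rules, and remediate function are invented for illustration, not BigID’s app API):

```python
# Hypothetical per-cluster rules: one reviewed decision fans out to every
# document in the stack, instead of thousands of individual file reviews.
RETENTION_RULES = {
    "nondisclosure_agreements": {"action": "retain", "years": 7},
    "stale_backup_logs": {"action": "delete"},
    "pre_merger_expense_reports": {"action": "archive"},
}

def remediate(cluster_label, paths):
    rule = RETENTION_RULES.get(cluster_label, {"action": "manual_review"})
    for path in paths:
        # Stand-in for a real retention, archival, or deletion call.
        print(f"{rule['action']}: {path}")
```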


So, whether the data you need to locate and manage is high value or ROT, BigID’s approach is far superior to anything the market has seen in decades.


Reach out to BigID for a demonstration today.
