3 Things You Don’t Know about Data Mining

Assumptions that could make your cyber incident even worse


Data mining involves programmatically searching and manually reviewing data. In incident response, it is crucial for identifying the scope and impact of compromised sensitive data, which ultimately determines the notification list.

As if the looming crisis of the cyber event itself weren’t bad enough, misconceptions about data mining can set you back even further. Understanding the principles of data mining lets you set better expectations for the process, and even prepare for an (inevitable) cyber event.

Data (Mining) Isn’t Linear

Not all data is created equal, nor is its mining linear. Just as files and their data vary widely in type, complexity, and context, so does the process of mining them. It is exactly this variance that turns into an unpleasant surprise for those experiencing a cyber incident and those involved in the data mining response. In concrete terms: just because one person can review one file in one minute does not mean that one million files (1,008,000 to be exact, at 1,440 minutes in a day) can be reviewed in one day by expanding the team to 700 people.

Many factors affect the ratio of files to review time. Any given data store can contain hundreds of file types, thousands of languages, and endless variations of file formats. You must also consider the content of the files, such as legal, financial, or administrative material, to name a few. The context of the data and the state of the file, such as high or low image resolution, add another layer of complexity to the review process.

These variances are endless, and with each added component there is less reason to expect the file-to-time ratio in data mining to follow a pattern, especially a linear one.
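To make the arithmetic concrete, here is a minimal sketch of the linear assumption and how a mixed data store breaks it. The 700 reviewers and the one-minute-per-file rate come from the example above; the file categories, counts, and per-file review times are hypothetical values chosen only for illustration, not measurements from any engagement.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes, assuming round-the-clock review


def linear_capacity(reviewers, minutes_per_file=1, minutes_available=MINUTES_PER_DAY):
    """Files a team could clear if every file took exactly the same time."""
    return reviewers * minutes_available // minutes_per_file


def reviewer_minutes(file_counts, minutes_per_file):
    """Total reviewer-minutes needed for a mix of file categories."""
    return sum(count * minutes_per_file[kind] for kind, count in file_counts.items())


# Hypothetical mix of 1,008,000 files and hypothetical per-file review times.
file_counts = {"short_email": 700_000, "spreadsheet": 200_000, "scanned_pdf": 108_000}
minutes_per_file = {"short_email": 1, "spreadsheet": 5, "scanned_pdf": 15}

baseline = linear_capacity(reviewers=700)               # the linear assumption
needed = reviewer_minutes(file_counts, minutes_per_file)
days_for_700 = needed / (700 * MINUTES_PER_DAY)

print(f"Linear assumption: {baseline:,} files in one day")
print(f"Mixed data store:  {needed:,} reviewer-minutes, about {days_for_700:.1f} days")
```

Even with these mild, made-up differences between categories, the same number of files takes the same team more than three times as long, and that is before accounting for languages, formats, or coordination across 700 people.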

You Just Don’t Know Your Data

To steal a line from Tom Hanks, "Life is like a box of chocolates, you never know what you're gonna get." You don't actually know what's in your data until you put it under a microscope. Across hundreds of engagements, we have often come across clients who assume they know what’s on their hard drives; that couldn’t be further from the truth. Perhaps it’s an inclination towards optimism, but victims of exfiltration far too often assume the accessed data did not contain consequential sensitive information. Mining typically proves otherwise, revealing that thousands of highly sensitive data elements were accessed by the threat actor.

Never assume that you know what's in your data store unless you’ve actually mined it file by file.

You just can't control what employees save and where they save it. Whether it’s work-related or personal information, saved on purpose or accidentally, all sorts of data accumulate on company hard drives. For example, a folder meant to hold one type of information is accidentally set as the destination for completely unrelated downloads. Files can be misplaced or copied by mistake from one location to another. Even the most meticulous operations should expect surprises. Never assume that you know your data unless you’ve actually mined it file by file.
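A quick way to test that assumption is to take a rough inventory of what a folder really holds. The sketch below is only a first-pass illustration, not a mining method: the function name `inventory_by_extension` and the path in the usage line are hypothetical, and a file extension is merely a hint about content, so this can surface surprises but cannot rule them out.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files under `root` by lowercase extension ('' means no extension)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path: a share that is "supposed" to hold only invoices.
    for ext, n in inventory_by_extension("./finance_invoices").most_common():
        print(f"{ext or '(none)'}: {n}")
```

If a folder meant for invoices turns up image files, archives, or personal documents, that is an early sign the data store needs to be mined file by file rather than assumed.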

Never Rely on Manual Review as Backup

Accuracy is the name of the game in data mining and its resulting report. The greatest risk in any data mining process is allowing human interpretation to drive the findings. When reviewing data to identify sensitive elements, human reviewers inadvertently allow their interpretation to skew what ought to be objective extraction.

There is no room for interpretation in data mining for cyber incident response. Counsel and the victim organization need data extraction that is direct, objective, and, where there may be doubt, as close to the truth as possible. That is why automated extraction that uses machine learning is far superior to manual review.

There is no room for interpretation in data mining for cyber incident response. Extraction is the only truth.

In the absence of R&D efforts or talent, many mistake manual review for a more meticulous alternative, but they couldn’t be more mistaken. For example, a human reviewer might read “Tom” when the characters in the electronic file actually spell “Tim”, or “Neil” when the file says “Niel”. The final report deliverable from a post-breach response data mining engagement has no place for interpretation. Automated data mining is the most objective method that yields the most accurate results.
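The “Tom” versus “Tim” point can be shown in a few lines. This is a minimal sketch in which a plain regular expression stands in for the machine-learning extraction mentioned above; the `Name:` record layout and both helper functions are hypothetical. The point is only that code returns the characters stored in the file, whereas a reader may see what they expect to see.

```python
import re

# Hypothetical record layout: one "Name: ..." field per line.
NAME_FIELD = re.compile(r"^Name:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_names(text):
    """Return every name exactly as it is stored in the text."""
    return NAME_FIELD.findall(text)


def first_difference(a, b):
    """Index of the first differing character, or None if the strings match."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


record = "Name: Tim Niel\nDept: Finance\n"
extracted = extract_names(record)[0]   # what the file actually says
as_read = "Tom Neil"                   # what a hurried reviewer might record

print(extracted)
print(first_difference(extracted, as_read))
```

Run against the sample record, the extraction yields the stored spelling, and the comparison flags the mismatch at the second character.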


As data volumes and complexity continue to grow exponentially, so do cyber incidents. Instilling data mining best practices and implementing technology-first response workflows are the only effective ways to respond and produce completely accurate notification lists.

