Knowing what you don't know
Matthew Hardman
Hybrid Cloud | High Performance Applications | Data Ops | Strategy | Leadership
Musings from the coffee shop again, waiting for the kids...
The other day I was having lunch with a friend of mine who now works at an analyst firm. We talked about personal topics, but of course we talked some shop as well. One of the areas we discussed was "Data Security". He mentioned to me that all the firewalls, antivirus and security tools in the world don't mean a thing unless you know what you are trying to secure and where it is. There is no denying his position: there is no point in buying a super mega safe and then realising your fortune is in the wardrobe.
The unfortunate thing, on many occasions, is that users may not understand the value of the data they work with on a day-to-day basis. In IT administration and software development we create logins, roles and security access groups to secure data, but these are quite often circumvented because of functionality in an application, or the need to collaborate.
The Problem
Take, for example, a sales pipelining tool. When this is deployed, Operations and IT typically go to great lengths to ensure that the appropriate roles are defined with the right level of access, so that data leakage is kept to an absolute minimum. Data is safe and secure inside the confines of the application, right? Well, that same application also lets the user export results to something like a .CSV file, so they can choose to work with the data in another application. That's awesome: flexibility, usability, right? Here comes the big question...
"Where do you store the .csv file?"
There is a SHOULD answer to this question, and there is a DID answer.
The SHOULD answer involves making an active decision to place that data in a secure location that you can provision access to if necessary for further collaboration. When the data is no longer needed, you revoke access and ensure the data is deleted.
The DID answer is, "I put it on the NAS so it was easier to get access to, and forgot about it."
Unfortunate as it is, most of us fall into the DID category. Don't think so? Go check the network storage where you have placed data in the past. What have you put there? Do you even know what you put there?
A Real Experience
We recently worked with a customer to help answer that question: "What do we really have?". The answer, quite frankly, shocked them.
In answering the question, we utilised a data intelligence framework called Hitachi Content Intelligence (I am not going to do the hard sell, but I find this technology fascinating), which helps customers to search, enrich and transform data of all types. It is built entirely on a microservices framework running on Docker to deliver massive scale and performance, which matters because of how large these unstructured data stores can be.
Using this technology, the customer allowed us to search and index their local NAS to understand what they had...
OK, well, a file count is nothing terribly impressive; organisations have amassed a huge amount of data over the years with the increasing digitisation of systems and processes. However, the time of last modification did raise an eyebrow...
"We are spending top dollar backing up the same data over and over again on a regular basis and nothing has changed. Our backup costs are increasing as our data is increasing, but we aren't understanding what or why we are backing things up."
Guess what, this wasn't the kicker... the next one blew people's minds!
These were all the file types that were stored on the NAS. Take a look at the top two formats, take a minute and soak it in.
Anyone who has ever worked with data knows that a ".CSV" file is typically an export of data directly from an application into a format that many other applications can read and use. An "Excel" file, well, that one is obvious: more than likely it holds data in the form of lists, charts and a whole range of insights.
All this and more sitting on a NAS that everyone in the organisation has access to. The leadership we were talking to looked around and then asked "So what do you recommend?".
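For readers who want a first rough look at their own file share, here is a minimal Python sketch of the kind of inventory described above: how many files there are, which file types dominate, and how many have not been modified for a long time. To be clear, this is not how Hitachi Content Intelligence works; it is a plain directory walk using the Python standard library. The function name `inventory_share`, the `stale_after_days` parameter and the one-year default are my own assumptions for illustration.

```python
import os
import time
from collections import Counter
from pathlib import Path


def inventory_share(root, stale_after_days=365, now=None):
    """Walk `root` and summarise file types and modification age.

    Hypothetical helper for illustration only. `stale_after_days` is an
    assumed threshold; pick whatever matches your backup policy.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # seconds in a day

    by_extension = Counter()
    total = 0
    stale = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.stat().st_mtime
            except OSError:
                continue  # unreadable or vanished file; skip it

            total += 1
            # Normalise case so ".CSV" and ".csv" are counted together.
            by_extension[path.suffix.lower() or "(none)"] += 1
            if mtime < cutoff:
                stale += 1

    return {"total": total, "stale": stale, "by_extension": by_extension}
```

Calling `inventory_share("/path/to/share")` returns a dictionary, and `result["by_extension"].most_common(5)` lists the five most frequent file types. A walk like this only tells you what is there and how old it is; it says nothing about what is inside the files, which is where the real sensitivity question lies.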
In Summary
The reality is that unstructured data is the proverbial haystack, and sensitive data is the needle you are trying to find. The challenge is that it almost looks like there are more needles than hay. How do we solve it? Honestly, it will be a continuous process of assessing the situation, educating users, implementing safeguards to take action, and then repeating.
Technology is not going to be the magic solution to everything, at least not yet, but it does go a long way towards giving you insight into where you need to start.
Final Note: If you would like to do a similar assessment for your organisation, don't hesitate to reach out.
Final Note 2: I will make sure we do a video to explain this all with a demo soon!