The Big Data Investment? Here’s A Hint: It’s About The Actual Data.
Lisa M Lambert
Chief Investment Officer, Private Markets at George Kaiser Family Foundation; Board Member; Founder, National Grid Partners; Founder, UPWARD
The promise of big data is straightforward: having more information about a business can help you make better decisions about its future. No surprise that technology investments in the last decade have zeroed in on storing and visualizing data.
But that focus has also left us with what I call the messy middle.
Think of it this way. You collect warehouses of data, in real time, and then ask an analyst to look for hidden gems of insight. But before they can even get started, they have to prepare the data – much like a chef who can’t start cooking until all the ingredients are cleaned, chopped, and ready to go.
Analysts typically have two choices when it comes to data preparation: do it themselves or ask their IT departments. The former takes time and lets useful data go stale. The latter requires budget and means competing with other IT priorities. And yet the speed at which new information continues to arrive – what I call data velocity – keeps accelerating.
“Data is exploding faster than our ability to put our arms around it, so you’re going to have to adapt,” said a retired United States Army general at the Domopalooza 2016 conference. “The right answer on Monday is never going to be the right answer on Tuesday.”
The reality is that analysts spend far too much time on data preparation, often writing custom code scripts or spreadsheet macros to gather, clean, classify, augment, and merge information. That’s why I’m convinced the next round of big data investments will focus on automated, intelligent data-preparation tools. Tools that let analysts spend more time doing their real jobs. Tools that require clicks, not code.
Tools that clean up this very messy middle.
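To make the “custom code scripts or spreadsheet macros” point concrete, here is a minimal sketch of the kind of one-off preparation script an analyst might write today. The file names, column names, and spend tiers are hypothetical, purely for illustration:

```python
# A sketch of a hand-rolled data-preparation script (hypothetical files and columns).
import pandas as pd

# Gather: pull overlapping records from two differently formatted exports.
crm = pd.read_csv("crm_export.csv")
billing = pd.read_excel("billing_dump.xlsx")

# Clean: normalize the join key and drop rows missing it.
crm["email"] = crm["email"].str.strip().str.lower()
crm = crm.dropna(subset=["email"])

# Classify: bucket accounts by annual spend so they can be compared.
billing["tier"] = pd.cut(
    billing["annual_spend"],
    bins=[0, 10_000, 100_000, float("inf")],
    labels=["small", "mid", "enterprise"],
)

# Augment and merge: join the two views on the normalized key.
billing["email"] = billing["contact_email"].str.strip().str.lower()
combined = crm.merge(billing[["email", "tier", "annual_spend"]], on="email", how="left")

combined.to_csv("analyst_ready.csv", index=False)
```

Every line of that script is brittle plumbing rather than analysis – and it has to be revisited whenever a source system changes.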
How Things Got So Muddled
We now generate data at a ‘round-the-clock pace that almost defies comprehension – the equivalent of 250,000 Libraries of Congress every day. This information has no standard form, arriving in everything from spreadsheets and memos to video and social media posts.
The research firm IDC offers a useful way to grasp the scope of this situation: the amount of data in the world will double roughly every two years between 2013 and 2020, and the percentage considered useful – if it can be tagged – will jump by more than two-thirds over that period. In other words, we’ll see more, and more relevant, information that can help a business – or hurt it.
Imagine, for example, you run a bank in the United States. You have multiple reporting requirements related to everything from money laundering to the value of your liquid assets. Yet every day, depositors move funds and bond prices fluctuate. Your compliance, or the lack thereof, depends on how quickly you can analyze this real-time data.
This dilemma plays out daily, from compliance-driven industries such as banking and insurance to everyday commerce. The best defense is a decision-making process that keeps your time-to-analysis as short as possible. And since data preparation is the most time-consuming part of the analytic process, it only makes sense to automate it.
What Intelligent Data Preparation Looks Like
One new company winning in the data-preparation market – so much so that it became part of my portfolio – is Paxata. Its algorithms clean, combine, and enrich data. It also uses cognitive-computing routines to create relationships among data automatically. For instance, if the software encounters a field in a spreadsheet or report labeled “first name,” it infers a connection to another field labeled “last name.”
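I won’t pretend to reproduce Paxata’s algorithms here, but the underlying idea of inferring field relationships can be sketched with a toy heuristic. The function names and the 0.6 similarity threshold below are my own illustrative choices, not Paxata’s method:

```python
# Toy illustration of field-relationship inference (not Paxata's actual algorithm):
# treat two columns as related if their normalized names share a trailing token
# ("first name" / "last name") or are otherwise very similar strings.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    return name.lower().replace("_", " ").replace("-", " ").strip()

def related(col_a: str, col_b: str, threshold: float = 0.6) -> bool:
    a, b = normalize(col_a), normalize(col_b)
    if a.split()[-1] == b.split()[-1]:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(related("First Name", "last_name"))   # True: both end in "name"
print(related("annual_spend", "zip_code"))  # False: no shared token, low similarity
```

A production tool would go much further – looking at the values in each column, not just the labels – but the principle is the same: let software propose the relationships so the analyst only has to confirm them.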
The result is less time cleaning data and more time solving real problems.
Consider a U.S. food manufacturer that wanted to reduce product spoilage. Paxata combined internal production and point-of-sale data from separate systems into a single, real-time view of its product – from initial order to checkout at a cash register. This approach not only eliminated spreadsheet workarounds, it freed time for analysts to identify previously unknown pinch points in the supply chain.
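Here is a rough, hypothetical sketch of what that single view might look like; the file names, column names, and days-to-first-sale metric are my assumptions, not the manufacturer’s actual schema:

```python
# Hypothetical combined production + point-of-sale view, used to surface batches
# that sit unsold the longest (candidate spoilage pinch points).
import pandas as pd

production = pd.read_csv("production_batches.csv", parse_dates=["produced_at"])
pos = pd.read_csv("pos_scans.csv", parse_dates=["scanned_at"])

# One row per batch: when it was first and last scanned at a register.
scan_window = (
    pos.groupby("batch_id")["scanned_at"]
    .agg(first_sale="min", last_sale="max")
    .reset_index()
)

view = production.merge(scan_window, on="batch_id", how="left")
view["days_to_first_sale"] = (view["first_sale"] - view["produced_at"]).dt.days

# Batches with the longest wait before their first sale are the ones to investigate.
print(view.sort_values("days_to_first_sale", ascending=False).head(10))
```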
Paxata took a similar approach with a technology vendor whose analysts spent most of their time gathering and aggregating data, resulting in stale and inaccurate information. They now see a combined data flow from a half-dozen backend systems, and can collaborate in a secure, fully auditable, and up-to-date environment. Gone are the specialized code and spreadsheet macros that mashed this information together. And with more time to do their actual jobs, they discovered nearly $10 million in operational savings.
Analyst-Ready Data In Minutes, Not Months
Today, the process for moving from raw data to analyst-ready information is broken. Too many IT departments don’t understand, or can’t keep up with, the incoming information. Too often, analysts can’t use the available IT tools and have to cobble together workarounds.
I expect the need for real-time data analysis among knowledge workers to grow rapidly – and that the biggest obstacle to their success will be the time required to prepare information. Automated, intelligent tools will speed that process, and ultimately the pace of business decision-making itself.
And make things a lot less messy.
Lisa Lambert is a Managing Partner at The Westly Group, www.westlygroup.com, responsible for equity investments in software, IoT, and the Internet. Lisa is also Founder/CEO and Chairman of UPWARD, www.upwardwomen.org, a global network of executive women.
Enterprise Sales - Digitally Transform Financial Operations Management
Yes, excellent article, Lisa – I agree with you that ADM platforms will take off for sure, because bad data can have disastrous effects on the business. Here’s a white paper that complements your views on Paxata: 5 Best Practices for better SAP Master Data, https://goo.gl/TqEsk5
APICS certified supply chain professional (CSCP). Operating now in Startup Ecosystem as Mentor, Startup India MAARG Mentor
Data format standardization for each sector can save some time on data cleaning. Good article.
Senior Software Engineer at Indus Net Technologies
It’s really an overhead to clean the data before storing it in the specified dimensions and performing the analytics. The best solution would be to specify a standard format while gathering the data sets – preferably XML – and then pass the XML through a filter that extracts the information. The job would get a bit easier that way. But with huge data sets and no such standard formats, developing such a tool is also a pretty hard job, since you cannot come up with any fixed algorithms.
IT and Safety Compliancy
A very interesting article indeed.
Senior Business Analyst in Business Intelligence and Decision Support
Ever heard of Filter Bubbles? I have a few issues with the approach mentioned, and they have to do with things being mixed up.

“Tools that require clicks, not code.” – I hate those! They make everything so slow and difficult, and they force your mind onto a preprocessed path. No new knowledge comes from that.

Big Data is defined by having more data flowing in than the equipment you can afford is capable of handling. So buying a software solution will not help you use Big Data – it will remove it! Until the next time you are overwhelmed.

And I am sorry to miscredit the retired United States Army general, but unless the quote is out of context, he is wrong. If the answer is wrong on Tuesday, it was wrong on Monday – you just didn’t know it yet! This has nothing to do with Big Data, but everything to do with good old-fashioned analysis.

When it comes to analyzing information, I will never trust conclusions based on cooked data. Simple as that. As an analyst, you have to know the limitations of the data, not mitigate them. You have to prepare your data yourself; you have to understand the flow and the possible alterations data can be exposed to. But in the end, you also have to trust the systems you have tested and found true (until it is time to test them again).

On the other hand, you have Big Data. That’s why it is called “big.” It needs different methods to harvest, but what comes from it is neither the truth, nor is it called analysis.