The Big Data Investment? Here’s A Hint: It’s About The Actual Data.

The promise of big data is straightforward: having more information about a business can help you make better decisions about its future. No surprise that technology investments in the last decade have zeroed in on storing and visualizing data.

But it’s also left us with what I call the messy middle.

Think of it this way. You collect warehouses of data, in real time, and then ask an analyst to look for hidden gems of insight. But before they can even get started, they have to prepare the data – much like a chef who can’t start cooking until all the ingredients are cleaned, chopped, and ready to go.

Analysts typically have two choices when it comes to data preparation: do it themselves or ask their IT departments. The former takes time and can leave useful data to go stale. The latter requires an available budget and competition with other IT priorities. And yet, the speed at which new information continues to arrive – what I call data velocity – keeps accelerating.

“Data is exploding faster than our ability to put our arms around it, so you’re going to have to adapt,” said a retired United States Army general speaking at the Domopalooza 2016 conference. “The right answer on Monday is never going to be the right answer on Tuesday.”

The reality is that analysts spend far too much time on data preparation, often writing custom code scripts or spreadsheet macros to gather, clean, classify, augment, and merge information. That’s why I’m convinced the next round of big data investments will focus on automated, intelligent data-preparation tools. Tools that let analysts spend more time doing their real jobs. Tools that require clicks, not code.

Tools that clean up this very messy middle.
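
To make that hand-written preparation work concrete, here is a minimal sketch of the kind of script an analyst might write today to gather, clean, and merge records from two systems. Everything in it is an assumption for illustration: the sample data, the column names, and the helper functions `clean_id`, `load_orders`, `load_regions`, and `merge` are hypothetical and not taken from any particular tool.

```python
import csv
import io

# Hypothetical exports from two separate systems (assumed sample data).
ORDERS_CSV = """customer_id,amount
 A1 ,100.50
a2,
A3,75
"""

CUSTOMERS_CSV = """id,region
A1,West
A2, east
"""


def clean_id(raw):
    """Normalize an identifier: trim whitespace and upper-case it."""
    return raw.strip().upper()


def load_orders(text):
    """Gather order rows and drop any row that has no amount."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, skip it
        rows.append({"customer_id": clean_id(row["customer_id"]),
                     "amount": float(amount)})
    return rows


def load_regions(text):
    """Build a lookup from customer id to a title-cased region name."""
    reader = csv.DictReader(io.StringIO(text))
    return {clean_id(r["id"]): r["region"].strip().title() for r in reader}


def merge(orders, regions):
    """Augment each order with its region; unmatched customers get 'Unknown'."""
    return [dict(o, region=regions.get(o["customer_id"], "Unknown"))
            for o in orders]


prepared = merge(load_orders(ORDERS_CSV), load_regions(CUSTOMERS_CSV))
```

Even this toy version needs decisions about whitespace, casing, missing values, and unmatched keys. Each new data source adds more of them, which is where the analyst's time goes.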

How Things Got So Muddled

We now generate data at a round-the-clock pace that almost defies comprehension – the equivalent of 250,000 Libraries of Congress every day. This information has no standard form, arriving in everything from spreadsheets and memos to video and social media posts.

The research firm IDC offers a useful way to understand the scope of this situation: the amount of data will roughly double every two years between 2013 and 2020, and the percentage considered useful, if it can be tagged, will jump by more than two-thirds over the same period. In other words, we’ll see more information, and more relevant information, that can help a business – or hurt it.

Imagine, for example, you run a bank in the United States. You have multiple reporting requirements related to everything from money laundering to the value of your liquid assets. Yet every day, depositors move funds and bond prices fluctuate. Your compliance, or the lack thereof, depends on how quickly you can analyze this real-time data.

This dilemma asserts itself daily, from compliance-driven industries such as banking and insurance to everyday commerce. The best defense is a decision-making process that keeps your time-to-analysis as short as possible. Since data preparation is the most time-consuming part of the analytic process, it only makes sense to automate it.

What Intelligent Data Preparation Looks Like

One new company winning in the data preparation market – so much so that it became part of my portfolio – is Paxata. Its algorithms clean, combine, and enrich data. It also uses cognitive-computing routines to create relationships among data automatically. For instance, if the software encounters a field in a spreadsheet or report labeled “first name,” it infers a connection to another field labeled “last name.”

The result is less time cleaning data and more time solving real problems.
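
The field-matching idea described above can be illustrated with a deliberately simple sketch. This is not Paxata's algorithm, which the article does not describe. It is a toy heuristic that assumes a small, hypothetical table of label pairs (`COMPANION_LABELS`) and hypothetical helpers `normalize` and `infer_relationships`.

```python
import re

# Hypothetical pairs of labels that usually describe parts of one concept.
COMPANION_LABELS = {
    frozenset({"first name", "last name"}): "person name",
    frozenset({"city", "state"}): "location",
}


def normalize(label):
    """Lower-case a label and turn underscores, hyphens, and camelCase into spaces."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return re.sub(r"[\s_\-]+", " ", spaced).strip().lower()


def infer_relationships(labels):
    """Return (label_a, label_b, concept) for each known companion pair present."""
    by_norm = {normalize(label): label for label in labels}
    found = []
    for pair, concept in COMPANION_LABELS.items():
        if pair.issubset(by_norm):
            a, b = sorted(pair)
            found.append((by_norm[a], by_norm[b], concept))
    return found


links = infer_relationships(["First_Name", "lastName", "Order Total"])
```

Here `links` connects `First_Name` and `lastName` as parts of a person name, even though the two labels are formatted differently. A production tool would presumably look at the data values as well as the labels. The lookup table is only the simplest way to show the idea.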

Consider a U.S. food manufacturer that wanted to reduce product spoilage. Paxata combined internal production and point-of-sale data from separate systems into a single, real-time view of its product – from initial order to checkout at a cash register. This approach not only eliminated spreadsheet workarounds but also freed time for analysts to identify previously unknown pinch points in the supply chain.

Paxata took a similar approach with a technology vendor whose analysts spent most of their time gathering and aggregating data, resulting in stale and inaccurate information. They now see a combined data flow from a half-dozen backend systems, and can collaborate in a secure, fully auditable, and up-to-date environment. Gone are the specialized code and spreadsheet macros that mashed this information together. And with more time to do their actual jobs, they discovered nearly $10 million in operational savings.

Analyst-Ready Data In Minutes, Not Months

Today, the process for moving from raw data to analyst-ready information is broken. Too many IT departments don’t understand, or can’t keep up with, the incoming information. Too often, analysts can’t use the available IT tools and have to cobble together workarounds.

I expect the need for real-time data analysis among knowledge workers to grow rapidly – and that the biggest obstacle to their success will be the time required to prepare information. Automated, intelligent tools will speed that process, and ultimately the pace of business decision-making itself.

And make things a lot less messy.

Lisa Lambert is a Managing Partner at The Westly Group, www.westlygroup.com, responsible for equity investments in software, IoT, and the Internet. Lisa is also Founder/CEO and Chairman of UPWARD, www.upwardwomen.org, a global network of executive women.


Tim Woodhouse

Enterprise Sales - Digitally Transform Financial Operations Management

7 years ago

Yes, excellent article, Lisa - I agree with you that ADM platforms will take off for sure, because bad data can have disastrous effects on the business. Here's a white paper that complements your views on Paxata: 5 Best Practices for better SAP Master Data, https://goo.gl/TqEsk5

Uday Deo

APICS certified supply chain professional (CSCP). Operating now in Startup Ecosystem as Mentor Startup India MAARG Mentor

7 years ago

Data format standardization for each sector can save some time on data cleaning. Good article.

Ranak Ghosh

Senior Software Engineer at Indus Net Technologies

7 years ago

Cleaning the data before storing it in specified dimensions and running the analytics really is an overhead. The best solution would be to specify a standard format, preferably XML, when gathering the data sets, and then pass the XML through a filter that extracts the information. That would make the job a bit easier. But with huge data sets and no such standard formats, developing such a tool is also a pretty hard job, as you cannot come up with any fixed algorithms.

Tristan Washington

IT and Safety Compliancy

7 years ago

A very interesting article indeed.

Hans Jørgen Pedersen

Senior Business Analyst in Business Intelligence and Decision Support

7 years ago

Ever heard of filter bubbles? I have a few issues with the approach mentioned, and they have to do with things being mixed up. "Tools that require clicks, not code." - I hate those! They make everything slow and difficult, and they force your mind onto a preprocessed path. No new knowledge comes from that. Big Data is defined by having more data flowing in than the equipment you can afford is capable of handling... So: buying a software solution will not help you use Big Data, it will remove it! Until the next time you are overwhelmed... And I am sorry to discredit the retired United States Army General, but unless the quote is out of context, he is wrong. If the answer is wrong on Tuesday, it was wrong on Monday - you just didn't know it yet! This has nothing to do with Big Data, but everything to do with good old-fashioned analysis. When it comes to analyzing information, I will never trust conclusions based on cooked data. Simple as that. As an analyst, you have to know the limitations of the data, not mitigate them. You have to prepare your data yourself, and you have to understand the flow and the possible alterations data can be exposed to. But in the end, you also have to trust the systems you have tested and found true (until it is time to test them again). On the other hand, you have Big Data. That's why they are called "big". And they need different methods to harvest, but what comes from them is neither the truth, nor is it called analysis.
