UNSTRUCTURED DATA - STEPPING UP THE LADDER

By W H Inmon

Recent announcements by large vendors indicate increasing interest in, and support for, including unstructured data in the mainstream of decision making. This is a very positive trend, and the vendors are to be congratulated for moving in this direction.

However, the vendors are about to discover some truths hiding behind unstructured data. Certainly, unstructured data must be stored, but merely storing it does not, in and of itself, yield any great business value.

The first truth that the vendors will discover is that there is a big difference between text and machine-generated data. These two types of data are as different as East and West.

The second truth the vendors will discover is that there is an untapped marketplace bound up in text that has not been touched. There is call center data, the voice of the customer, medical records, contracts, warranties, email analysis, … all just waiting. And to date no one has really cracked the code for these marketplaces. So the textual marketplace is like California in 1848. There is gold there just waiting to be picked up if you can get there.

But for all of the promise of the textual marketplace, the vendors that announced support for the storage of unstructured data are about to discover that they are merely on the first step up a long ladder.

That ladder looks like this:

[Image: the ladder of approaches to unstructured data, from storage on the first step to textual ETL at the top]

To yield any great business value, unstructured data must be disambiguated. There is simply too much text to process manually. The human brain just can’t handle it. What is needed is the computer. And there have been attempts to disambiguate text using the computer.

There are some simple ways that unstructured data and text can be disambiguated. There is stemming. There is Soundex. There is stop word processing. There is proximity analysis. These are all interesting activities, but on their own they don’t really produce any business value to speak of.
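To make these four techniques concrete, here is a minimal Python sketch. It is an illustration only, not the method of any particular product: the stop word list, the suffix rules in naive_stem, the sample sentence, and all function names are assumptions made for this example. Only the Soundex routine follows a standard published algorithm (American Soundex).

```python
import re

# Illustrative stop word list (an assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "that"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Stop word processing: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Stemming, crudely: strip a few common suffixes.

    A hypothetical stand-in for a real stemmer such as Porter's.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def soundex(word):
    """American Soundex: first letter plus three digits for consonant groups."""
    if not word:
        return ""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}
    word = word.lower()
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev, so repeated codes around them count
    return (result + "000")[:4]


def within_proximity(tokens, term_a, term_b, window):
    """Proximity analysis: do the two terms occur within `window` tokens?"""
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)


if __name__ == "__main__":
    # Sample sentence invented for this sketch.
    text = "The customer was complaining that the warranty claim was rejected"
    tokens = remove_stop_words(tokenize(text))
    print(tokens)
    print([naive_stem(t) for t in tokens])
    print(soundex("Robert"), soundex("Rupert"))  # both print R163
    print(within_proximity(tokens, "warranty", "rejected", 2))
```

Each function does its job, yet none of them tells you why the customer was unhappy or what the business should do about it. That gap is the reason these techniques sit on a low rung of the ladder.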

The next step up the ladder is keyword processing and NLP. Indeed, there is a trickle of business value that can be obtained using these approaches. But these approaches are clumsy, complex and expensive. They are very academic and require an army of high-priced consultants. They take a long time to set up and manage, and wrangling business value from them is a complex hassle.

At the top of the heap for disambiguating text is textual ETL. Textual ETL is inexpensive, easy to use, and fast. And textual ETL is designed to produce business value quickly and simply.

So the movement to unstructured data is a very good thing. But if the vendors that support storing unstructured data are looking for quick wins, they are going to be disappointed. These vendors have got to move up the ladder before they can start to gain the confidence of the customer.

At the end of the day, producing business value is where the vendors have to be if they want to stick around.


Bill Inmon is the founder of Forest Rim Technology, a company in Denver, Colorado that helps customers hear and interpret the voice of the customer.


Joe Jordan

Sr. Solutions Engineer at Snowflake

3 years ago

There's Gold in them there Lakes! And ur sifting pan is Textual ETL!

You're still on top of the ladder, Bill

Martin Goebbels

Over 20 years dedicated to data analytics and related activities, tools change but foundations stay. Good data quality may not make your decisions better ones, but bad data quality will definitely make them worse.

3 years ago

I'd love to see #TextualETL working together with Snowflake. Having that unstructured data linked with all the Context and all other relevant metadata easily accessible and searchable in a single environment

Todd Scofield

Catalyst for Creativity in Big Data & High Performance Computing

3 years ago

To this day the challenge has been how to process the data at hand, structured or unstructured, fast enough to take advantage of the potential knowledge in the data. This challenge has gone on for the 45 years I have been involved in exploiting the latest chips and algorithms to achieve 2-100X business goals. To this day, exploiting structured data is easier. Bill and his team have built something that, using current platforms, can provide access to the Gold within Textual Data in a simple and cost-effective way. In its current state it will fit many business models and deliver a positive outcome. What you don't know you can't deliver. Check it out - See if it fits your business challenges. Todd

SCOTT Olu?waf??mi TAYLOR

The Data Whisperer | Data Storytelling | Data Puppets | DataVengers | Keynoter | Brand Content | Event MC/Host | DataIQ100 | Onalytica Who’s Who | CDOMag Top Consultant | 5X Data Marathon Host | Dataversity Top10 Blogger

3 years ago

So unstructured data can turn into value if you start to structure it?
