TEXTUAL METADATA INFRASTRUCTURE

By W H Inmon

A recent post about the metadata infrastructure required for textual data stated that when text is converted into a structured format, a different approach to structuring the data is needed. This is due to the inherent flexibility and free-form nature of text. Text is fundamentally different from structured data. Anyone can say or write anything they want. There is no prescribed format for speech and the written word, and there are no constraints on them. Because text is inherently free form, the structure that receives it has to be organized differently from a classical one.

It is true that text needs to be placed into a structured format in order to be analyzed using standard analytical processing. The need for a structured format is dictated by analytical software: Excel, Tableau, knowledge graph data bases, and so forth. Even so, there are some inherent differences between a classical structuring of data and a textual structuring of data.

To understand these differences, consider a simple classical data base structure:


The simple structured data base shows that the column headers directly specify, and therefore constrain, what data is to be found in each column. In the column for school, only a school belongs. In the column for city, only cities and towns belong. The column header directly describes and constrains the contents of the column.
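As a rough illustration of a column header constraining its contents, here is a minimal Python sketch. The column names (name, school, city) are an assumption based on the columns mentioned in the text and in the reader comments; the reference lists and the sample values are hypothetical and are not taken from the original figure.

```python
from dataclasses import dataclass

# Hypothetical reference lists: each column header names a domain of allowed values.
SCHOOLS = {"Lincoln High", "Central College"}
CITIES = {"Denver", "Boulder"}


@dataclass(frozen=True)
class StructuredRow:
    """One row of a classical table: each header constrains its own column."""
    name: str
    school: str
    city: str

    def __post_init__(self):
        # The header "school" admits only schools, the header "city" only cities.
        if self.school not in SCHOOLS:
            raise ValueError(f"{self.school!r} does not belong in the school column")
        if self.city not in CITIES:
            raise ValueError(f"{self.city!r} does not belong in the city column")


# A row whose values match their column headers is accepted.
ok_row = StructuredRow(name="Ann", school="Lincoln High", city="Denver")

# A city placed in the school column is rejected.
try:
    StructuredRow(name="Ann", school="Denver", city="Denver")
    rejected = False
except ValueError:
    rejected = True
```

The point of the sketch is only that the header itself carries the rule. No second column is needed to say what a value in the city column is.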

Now consider a data base approach for textual data:



In the textual data base, the column heading “word” only indirectly describes what is in the column. The word column contains only words, but the words can be anything. The heading places no constraint on what the content of the column might be. The direct description of the contents of the word column is supplied by another column, context. The value found in context describes the word found in the same row.

The “context” column is what provides a direct description of the contents of the word column.

In a structured data base there are only direct column headings. In the textual data base there are indirect and direct descriptors of data.

The ability to have many types of data in the word column allows the structured data base for text to accommodate undisciplined and unstructured text.
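The word and context layout described above might be sketched as follows. This is an illustration under assumptions, not the layout of the original figure: the two column names come from the text, while the sample words (borrowed from the author's note below) and the context labels attached to them are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class TextRow(NamedTuple):
    word: str     # indirect descriptor: any word at all may appear here
    context: str  # direct descriptor: says what the word on this row is


# Hypothetical rows; the context labels are illustrative guesses.
rows = [
    TextRow("Denver", "city"),
    TextRow("Jeb", "dog name"),
    TextRow("Lena", "dog name"),
    TextRow("winter", "season"),
]


def words_by_context(rows):
    """Group words by the context value found in the same row."""
    index = defaultdict(list)
    for row in rows:
        index[row.context].append(row.word)
    return dict(index)


by_context = words_by_context(rows)
print(by_context["dog name"])  # ['Jeb', 'Lena']
print(by_context["city"])      # ['Denver']
```

Nothing stops a new kind of word from being added: a row with a context value never seen before is simply one more row, which is how the same two columns can hold undisciplined text.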


Bill Inmon lives in Denver with his wife and his two Scotty dogs – Jeb and Lena. It is a nice cool winter day in Denver. It snowed last night but melted off by noon. Jeb and Lena went out to the back yard in the sun and played. Lena is faster than Jeb because she is younger than him. But Lena puts up with Jeb in any case.

Metadata: everyone says it, nobody does it.

John O'Gorman

Disambiguation Specialist

3 months ago

Bill - With respect, the column 'Name' by itself doesn't appear to have any constraints either. 'Jeb' and 'Lena' qualify as members of the 'Name' set, and you could likely get away with 'Denver' as a City value associated with those two records. The School value would be a challenge though.

Bill Inmon... apologies in advance. The words "Textual Metadata Infrastructure" sound to me like they could use a Megazord suffix... like some monstrous mechanical armed vehicle robot Zord thing that the Power Rangers would summon to action to combat intergalactic villainy. In all seriousness... I recognize your work to turn the monstrous volume of undisciplined and unstructured text to advantage as impactful and essential. How could we have been saving and yet not using unstructured data all these years? With modern tools and a little INMONnovation... it looks like the advantages of making use of unstructured data are just around the corner.

Interesting. As the context column is filled in, it will add relations. I would say an LLM is well suited to filling the context with business terminology, and then it is a matter of denormalizing and querying.

Andrés Useche Gómez

Data Warehouse Architect in Hach

3 months ago

Hi Bill Inmon, as you mention, free text is more complex than structured data. In real life, words can be misspelled. For example, “desert” can have a different meaning depending on the context, but at the same time it can be misspelled as “dessert”. What is your suggestion for dealing with these problems of misspellings and ambiguous meanings? Thanks
