
TEXTUAL METADATA INFRASTRUCTURE

Bill Inmon

Founder, Chairman, CEO, Best-Selling Author, University of Denver & Scalefree Advisory Board Member

Published: December 11, 2024


By W H Inmon

In a recent post about the metadata infrastructure required for textual data, it was stated that when text is converted into a structured format, a different approach to structuring the data is needed. This is due to the inherent flexibility and free-form nature of text. Text is fundamentally different from structured data. Anyone can say or write anything they want. There is no prescribed format or constraint for speech and the written word. Text is inherently free form, so a different structuring of data is needed when text is converted to a structured format.

It is true that text needs to be placed into a structured format in order to be analyzed using standard analytical processing. The need for a structured format is dictated by analytical software: Excel, Tableau, knowledge graph data bases, and so forth. But even so, there are some inherent differences between a classical structuring of data and a textual structuring of data.

In order to understand these differences consider a simple classical data base structure –

The simple structured data base shows that the column headers directly specify, and so constrain, what data is to be found in each column. In the column for school, only a school belongs there. In the column for city, only cities and towns belong there. The column header directly describes and constrains the contents of the column.
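The way a column header constrains its contents can be sketched in a few lines of Python. This is only an illustration, not the structure shown in the original post: the column names school and city come from the text, while the reference values and the `validate_row` helper are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical domains for each column. In a classical structured data base
# the column header implies a domain like this: only schools under "school",
# only cities and towns under "city".
COLUMN_DOMAINS: Dict[str, Set[str]] = {
    "school": {"Yale", "Stanford"},
    "city": {"Denver", "Austin"},
}


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return the names of columns whose value falls outside the column's domain."""
    return [
        column
        for column, domain in COLUMN_DOMAINS.items()
        if row.get(column) not in domain
    ]


good_row = {"school": "Yale", "city": "Denver"}
bad_row = {"school": "Denver", "city": "Denver"}  # a city placed under "school"

print(validate_row(good_row))  # []
print(validate_row(bad_row))   # ['school']
```

The point of the sketch is that the header alone decides what is acceptable: a value that is perfectly valid as a city is rejected once it appears under school.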

Now consider a data base approach for textual data –

In the textual data base, the column heading for word only indirectly describes what is in the column. The column for word contains only words, but the words can be anything. The column heading for word places no constraints on what the content of the column might be. The direct description of the contents of the word column is supplied by another column: context. The data found in context in the same row describes the data found in word.

The “context” column is what makes a direct description of the contents of the word column possible.

In a structured data base there are only direct column headings. In the textual data base there are indirect and direct descriptors of data.

The ability to have many types of data in the word column allows the structured data base for text to accommodate undisciplined and unstructured text.
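A minimal Python sketch of the word and context pairing follows. The two column names come from the text; the sample rows, the context labels, and the `words_by_context` helper are assumptions made purely for illustration and are not taken from the original post.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (word, context) rows. The word column accepts anything;
# the context value in the same row is what directly describes the word.
rows: List[Tuple[str, str]] = [
    ("Denver", "city"),
    ("Jeb", "dog name"),
    ("Lena", "dog name"),
    ("snowed", "weather"),
]


def words_by_context(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group words under the context that describes them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for word, context in pairs:
        grouped[context].append(word)
    return dict(grouped)


index = words_by_context(rows)
print(index["dog name"])  # ['Jeb', 'Lena']
print(index["city"])      # ['Denver']
```

Nothing in the sketch restricts what may appear as a word. Any analysis that needs a constrained set, such as all cities, selects rows by their context value instead of relying on the column header.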

Bill Inmon lives in Denver with his wife and his two Scotty dogs, Jeb and Lena. It is a nice cool winter day in Denver. It snowed last night but melted off by noon. Jeb and Lena went out to the back yard in the sun and played. Lena is faster than Jeb because she is younger than him. But Lena puts up with Jeb in any case.

Maria del Mar Bonet

3 months ago

Metadata Everyone says it Nobody does it


John O'Gorman

Disambiguation Specialist

3 months ago

Bill - With respect, the column 'Name' by itself doesn't appear to have any constraints either. 'Jeb' and 'Lena' qualify as members of the 'Name' set and you could likely get away with 'Denver' as a City value associated with those two records. School value would be a challenge though.


Fred Lardaro

3 months ago

Bill Inmon... apologies in advance. The words "Textual Metadata Infrastructure" sounds to me like they could use a Megazord suffix... like some monstrous mechanical armed vehicle robot Zord thing that the Power Rangers would summon to action to combat intergalactic villainy. In all seriousness... I recognize your work to make an advantage of the monstrous volume of undisciplined and unstructured text to be impactful and essential. How could we be saving and yet not be using unstructured data all these years? With modern tools and a little INMONnovation... it looks like advantages of making use of unstructured data are just around the corner.


Tamer Chowdhury

3 months ago

Interesting. As the context column is filled in, it will add relations. I would say an LLM is well suited to filling the context with business terminology, and then it's a matter of denormalizing and querying.

Andrés Useche Gómez

Data Warehouse Architect in Hach

3 months ago

Hi Bill Inmon, as you mention, free text is more complex than structured data. In real life words can be misspelled. For example, “desert” can have a different meaning depending on the context, but at the same time it can be misspelled as “dessert”. What is your suggestion for dealing with these problems, misspellings and ambiguous meanings? Thanks
