A SHORT TUTORIAL ON METADATA
Bill Inmon
Founder, Chairman, CEO, Best-Selling Author, University of Denver & Scalefree Advisory Board Member
In the structured world, metadata is part of the infrastructure that defines the structure itself. For many reasons, metadata is absolutely essential to the management and usage of structured data. The structured world requires such metadata as:
- Key definition
- Attribute definition, including the name of the attribute
- Table name definition
- Index name
- And so forth
Metadata is an explicit and necessary component of structured data. Structured metadata is explicitly defined as the structure is built and disclosed to the database management system. As such, structured metadata is always available, even if it stays in the background.
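Here is a minimal sketch that makes this concrete using Python's built-in sqlite3 module. The customer table, its columns, and the index name are hypothetical. The point is that once the structure has been declared to the database management system, the DBMS can report the key, attribute, table, and index definitions back on request.

```python
import sqlite3

# In-memory database. The table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        account_number INTEGER PRIMARY KEY,  -- key definition
        name TEXT NOT NULL,                  -- attribute definitions
        city TEXT
    )
    """
)
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

# Table and index names, as recorded by the DBMS itself.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Attribute names and types, and which attribute is the key.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type, bool(pk))
    for _cid, name, col_type, _notnull, _default, pk
    in conn.execute("PRAGMA table_info(customer)")
]

print(objects)  # [('index', 'idx_customer_city'), ('table', 'customer')]
print(columns)  # [('account_number', 'INTEGER', True), ('name', 'TEXT', False), ('city', 'TEXT', False)]
```

Nobody had to write this metadata down separately. It came into existence as a side effect of building the structure.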
Metadata is also a necessary and important component of text and text analytics. You can't understand or interpret text properly without metadata. However, metadata in text is very, very different from metadata in the structured world. Metadata in the two worlds is as different as chalk and cheese.
The differences between the two worlds begin with perspective. Structured metadata is defined looking inward at the corporation, while textual metadata is defined looking outward from it. Structured metadata describes such things as customer, product, and sale. Textual metadata describes text as the general population understands it.
But this is only the beginning of the differences in metadata in the two environments.
Another significant difference in the forms of metadata is that structured metadata describes specific instances of data. A customer has an account number. A product has a part number. A sale has a date, and a location, and so forth. Structured metadata can be used to identify a specific instance of data.
Textual metadata, however, does not specify individual occurrences of data. Textual metadata is built around classifications of text. A classification of trees might include elm, oak, pecan, and sycamore. A classification of cars might include Porsche, Ford, Honda, and Toyota. A classification of professions might include preacher, policeman, actor, and salesperson.
The same occurrence of text can typically be classified in multiple ways, according to the larger categorizations to which the text belongs. For example, an auto manufacturer may classify text as drive train parts, assembly instructions, shipping data, and sales price. A bank may classify text by currency, ATM operation, handling of insufficient funds, and so forth.
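A crude way to see one piece of text landing in several classifications at once is to match keywords against a taxonomy. This sketch assumes a hypothetical taxonomy for an auto manufacturer, with made-up indicative terms for each classification. Real textual analytics relies on much richer taxonomies and ontologies, so read this as an illustration of the idea rather than as a method.

```python
import re

# Hypothetical taxonomy for an auto manufacturer. Each classification is
# defined by a few illustrative indicative terms.
TAXONOMY = {
    "drive train parts": {"transmission", "axle", "clutch", "driveshaft"},
    "assembly instructions": {"install", "torque", "bolt", "align"},
    "shipping data": {"shipped", "freight", "carrier", "delivery"},
    "sales price": {"price", "msrp", "discount", "invoice"},
}


def classify(text: str) -> list:
    """Return every classification whose indicative terms appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(label for label, terms in TAXONOMY.items() if words & terms)


note = "Install the new axle, torque each bolt, then confirm the discount on the invoice."
labels = classify(note)
print(labels)  # ['assembly instructions', 'drive train parts', 'sales price']
```

The result names classes of text. It does not identify a specific instance the way an account number or part number does.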
In order to illustrate the phenomenon of the different ways text can be classified, consider this simple scenario –
Two men are standing on a street corner when a young lady walks by. One man says to the other: "She's hot."
Now what does “hot” mean? It can mean many things.
One interpretation is that the man finds the young lady to be attractive. He would like to have a date with her.
Another interpretation is that the lady is in Houston, Texas, on a July day and she is pouring sweat. She is physically hot.
A third interpretation is that the two men are doctors and the young lady has just had her temperature taken. She has a fever. Her body is hot.
Making the wrong interpretation of the meaning of “hot” can lead to very awkward conclusions and circumstances.
So how should "hot" be interpreted? The answer depends on the larger category, or description, of the circumstances in which the word is said. If the conversation takes place in a hospital, the meaning would be hot/ill. If it takes place in Houston, Texas, the meaning would be hot/sweating. If the context is two single men looking at a beautiful lady, the meaning would be hot/attractive.
So all text requires context in order to be interpreted properly. Text without context is simply not useful for any kind of analysis.
So where does context come from? In the case of text, context has to be inferred from the text itself. The text surrounding the word in question is the first clue as to the context. The larger descriptive category of the circumstances surrounding the text is the second clue. The third clue is the normal definitions and/or interpretations of the word being analyzed.
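The three clues can be sketched as a simple scoring rule for the word "hot". This is a toy illustration under stated assumptions, not Inmon's method. The sense inventory stands in for the normal definitions of the word (the third clue). Overlap with the surrounding words stands in for the first clue. An optional category label stands in for the larger descriptive category (the second clue). All word lists, category names, and the category weight are hypothetical.

```python
import re
from typing import Optional

# Hypothetical sense inventory for "hot". Each sense lists words drawn from
# its ordinary definition and usage (the third clue).
SENSES = {
    "hot/attractive": {"date", "attractive", "beautiful", "single", "dinner"},
    "hot/sweating": {"july", "summer", "sweat", "sweating", "humid", "houston"},
    "hot/ill": {"fever", "temperature", "doctor", "doctors", "hospital", "thermometer"},
}

# Hypothetical map from a larger descriptive category (the second clue)
# to the sense that category favors.
CATEGORY_HINT = {"hospital": "hot/ill", "weather": "hot/sweating", "social": "hot/attractive"}
CATEGORY_WEIGHT = 2  # arbitrary, assumed weight for the category clue


def interpret_hot(surrounding_text: str, category: Optional[str] = None) -> str:
    """Pick the most likely sense of "hot" from the available clues."""
    words = set(re.findall(r"[a-z]+", surrounding_text.lower()))
    # First clue: overlap between the surrounding words and each sense's words.
    scores = {sense: len(words & clues) for sense, clues in SENSES.items()}
    # Second clue: the larger descriptive category, if one is known.
    if category in CATEGORY_HINT:
        scores[CATEGORY_HINT[category]] += CATEGORY_WEIGHT
    # Ties go to the first sense listed. A real system would flag low confidence.
    return max(scores, key=scores.get)


print(interpret_hot("The two doctors had just taken her temperature."))       # hot/ill
print(interpret_hot("A July afternoon in Houston and she is pouring sweat."))  # hot/sweating
print(interpret_hot("She's hot.", category="social"))                         # hot/attractive
```

In the last call the words alone carry no clue, so the category decides the outcome.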
One peculiarity of inferring context is that it is NEVER perfect. Suppose the young woman was entering a hospital in Houston. She could be both ill and sweating at the same time. By the very nature of text and text analytics, determinations of context are made probabilistically. This means that in some instances the context will be interpreted incorrectly. But as long as the context is interpreted correctly the majority of the time, that suffices in the world of textual metadata.
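Because inferring context is probabilistic, a more honest output is a distribution over meanings rather than a single answer. This sketch assumes a toy sense inventory of hypothetical word lists. It converts overlap counts into probabilities using add-one style smoothing. The smoothing parameter is an assumption, chosen so that no meaning is ever ruled out entirely. Applied to a young woman walking into a hospital in Houston in July, it assigns real weight to both hot/sweating and hot/ill.

```python
import re

# Hypothetical sense inventory for "hot" (illustrative word lists only).
SENSES = {
    "hot/attractive": {"date", "attractive", "beautiful", "single", "dinner"},
    "hot/sweating": {"july", "summer", "sweat", "sweating", "humid", "houston"},
    "hot/ill": {"fever", "temperature", "doctor", "doctors", "hospital", "thermometer"},
}


def sense_probabilities(text: str, smoothing: float = 1.0) -> dict:
    """Turn clue-word overlap counts into a probability for each sense."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Smoothing keeps every sense above zero, so no meaning is fully excluded.
    scores = {sense: len(words & clues) + smoothing for sense, clues in SENSES.items()}
    total = sum(scores.values())
    return {sense: score / total for sense, score in scores.items()}


probs = sense_probabilities(
    "She walked into the hospital in Houston on a July afternoon, sweating."
)
best = max(probs, key=probs.get)
print({sense: round(p, 2) for sense, p in probs.items()})
# {'hot/attractive': 0.14, 'hot/sweating': 0.57, 'hot/ill': 0.29}
print(best)  # hot/sweating
```

Choosing the most probable meaning will be right most of the time and wrong some of the time. That is the trade-off textual metadata accepts.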
Bill Inmon lives in Denver with his wife and his two Scottie dogs, Jeb and Lena. Last night Lena figured out how to open the doggie door into the house. Bill was sitting on the couch when Lena suddenly appeared. A new latch was put on the doggie door. But it was nice to see Lena, and she joined Bill on the couch to get her belly rubbed.