LIFE ON THE MOON
Bill Inmon
Founder, Chairman, CEO, Best-Selling Author, University of Denver & Scalefree Advisory Board Member
By W H Inmon
You grew up on earth. You know about breathing. You know about gravity. You know what a glass of water tastes like. What grass and a running stream look like. Then one day you wake up on the moon and everything you thought you knew is no longer valid. It is a shock.
This scenario is what happens when you make the transition from structured data to the world of text. You grew up in and made your career in a world of structured data. You learned about records. Keys. Data models. Attributes and a whole lot of other facts about structured data. Then one day you find yourself in the world of text. And all of the things that you thought you once knew are no longer to be found. Kind of like trying to live on the moon.
There are many, many significant differences between the structured world and the world of text. One of the most basic is this. Structured data is precise. Textual data is probabilistic.
As an example of precise data, you walk into the bank and write a check for $50.00. The bank had better interpret that check as exactly $50.00. Not $49.95. Not $51.00. The bank had better process the check for exactly $50.00. Such is structured data. Precise.
Now let’s consider textual data. A doctor writes the term “ha” in his notes. What does “ha” mean? If it is a heart doctor that wrote the notes, “ha” probably means “heart attack”. But on occasion the heart doctor will mean “headache”. 99% of the time “ha” means heart attack. But occasionally it doesn’t.
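To make the point concrete, here is a minimal Python sketch of what a probabilistic interpretation looks like. The 99% and 1% figures are the ones given above for a heart doctor. The table layout and the function name interpret_ha are assumptions made up for this illustration, not part of any product or library.

```python
# The 99% / 1% split is the one given above for a heart doctor.
# The table layout and the function name are assumptions for this sketch.
HA_MEANINGS = {
    "heart doctor": {"heart attack": 0.99, "headache": 0.01},
}


def interpret_ha(writer):
    """Return every candidate meaning of "ha", most likely first."""
    candidates = HA_MEANINGS[writer]
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)


ranked = interpret_ha("heart doctor")
best_meaning, probability = ranked[0]
print(f"'ha' most likely means {best_meaning} ({probability:.0%})")
for meaning, p in ranked[1:]:
    print(f"but it could still mean {meaning} ({p:.0%})")
```

Notice that the answer is a ranked list with probabilities attached, not a single exact value. That is the difference from the $50.00 check.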
Or suppose you are doing voice transcription. 95% of the time the voice of the speaker is understood and is transcribed properly. But 5% of the time the voice of the speaker is misunderstood and the word is transcribed improperly.
Or suppose that you see a letter for a gentleman named John Smith. You look John Smith up in a directory. You find a John G Smith. Is John Smith the same as John G Smith? Maybe it is. Maybe it isn’t.
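A rough Python sketch of the John Smith problem follows. The comparison rule (same first and last name, with a missing middle initial treated as uncertain) and the three labels are assumptions chosen for illustration. Real name matching is far more involved.

```python
def compare_names(a, b):
    """Compare two names and admit uncertainty instead of forcing yes or no."""
    tokens_a = a.lower().split()
    tokens_b = b.lower().split()
    if not tokens_a or not tokens_b:
        return "different name"
    # First and last names must agree before anything else is considered.
    if tokens_a[0] != tokens_b[0] or tokens_a[-1] != tokens_b[-1]:
        return "different name"
    middle_a = tokens_a[1:-1]
    middle_b = tokens_b[1:-1]
    if middle_a == middle_b:
        # Same name on paper. It may still be two different people.
        return "same name"
    if not middle_a or not middle_b:
        # One side has a middle initial and the other does not.
        return "possible match"
    return "different name"


print(compare_names("John Smith", "John G Smith"))
```

For John Smith and John G Smith the sketch answers "possible match". Maybe it is. Maybe it isn’t.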
You are reading a textual document that has gone through OCR. You see the word “elophant”. You assume it is the word elephant. The OCR technology has encountered an inking defect and could not clearly detect what was written. However, there is a lady named Trudy Elophant. How do you make the interpretation?
You see the word “fire” in text. What does fire mean? Does it mean a conflagration? Does it mean the pulling of the trigger of a gun? Does it mean a boss has rid himself/herself of an unwanted employee? You need context in order to make the proper interpretation.
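Here is a small Python sketch of using context to interpret “fire”. The three senses come from the paragraph above. The cue words attached to each sense and the function name interpret_fire are invented for this illustration, and counting overlapping words is only one simple way to use context.

```python
import re

# Cue words per sense are assumptions made up for this sketch.
SENSE_CUES = {
    "conflagration": {"smoke", "burn", "burned", "flames", "building"},
    "pulling the trigger of a gun": {"gun", "trigger", "shot", "rifle", "weapon"},
    "dismissing an employee": {"boss", "employee", "job", "manager", "hired"},
}


def interpret_fire(sentence):
    """Pick the sense of "fire" whose cue words appear most often nearby."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no cue words at all, the text gives no basis for an interpretation.
    if scores[best] == 0:
        return None
    return best


print(interpret_fire("The boss decided to fire the employee"))
print(interpret_fire("He pulled the trigger to fire the gun"))
print(interpret_fire("I saw the word fire"))
```

When the surrounding words give no clue, the sketch returns None rather than guessing. On a tie it simply takes the first sense listed, which is itself an assumption that may be wrong.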
And the list of ambiguities that arise in handling text continues to grow. Processing text is NOTHING like processing structured data. It is patently a mistake to think that you are going to apply the rules of structured data to the world of text. In short, when dealing with text, you are not dealing with a world of precision. You are dealing with a world where assumptions have to be made in order to interpret what was written. Your assumptions may not be correct and may result in an imprecise understanding. And this lack of precision is simply very uncomfortable to a person who has spent their professional career in a world that was precise.
And the differences between precision and probability are just the tip of the iceberg in understanding the two different environments.
So why would a person want to embrace the world of text anyway? The answer is that there is a huge amount of business value in text that is just waiting to be discovered. No one is even looking in that world in a methodical manner. Today’s ignorance of text is like being in California in 1848. We are told by historians that in 1848 you could just walk down to the stream and pick up gold. No one was even looking for gold in 1848. Once gold was discovered at Sutter’s Mill that was no longer true. But in the early days the gold was just sitting there with no one paying attention. Kind of like text today.
Text is just waiting there for someone to discover. Will it be you?
Bill Inmon lives in Denver with his wife and his two Scotty dogs – Jeb and Lena. Last night we had a crisis. We ran out of dog food. Sylvia had to run to the store and get a bag before Jeb went crazy. Lena let Jeb do all the complaining. But she ate her food when Sylvia brought it from the store in any case.
AI & Ethics | Digital Experience | Advanced technologies & Quantum Computing
5 hours ago
Structured data is familiar ground, but as Bill shows, there is a real opportunity in text. Sure, it’s messy and uncertain, but that’s where the value is waiting. The question is: are we ready to dig in and find it?
Software tester/developer Python/SQL
1 day ago
What a beautiful, appealing text! It is not unexpected that the most important philosophers of the 20th century have already thought deeply about this. And that field of study is called hermeneutics: https://plato.stanford.edu/entries/hermeneutics/ Meanings do not come about through interpretation, but a reader understands a text insofar as he brings the necessary meanings with him. A post in a newspaper can be understood directly and completely by an adult, with no interpretation necessary, while for an 8-year-old child not all the necessary meanings are yet available. If you read a book or watch a movie for pleasure, you want to understand it directly and not tediously interpret it, right? This is only possible if the meanings are already there (within the reader). A good text, for example a good book, an article or a news item in the newspaper, or a good movie, instantly pulls you into a different reality; no cumbersome interpretation is required for that.
Data,AI & Hyperautomation
1 day ago
Your moon landing analogy is spot on! Let's embark on this journey to the textual frontier!
What if you woke up one day... and Bill Inmon was suddenly calling himself Bill Inmoon? Welcome to the Ambiguniverse of unstructured data. Given context... there's gold in that thar' data.
Disambiguation Specialist
1 day ago
Bill Inmon - What do you think about using a combination of business glossary and instance data to reduce those ambiguities you mention?