BUILDING ON A SOLID FOUNDATION
Bill Inmon
Founder, Chairman, CEO, Best-Selling Author, University of Denver & Scalefree Advisory Board Member
By W H Inmon
ChatGPT is cool. It is elegant. It is simple in concept. It is really fantastic. Everybody loves it.
But ChatGPT has a fatal and systemic flaw. As cool as ChatGPT is, it does not understand the difference between:
- Go find me the data
- Go find me the believable data
These propositions sound the same, but they are not the same thing at all. And the difference between the two is very important, very profound.
As a simple example of the value of believable data, I once sat in on a management meeting for a company. The president asked accounting, "How much revenue did we make last quarter?" Then he asked marketing. Then he asked sales. Each of the different organizations had a different answer for how much last quarter's revenue was. The issue was not finding data but finding data that was believable and accurate. And making important corporate decisions on data that is not believable is a supremely bad practice.
As long as ChatGPT insists that the problem is just finding data, ChatGPT will never advance beyond being an elegant, sexy, fun toy.
ChatGPT assumes that finding data is the problem. It is not aware that finding data is only the start of doing analysis. But ChatGPT is in good company in having this misperception. Other popular technologies have made this same fundamental mistake. Indeed, AI, ML, data mesh, et al. assume that there is a solid and believable foundation of data to operate on. The truth is that there is no such foundation.
So what happens when an entirely new technology builds on a mythical foundation? What happens is that the technology is on shaky ground. The technology has built its existence on beach sand, rather than bedrock. And what happens when the first storm comes along? The technology doesn't work and provides nonsensical answers.
Stated differently, AI, ML, data mesh, and ChatGPT all produce badly erroneous results when the data they operate on is unreliable. GIGO: garbage in, garbage out.
So why isn't there a solid foundation of data to operate on? There are many reasons for the lack of a solid foundation of believable, reliable data on which to place elegant technologies. The main reason is that vendors and consultants don't want to get involved with the job of integrating data: structured data, textual data, analog data, et al. Integrating data is a dirty, complex, tedious job, and it is thankless. In order to do the integration of data, you have to get your hands dirty. And vendors and consultants hate getting their hands dirty.
Integrating data is like planting tomatoes in the springtime. If you are going to plant tomatoes, you are going to get your hands dirty. That is just the way it is.
Now there is no denying that ChatGPT is very sexy. It is elegant. It is attractive. In a word, it is cool. And investors in the world just love cool. But when it comes to investors and cool, what has the industry track record been? Consider Elizabeth Holmes and Theranos. Now Elizabeth Holmes and Theranos were cool. But that is all that they were. Or how about Bankman-Fried and FTX? Bankman-Fried and FTX were mysterious, full of opportunity to make easy money based on the genius of a young wizard. That was really cool. Something to think about. Investors just love to throw their money away on cool.
Bill Inmon lives in Denver with his wife and his two Scotty dogs – Jeb and Lena. It is snowing right now. A lot. More snow than Jeb and Lena are tall. So Jeb and Lena sit comfortably on the porch and wait for a thaw. But they still want their cookies and treats.
Data Warehouse Engineer | Data Engineer
2 years ago
Not many people talk about building a robust framework that brings data from various systems in a systematic way, with auditing to ensure that the data at the consumption layer is the same as what is at the source. In many places I see a lack of an auditing process, which leads to a lack of confidence in the data.
Author | Business Developer | Computational Linguist | Marketer | Researcher | Statistician
2 years ago
Bill, well said, but you ought to speak to the tech... (1) NLG has been around for 4+ years. (2) It's based on the law of probabilities (i.e., the chances a word follows another). (3) NLG is not scraped data. (4) ChatGPT can't return facts or logic, no matter how well you prompt it. Besides, who likes to read AI-generated content? It's so boring!
Data Engineering | Data Strategy and Data Landscaping
2 years ago
Nicely said, Mr. Bill Inmon! Data quality is not really a first-priority concern for any AI chatbot. As a result, it becomes a huge pile of misinformation. But there are certainly a few use cases where pure truth doesn't matter or near-pure truth is sufficient. For those use cases, it might fit well.
Co-Founder, CXO; Data Trust for GenAI; Startup Advisor
2 years ago
No-nonsense advice from the master creator Bill Inmon! ChatGPT is [emoji] now, but then it comes [emoji]; try "Go find me the believable data". Hopefully, it will learn the tricks in due time ;)
Growth Marketer | Product Marketer | AI Ecosystem Builder | 15 Yrs Cutting-Edge B2B & B2C Tech | Founder AskAI.org | Tech Committee Lead for National AI Standard | Creative, Collaborative, Hands-On
2 years ago
Great article, Bill Inmon. I think it's the case that folks are currently fascinated by the compositional capability of ChatGPT more than the accuracy of the data and content it produces. Writing a poem in the spirit of Keats and the voice of Elvis Presley is heavy on tone, light on factual accuracy. Delivering an analysis of last quarter's sales numbers and explaining what that means for next quarter's sales strategy is the opposite. Garbage in, garbage out, as you say. Therefore, data platform technologies (e.g. 'data network' architectures such as dataware) that attack data fragmentation by reducing data silos and copy-based data integration will become increasingly important, not to achieve the mythical "single source of truth" but to establish a simpler, less duplicated data ecosystem that can better and more accurately service AI tools like ChatGPT. The upshot of eliminating silos is not only the reduction (and eventual elimination) of data integration overheads, but more accurate data lineage plus genuinely effective controls that provide data owners with agency over who can view, edit, or query their data, another critical outcome that's been largely side-stepped in our collective fascination with "artificial composition".