Building a Data Mesh Platform: ChatGPT as a Game-Changer for Data Product Delivery
Introduction:
As the world increasingly depends on data to guide decisions, businesses must adapt to remain competitive. With data engineering and data science evolving rapidly, choosing the right tools and strategies for consuming data efficiently is crucial. We have embarked on a journey to build a state-of-the-art data mesh platform, and one of our objectives is to make data products more accessible and valuable to our users.
What if ChatGPT Served as an Output Port for Data Products?
As we deliver data products, we are exploring innovative methods to make data more accessible and valuable to our users. One potential solution could be ChatGPT, an AI language model created by OpenAI.
ChatGPT can process vast amounts of data, extract insights, and convey those insights in a natural language format. By integrating ChatGPT into our data mesh platform, we can offer users a robust, conversational interface for engaging with data products. This would empower them to:
Enabling New Possibilities
As we advance in developing our data mesh platform, integrating solutions like ChatGPT presents numerous advantages:
How Could This Work?
To assess the current state of GPT models, we built a small demo based on the Titanic dataset, which is available on Kaggle. We used chain-of-thought prompting to answer natural-language questions about the dataset. All prompts below were run against ChatGPT (GPT-3.5) out of the box, with the Titanic dataset connected through the pandas agent provided by LangChain.
For instance, we asked, "What is the average age of the people who survived on the Titanic?" ChatGPT accurately translated this into pandas code that returned the correct answer: 28.34 years.
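For reference, the pandas code an agent generates for this kind of question usually boils down to a filter followed by an aggregate. The sketch below assumes the column names of Kaggle's Titanic train.csv (`Survived`, `Age`) but uses a tiny invented sample instead of the real file, so its result is not the 28.34 from the demo.

```python
import pandas as pd

# Tiny hypothetical stand-in for Kaggle's Titanic train.csv; the values are invented.
df = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
        "Survived": [1, 0, 1, 1],
        "Age": [30.0, 45.0, None, 20.0],  # missing ages also occur in the real data
    }
)

# Keep only survivors, then average their ages; Series.mean() skips NaN by default.
avg_age_survivors = df.loc[df["Survived"] == 1, "Age"].mean()
print(round(avg_age_survivors, 2))  # → 25.0
```

Note that the missing age for Passenger C is silently excluded; whether that is the right choice is exactly the kind of assumption worth checking in a generated answer.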
Simple retrieval questions also worked: "Was there a Fernando on board?" returned "No", and "What can you tell me about the oldest passenger on board?" returned "The oldest passenger on board was a male, aged 80, named Barkworth, ...". We also tried more advanced queries, and all of them were answered successfully!
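Both retrieval questions map onto one-line pandas lookups. This is a sketch under the same assumptions as before: Kaggle-style column names and "Surname, Title. Given names" formatting, with invented rows rather than real passengers.

```python
import pandas as pd

# Hypothetical rows in the Kaggle name format; these are not real Titanic passengers.
df = pd.DataFrame(
    {
        "Name": ["Smith, Mr. John", "Jones, Mrs. Mary", "Brown, Master. Tom"],
        "Sex": ["male", "female", "male"],
        "Age": [62.0, 35.0, 4.0],
    }
)

# "Was there a Fernando on board?": case-insensitive substring match on the name column.
has_fernando = df["Name"].str.contains("fernando", case=False).any()

# "Tell me about the oldest passenger": idxmax gives the index label of the highest Age.
oldest = df.loc[df["Age"].idxmax()]

print("Yes" if has_fernando else "No")  # → No
print(oldest["Name"], oldest["Sex"], oldest["Age"])  # → Smith, Mr. John male 62.0
```

A substring match is a deliberately loose interpretation of "a Fernando"; a stricter agent might only match given names.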
Giving ChatGPT access to an agent that it can use to query data opens up an entirely new way of using these models. It reduces the model's tendency to hallucinate, and (in this small demo) the model was able to base its analysis on facts.
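The core idea, letting the model ask the data instead of recalling an answer, can be sketched without any LLM or LangChain dependency. In the sketch below, `fake_llm` is a hypothetical stub standing in for ChatGPT, and the tool names and rows are invented. In the actual demo the LangChain pandas agent fills this role, and the model writes pandas code instead of picking from a fixed tool list.

```python
import re
from statistics import mean
from typing import Any, Callable

# Hypothetical in-memory data product; the rows are invented for illustration.
ROWS = [
    {"name": "Passenger A", "survived": 1, "age": 30.0},
    {"name": "Passenger B", "survived": 0, "age": 45.0},
    {"name": "Passenger C", "survived": 1, "age": 20.0},
]

# Whitelisted tools: each answer is computed from ROWS, never from the model's memory.
TOOLS: dict[str, Callable[..., Any]] = {
    "avg_age": lambda survived: mean(r["age"] for r in ROWS if r["survived"] == survived),
    "name_exists": lambda name: any(name.lower() in r["name"].lower() for r in ROWS),
}


def fake_llm(question: str) -> tuple[str, dict[str, Any]]:
    """Hypothetical stand-in for ChatGPT: turns a question into a tool call."""
    match = re.search(r"was there an? (\w+)", question, re.IGNORECASE)
    if match:
        return "name_exists", {"name": match.group(1)}
    if "average age" in question.lower() and "survived" in question.lower():
        return "avg_age", {"survived": 1}
    raise ValueError(f"no tool for question: {question!r}")


def answer(question: str) -> Any:
    tool_name, args = fake_llm(question)  # 1. the "model" plans a tool call
    return TOOLS[tool_name](**args)       # 2. the tool computes the answer from the data


print(answer("What is the average age of the people who survived?"))  # → 25.0
print(answer("Was there a Fernando on board?"))                         # → False
```

In a full agent loop the tool result would be passed back to the model so it can phrase the final answer. Restricting the model to whitelisted tools is more conservative than executing generated code, at the cost of flexibility.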
What Do I Think This All Means?
As we progress further toward a data mesh architecture, we have delivered numerous data products that are well-defined, documented, readable, and easily accessible to end users. Coincidentally, this is also the ideal input for ChatGPT to start interacting with the data, because it makes it straightforward to include all the relevant information and context in prompts.
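To illustrate why documentation matters here, the sketch below renders a data product's metadata as prompt context ahead of a user question. The `DataProduct` and `Column` classes, the `build_prompt` function, and the values "titanic_passengers" and "demo-team" are all hypothetical; a real platform would pull this metadata from its data catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    description: str


@dataclass
class DataProduct:
    """Hypothetical metadata record for a documented data product."""
    name: str
    owner: str
    description: str
    columns: list[Column] = field(default_factory=list)


def build_prompt(product: DataProduct, question: str) -> str:
    """Place the product's documentation in front of the user's question."""
    lines = [
        f"Data product: {product.name} (owner: {product.owner})",
        f"Description: {product.description}",
        "Columns:",
    ]
    lines += [f"- {c.name} ({c.dtype}): {c.description}" for c in product.columns]
    lines += ["", "Answer using only the data described above.", f"Question: {question}"]
    return "\n".join(lines)


titanic = DataProduct(
    name="titanic_passengers",
    owner="demo-team",
    description="One row per passenger, from Kaggle's Titanic dataset.",
    columns=[
        Column("Survived", "int", "1 if the passenger survived, else 0"),
        Column("Age", "float", "Age in years; may be missing"),
    ],
)

print(build_prompt(titanic, "What is the average age of the survivors?"))
```

The better documented a product is, the less of this context has to be written by hand, which is the point made above.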
Further Steps
All concepts and use cases are still in very early stages, and a lot of unknowns remain. I am highly enthusiastic about the possibilities while also remaining mindful of the risks this technology poses. I would love to hear your thoughts in the comments, especially if you are also interested in this sort of use case!
Domain Architect Data at ABN AMRO, Owner of HCWebAdvies.nl, Co-founder of Stichting Digital Natives of Aruba & Board Member of Stichting Uit Welke Beker
Marianne Pot
Executive Data Steward | Data Governance Lead at Vanderlande Industries | Certified Analytics Translator
Like you said, there are many unknowns to explore, such as data governance, data security, and data privacy.
Business Intelligence Consultant at Nippur
I really like the way this is heading! If I remember correctly, ThoughtSpot already had these kinds of features in 2020 (although the language model was not as advanced). Power BI also has similar functionality with its Q&A feature. I think it becomes really interesting when a natural-language interface is put on top of a complicated entity model: the language model would have to find the correct relationships between entities to answer the questions.
Experienced Leader in AI, data and ICT
Hi Geert and former Vanderlande colleagues, here at Bynder we are working on almost exactly the same thing: building a data mesh around our DAM platform (in Snowflake/Dbt/AWS) and building AI features on top of it to, in our case, make it easier to search for digital content. We are looking at AWS features (speech recognition, image recognition), but we are also developing our own (partly with partners). I can't and am not allowed to say more about it on LinkedIn; give me a call if you'd like to know more. These are exciting times!!!
Co-Founder & Engagement Director at Data Leaders
If others wish to join, please PM me. All our peer exchanges are vendor-free.