Building a Data Mesh Platform: ChatGPT as a Game-Changer for Data Product Delivery
Photo by Krivec Ales: https://www.pexels.com/photo/mountainous-valley-with-evergreen-forest-against-misty-sky-580151/

Introduction

As the world increasingly depends on data to guide decisions, businesses must adapt to remain competitive. With data engineering and data science evolving rapidly, choosing the right tools and strategies for consuming data efficiently is crucial. We have embarked on a journey to build a state-of-the-art data mesh platform, and one of our objectives is to make data products more accessible and valuable to our users.

What if ChatGPT Served as an Output Port for Data Products?

As we deliver data products, we are exploring new ways to make data more accessible and valuable to our users. One candidate is ChatGPT, an AI language model created by OpenAI.

ChatGPT can process large amounts of text, extract insights, and convey those insights in natural language. By integrating ChatGPT into our data mesh platform, we could offer users a robust, conversational interface for engaging with data products. This would empower them to:

  • Ask questions and receive relevant, concise responses
  • Generate reports and summaries on demand
  • Identify trends and anomalies in data
  • Access real-time analytics and make data-driven decisions

Enabling New Possibilities

As we continue developing our data mesh platform, integrating solutions like ChatGPT offers numerous advantages:

  1. Enhanced User Experience: ChatGPT enables users to interact with data in a more natural, conversational manner, lowering the learning curve and making data more approachable.
  2. Increased Data Literacy: By providing a user-friendly interface, ChatGPT encourages users to engage with data products, promoting data literacy throughout the organization.
  3. Improved Collaboration: ChatGPT can help bridge the gap between technical and non-technical team members, fostering better communication and collaboration around data-driven projects.

How Could This Work?

To assess the current state of GPT models, we built a small demo based on the Titanic dataset available on Kaggle. We used chain-of-thought prompting to evaluate human questions about the dataset. All prompts below were run against the GPT-3.5 model behind ChatGPT, out of the box, with the Titanic dataset connected through the pandas DataFrame agent provided by LangChain.

For instance, we asked, "What is the average age of the people who survived on the Titanic?" ChatGPT accurately translated this into pandas code that returned the correct age: 28.34 years.
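For reference, the question above boils down to a one-line pandas aggregation. The sketch below is a hand-written equivalent, not the code the agent actually generated. It assumes the column names of Kaggle's Titanic `train.csv` (`Survived`, `Age`) and uses a tiny made-up sample instead of the real file, so its result is not the 28.34 figure from the demo.

```python
import pandas as pd

# Tiny illustrative sample using the Kaggle Titanic column names (values are made up).
df = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
        "Survived": [1, 0, 1, 1],
        "Age": [30.0, 45.0, 20.0, None],  # the real data also has missing ages
    }
)

def average_age_of_survivors(frame: pd.DataFrame) -> float:
    """Mean age of passengers with Survived == 1; mean() skips missing ages (NaN)."""
    return frame.loc[frame["Survived"] == 1, "Age"].mean()

print(average_age_of_survivors(df))  # -> 25.0
```

Against the real file you would load the data with `pd.read_csv` (the `train.csv` path is an assumption). Note that `mean()` silently skips missing ages; since many passengers in the Titanic data have no recorded age, that detail is worth surfacing in any generated answer.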

Simple retrieval questions also worked: "Was there a Fernando on board?" returned "No", and "What can you tell me about the oldest passenger on board?" returned "The oldest passenger on board was a male, aged 80, named Barkworth, ...". We also tried more advanced queries, and all of them were answered successfully!
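The two retrieval questions map onto equally small pandas lookups. This is again a hand-written sketch rather than the agent's output: the helper names `has_passenger_named` and `oldest_passenger` are hypothetical, and the rows are made up (only the column names and the "Last, Title. First" name format follow the Kaggle file).

```python
import pandas as pd

# Made-up sample rows; names follow the Kaggle "Last, Title. First" format.
df = pd.DataFrame(
    {
        "Name": ["Smith, Mr. John", "Doe, Mrs. Jane", "Brown, Master. Tom"],
        "Sex": ["male", "female", "male"],
        "Age": [54.0, 38.0, 4.0],
    }
)

def has_passenger_named(frame: pd.DataFrame, name: str) -> bool:
    # Case-insensitive plain-substring match on Name; missing names count as no match.
    return bool(frame["Name"].str.contains(name, case=False, regex=False, na=False).any())

def oldest_passenger(frame: pd.DataFrame) -> pd.Series:
    # idxmax() returns the index label of the highest Age (NaN values are skipped).
    return frame.loc[frame["Age"].idxmax()]

print(has_passenger_named(df, "Fernando"))  # False
print(oldest_passenger(df)["Name"])  # Smith, Mr. John
```

A substring match is a deliberately loose reading of "a Fernando"; a real answer might need to distinguish first names from surnames, which is exactly the kind of interpretation the model makes on the user's behalf.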

Giving ChatGPT access to an agent it can use to query data opens up an entirely new way of using these models. It reduces the model's tendency to hallucinate and, in this small demo, allowed it to base its analysis on facts.
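To make the idea of a data-querying tool more concrete, here is a deliberately simplified, ReAct-style loop: the model proposes a pandas expression, a tool evaluates it against the DataFrame, and the result is fed back as an observation until the model gives a final answer. This is not LangChain's implementation. The `scripted_model` function is a hypothetical stand-in for a real LLM call, and the `Action:` / `Observation:` / `Final Answer:` text protocol is an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Survived": [1, 0, 1], "Age": [30.0, 45.0, 20.0]})  # made-up rows

def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: returns canned replies in a simple text protocol."""
    if "Observation:" not in transcript:
        return (
            "Thought: I need the mean age of rows where Survived == 1.\n"
            "Action: df.loc[df['Survived'] == 1, 'Age'].mean()"
        )
    # Once the tool has produced an observation, answer from it rather than from memory.
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Final Answer: The average age of survivors is {observation}."

def run_python_tool(expression: str, frame: pd.DataFrame) -> str:
    # Evaluate the proposed expression with only the DataFrame in scope.
    # WARNING: eval() on model output is unsafe outside a proper sandbox.
    return str(eval(expression, {"__builtins__": {}}, {"df": frame}))

def answer(question: str, model, frame: pd.DataFrame, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += "\n" + reply
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        # Extract the proposed action, execute it, and append the result as an observation.
        action = reply.split("Action:", 1)[1].strip()
        transcript += f"\nObservation: {run_python_tool(action, frame)}"
    raise RuntimeError("No final answer within max_steps")

print(answer("What is the average age of the people who survived?", scripted_model, df))
# -> The average age of survivors is 25.0.
```

The grounding comes from the loop structure: the number in the final answer is copied from the tool's observation, not generated by the model.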

What Do I Think This All Means?

As we progress in our journey toward a data mesh architecture, we have delivered numerous data products that are well defined, documented, readable, and easily accessible to end users. Conveniently, this is also the ideal input for ChatGPT, because it makes it straightforward to include all relevant information and context in prompts.
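One way to see why documented data products suit this approach is that their metadata can be turned into prompt context mechanically. The `DataProduct` and `Column` structures, the `titanic_passengers` / `demo-team` values, and the prompt layout below are all hypothetical and do not describe our actual platform; they only illustrate how a documented schema becomes the context a model needs.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    dtype: str
    description: str

@dataclass
class DataProduct:
    name: str
    owner: str
    description: str
    columns: list[Column] = field(default_factory=list)

def build_prompt_context(product: DataProduct) -> str:
    """Render a data product's documentation as plain-text context for an LLM prompt."""
    lines = [
        f"Data product: {product.name} (owner: {product.owner})",
        f"Description: {product.description}",
        "Columns:",
    ]
    # One line per documented column: name, type, and business meaning.
    lines += [f"- {c.name} ({c.dtype}): {c.description}" for c in product.columns]
    return "\n".join(lines)

titanic = DataProduct(
    name="titanic_passengers",
    owner="demo-team",
    description="One row per passenger of the Titanic (Kaggle training set).",
    columns=[
        Column("Survived", "int", "1 if the passenger survived, else 0"),
        Column("Age", "float", "Age in years; may be missing"),
    ],
)
print(build_prompt_context(titanic))
```

Because the context is generated from the same metadata that documents the data product, keeping the documentation up to date also keeps the model's view of the data up to date.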

Further Steps

All concepts and use cases are still in very early stages, and a lot of unknowns remain. I am highly enthusiastic about the possibilities while also remaining mindful of the risks this technology poses. I would love to hear your thoughts in the comments, especially if you are also interested in this sort of use case!

Haime Croeze

Domain Architect Data at ABN AMRO, Owner of HCWebAdvies.nl, Co-founder of Stichting Digital Natives of Aruba & Board Member of Stichting Uit Welke Beker

Geert-Jan Verdonk

Executive Data Steward | Data Governance Lead at Vanderlande Industries | Certified Analytics Translator

Like you said, there are many unknowns to explore, such as data governance, data security, and data privacy.

Ricardo Jacobs

Business Intelligence Consultant at Nippur

I really like the way this is heading! If I remember correctly, ThoughtSpot already had these kinds of features in 2020 (although the linguistic model was not that advanced). Power BI also has similar functionality with its Q&A feature. I think it becomes really interesting when a natural language interface is put on top of a complicated entity model: the language model would have to find the correct relationships between entities to answer the questions.

Harm Bodewes

Experienced Leader in AI, data and ICT

Hi Geert and former Vanderlande colleagues, here at Bynder we are working on almost exactly the same thing: building a data mesh around our DAM platform (in Snowflake/Dbt/AWS) and building AI features on top of it to, in our case, make searching for digital content easier. We are looking at AWS features (speech recognition, image recognition), but we are also developing our own (partly with partners). I can't and am not allowed to say more about it on LinkedIn; give me a call sometime if you want to know more. These are fascinating times!!!

Louise Clement

Co-Founder & Engagement Director at Data Leaders

If others wish to join, please PM me. All our peer exchanges are vendor-free.
