Reimagining Data Teams in the era of Generative AI
Innovative data teams are already improving their productivity with Generative AI (GenAI) tools such as ChatGPT, GitHub Copilot, and Gemini.
From data collection to deployment, there is a place for Large Language Models (LLMs) in your data team.
The journey often starts with a time-sensitive request from the business team:
Hi team, could someone help me understand all our expenses larger than $1K broken down by the account manager in the last quarter? Please send ASAP before the budget review tomorrow, thanks!
If we are lucky, the data is already in the database, so we can skip collection and cleaning, and focus on analyzing the data and deploying or sharing the results to fulfill the request.
In the era of ChatGPT, we see modern data teams weaving LLMs into each of these steps: collection, cleaning, analysis, and deployment.
Best Practices
The workflow today is mostly manual: copy and paste context into the LLM so it is aware of your data, craft the prompt, wait for the code (usually SQL) to be generated, then copy and paste that code back into your data analytics platform to deploy a dashboard that answers the stakeholder's question.
Current practice leaves a lot of room for improvement; we can do better.
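To make that manual step concrete, here is a minimal sketch of the copy-paste workflow using the OpenAI Python SDK; the model name, expenses schema, and question are placeholders for illustration, not part of any specific stack.

```python
# A minimal sketch of the manual workflow, assuming the OpenAI Python SDK.
# The model name and the expenses schema below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

schema_context = """
CREATE TABLE expenses (
    id INT,
    amount NUMERIC,
    account_manager TEXT,
    created_at DATE
);
"""

question = ("List all expenses larger than $1K broken down by account manager "
            "for the last quarter.")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-completion model works here
    messages=[
        {"role": "system", "content": "You write SQL for the schema provided."},
        {"role": "user", "content": f"{schema_context}\n\n{question}"},
    ],
)

# The generated SQL still has to be copied back into the analytics platform by hand.
print(response.choices[0].message.content)
```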
First, we need to stop copy-pasting context files around and start improving them together. One easy solution is to check the context file into GitHub and collaborate on it there directly.
Second, collect completion data; data is gold! It will be very useful when your team starts fine-tuning models. One easy approach is to log the arguments and responses sent to your LLM API, but beware that it takes months to collect a significant amount of data, so double-check your retention policies.
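One way to start collecting that data is a thin wrapper around the API call that appends each request and response to a log file. The sketch below assumes the OpenAI Python SDK; the helper name and JSONL path are hypothetical.

```python
# A minimal logging sketch, assuming the OpenAI Python SDK; the JSONL path
# and helper name are hypothetical, adapt them to your own stack.
import json
import time
from openai import OpenAI

client = OpenAI()

def logged_completion(log_path="completions.jsonl", **kwargs):
    """Call the chat completions API and append the request/response pair to a JSONL log."""
    response = client.chat.completions.create(**kwargs)
    with open(log_path, "a") as log:
        log.write(json.dumps({
            "timestamp": time.time(),
            "request": kwargs,
            "response": response.model_dump(),  # pydantic model -> plain dict
        }) + "\n")
    return response

# Usage (hypothetical): same call signature as the underlying SDK method.
response = logged_completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize last quarter's expenses."}],
)
```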
Third, you can consider prompt frameworks like DSPy to help your team write better, more reusable prompts.
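For illustration, here is a minimal DSPy sketch that turns the SQL-generation prompt into a reusable signature; the model string, the configuration call, and the field names are assumptions that depend on your DSPy version and your schema.

```python
# A minimal DSPy sketch; the configuration call varies by DSPy version, and the
# model name, field names, and schema below are assumptions for illustration.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GenerateSQL(dspy.Signature):
    """Write a single SQL query that answers the question for the given schema."""
    schema: str = dspy.InputField(desc="CREATE TABLE statements for the relevant tables")
    question: str = dspy.InputField(desc="business question in plain English")
    sql: str = dspy.OutputField(desc="the SQL query, no explanation")

generate_sql = dspy.Predict(GenerateSQL)

result = generate_sql(
    schema="CREATE TABLE expenses (amount NUMERIC, account_manager TEXT, created_at DATE);",
    question="Expenses larger than $1K by account manager in the last quarter",
)
print(result.sql)
```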
Reimagining Collaboration
However, we can do much better by reimagining collaboration.
In the era of Generative AI, we can't afford to be the organization's bottleneck, copy-pasting into LLMs and waiting for results, because business users demand instant answers to compete in the market. Business users want to produce reasonable designs in seconds with Midjourney, answer general questions instantly with ChatGPT, and generate data-driven insights themselves. The game has changed.
Innovative data teams know this: Retrieval-Augmented Generation (RAG) is one of the top skills they are developing, but that's just the start.
We believe the future of data teams is to provide conversational experiences connected to their private data (databases, data lakes, services, documents, etc.) to democratize data and maintain their competitive edge.
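As a rough illustration of how RAG connects a conversational experience to private data, here is a minimal sketch with an in-memory retrieval step; the documents, model names, and scoring are placeholders, and a real deployment would use a vector database and your own data sources.

```python
# A minimal RAG-style sketch over a handful of internal documents, assuming the
# OpenAI Python SDK; the documents and models are placeholders for illustration.
from openai import OpenAI

client = OpenAI()

documents = [
    "Expenses over $1K must be approved by the account manager.",
    "Quarterly budget reviews happen in the first week of each quarter.",
]

def embed(texts):
    out = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in out.data]

def answer(question):
    # Retrieve: rank documents against the question (dot product of unit-norm embeddings).
    doc_vecs = embed(documents)
    q_vec = embed([question])[0]
    scores = [sum(a * b for a, b in zip(vec, q_vec)) for vec in doc_vecs]
    context = documents[scores.index(max(scores))]

    # Generate: answer grounded in the retrieved context.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("Who needs to approve a $2K expense?"))
```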
The job of the data team is no longer to produce insights, but to build the machine that builds the machine.
At Hal9 we help data teams provide conversational data experiences to their business users, but there are other resources and initiatives worth considering as well, such as PrivateGPT, GPT4All, and LocalGPT.
Reach out to us at Hal9, request a demo, see you soon!