Unifying Data & Gen AI / LLM platforms
Debmalya Biswas
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
AI / Gen AI challenges for a Data platform
As a Data and AI/ML practitioner, I have always wondered why there is such a big disconnect between the business intelligence (BI) and AI/ML worlds.
Data is a key ingredient for both BI and AI/ML, and enterprise data provides the strategic differentiation for most use-cases. Given this, why do we still need separate platforms and tooling, managed by separate DataOps and MLOps pipelines?
The ideal world should look something like the reference architecture below:
Following the medallion architecture, source data (both structured and unstructured) is ingested into the Bronze layer, where it is cleansed and standardized into the Silver layer, with further modeling and transformation into the Gold layer. The data is now ready for consumption by both BI / dashboarding tools and machine learning (ML) pipelines.
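As a minimal illustration, the Bronze-to-Silver-to-Gold flow can be expressed directly in SQL inside the same platform; the table and column names below are hypothetical and only meant to sketch the idea:

-- Bronze -> Silver: cleanse and standardize raw review data (illustrative names)
create or replace table SILVER.CUSTOMER_REVIEWS as
select
    REVIEW_ID,
    trim(CONTENT)            as CONTENT,
    try_to_date(REVIEW_DATE) as REVIEW_DATE
from BRONZE.RAW_CUSTOMER_REVIEWS
where CONTENT is not null;

-- Silver -> Gold: model/aggregate for consumption by BI dashboards and ML pipelines
create or replace table GOLD.DAILY_REVIEW_COUNTS as
select REVIEW_DATE, count(*) as REVIEW_COUNT
from SILVER.CUSTOMER_REVIEWS
group by REVIEW_DATE;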
In reality, however, we see that this curated / processed data is moved to another location, e.g., cloud storage buckets, or another data lake, where it is further transformed as part of ML training (LLM fine-tuning) and deployment.
So Fig. 1, in an enterprise landscape, looks something like Fig. 2 (below). The data (pre-)processing part of an ML pipeline focuses on moving data from the source to the ML model, without necessarily covering how the model executes on that data.
Needless to say, this results in redundancy and a fragmentation of the BI and AI/ML pipelines. Snowflake has been leading the way in unifying the two worlds. In the rest of this article,
we deep dive into how Snowflake brings large language models (LLMs) to the data, rather than the other way around, which is still prevalent in most enterprise data and AI ecosystems today.
Snowflake's Gen AI capabilities: bringing LLMs to governed data
Continuing its tradition of providing a user-friendly platform with state-of-the-art data processing and governance capabilities, Snowflake has rolled out its integrated Data & AI / Gen AI platform, illustrated in Fig. 3. Cortex AI is Snowflake's Gen AI/LLM platform, with Snowflake ML catering to the more traditional AI (data science / predictive analytics) capabilities.
We focus on Gen AI capabilities in this article, and show how easy it has become to build state-of-the-art LLM-based use-cases on well-governed and modeled enterprise data already present in Snowflake repositories.
Snowflake provides a full set of natural language processing (NLP) capabilities, from ready-made task-specific functions to a general-purpose COMPLETE function and LLM fine-tuning.
We deep-dive into these three LLM capabilities in the rest of this article.
LLM functions for routine NLP tasks
Snowflake provides a set of LLM functions for routine NLP tasks, e.g., sentiment analysis, summarization, and translation (see the full documentation). The functions are available as SQL functions and can also be invoked in Python.
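For example, the task-specific functions can be applied directly to a text column in SQL. The sketch below assumes the REVIEW_DATA table used later in this article; SENTIMENT, SUMMARIZE, and TRANSLATE are Cortex functions, but verify availability in your region and account in the documentation:

-- Task-specific Cortex LLM functions applied to a (hypothetical) review table
select
    CONTENT,
    SNOWFLAKE.CORTEX.SENTIMENT(CONTENT)             as SENTIMENT_SCORE,  -- score between -1 and 1
    SNOWFLAKE.CORTEX.SUMMARIZE(CONTENT)             as REVIEW_SUMMARY,
    SNOWFLAKE.CORTEX.TRANSLATE(CONTENT, 'en', 'fr') as REVIEW_FR
from CORTEX_DB.REVIEW_TEST_DATASET.REVIEW_DATA;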
COMPLETE function for user-specified NLP tasks
The COMPLETE function is a general-purpose LLM function to perform user-specified tasks. Users can choose from a wide range of LLMs (Fig. 4), and the function generates responses based on a given prompt.
Below is an example of a COMPLETE function call in SQL to analyze the sentiment of product reviews stored in the CONTENT column of the REVIEW_DATA table, and benchmark them against manually assessed sentiments:
select CONTENT, SENTIMENT as original_sentiment,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-70b-chat',
        CONCAT('Check the column CONTENT and answer if the review is "positive" or "negative". Here is the product review: ', CONTENT)
    ) as llm_sentiment
from CORTEX_DB.REVIEW_TEST_DATASET.REVIEW_DATA;
This shows how easy it is to build custom NLP functions leveraging state-of-the-art LLMs using only SQL.
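For instance, the COMPLETE call can be wrapped in a plain SQL UDF so the same sentiment check becomes reusable across tables; the function name and schema below are assumptions for this sketch:

-- A reusable SQL UDF wrapping the COMPLETE call (names are illustrative)
create or replace function CORTEX_DB.PUBLIC.REVIEW_SENTIMENT(REVIEW_TEXT varchar)
returns varchar
as
$$
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-70b-chat',
        CONCAT('Answer "positive" or "negative" for this product review: ', REVIEW_TEXT)
    )
$$;

-- Usage
select CONTENT, CORTEX_DB.PUBLIC.REVIEW_SENTIMENT(CONTENT) as LLM_SENTIMENT
from CORTEX_DB.REVIEW_TEST_DATASET.REVIEW_DATA;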
Fine-tune LLMs using the Snowflake AI & ML Studio
Finally, we discuss fine-tuning LLMs on enterprise data to build task-specific LLMs. Foundational LLMs are pre-trained on public data. Fine-tuning provides the ability to contextualize LLMs with (and restrict their responses to) enterprise knowledge captured in the form of documents, wikis, business processes, etc.
Fine-tuning entails taking a pre-trained LLM and re-training it on a (smaller) enterprise dataset. Technically, this implies updating the weights of the last layer(s) of the trained neural network to reflect the enterprise data and task. Historically, fine-tuning has been a complex process restricted to technical and engineering teams.
Thankfully, Snowflake has democratized this process: it is now possible to fine-tune state-of-the-art LLMs using its AI & ML Studio in a few clicks. Fig. 5 shows a snapshot of the Snowflake AI & ML Studio LLM fine-tuning entry screen,
followed by a guided set of steps to configure the fine-tuning job, e.g., selecting the base model and the training data.
And, that's it - your LLM fine-tuning is in progress!
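The Studio is essentially a guided front-end to Cortex's fine-tuning capability, which is also exposed in SQL. Below is a minimal sketch, assuming the SNOWFLAKE.CORTEX.FINETUNE function and illustrative model/table names; check the current Snowflake documentation for the exact signature and the supported base models:

-- Assumption: fine-tuning via SNOWFLAKE.CORTEX.FINETUNE; all names are illustrative
select SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',
    'CORTEX_DB.PUBLIC.REVIEW_SENTIMENT_MODEL',   -- name of the fine-tuned model
    'llama3-8b',                                 -- base model (assumed to be supported)
    'select PROMPT, COMPLETION from CORTEX_DB.PUBLIC.REVIEW_TRAIN_DATA'  -- training data
);

-- Once the job completes, the tuned model is invoked like any other model via COMPLETE
select SNOWFLAKE.CORTEX.COMPLETE(
    'CORTEX_DB.PUBLIC.REVIEW_SENTIMENT_MODEL',
    CONCAT('Classify this review as "positive" or "negative": ', CONTENT)
) as llm_sentiment
from CORTEX_DB.REVIEW_TEST_DATASET.REVIEW_DATA;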
To conclude, the lines between Data and AI / Gen AI platforms are blurring, and companies like Snowflake are making it ever easier to leverage Gen AI / LLM capabilities on secure and governed data already stored in Snowflake. So it is high time to give Snowflake's Cortex AI a shot for your strategic Gen AI use-cases.