Conversational BI with Snowflake's Cortex Analyst
Debmalya Biswas
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
I have previously written about Conversational BI and the challenges in realizing them. With large language models (LLMs), we seem to have reached a tipping point. I recently had a chance to try Snowflake’s new BI Conversational Assistant called Cortex Analyst and I will highlight the differentiating features of Analyst’s approach – on why I think it just might work this time.
Text-to-SQL Challenges
Let’s focus on structured data, relational data to be precise. This forms the underlying storage format for most of the business intelligence (BI) world. The predominant language to interact with such storage platforms is SQL. So we are talking about translating a natural language query (NLQ) to SQL in this article, also known as text-to-SQL, or a natural language interface to databases (NLIDB).
For example, let us consider a Country table with Language and Population details — illustrative schema below:
Country table: Country ID | Name | Language | Population Count
NLQ1: Which country has the maximum population count?
SQL1: Select Name, max([Population Count]) from Country;
At the core of most natural language Q&A systems, is a natural language understanding (NLU) ?module that is trying to understand the NLQ’s intent by extracting and classifying the ‘utterances’. In simple words, one can think of utterances as the key phrases in the sentence, e.g., country, maximum, population, count.
The next step is to generate the corresponding SQL query based on this information. So we need a transformation / mapping logic to map ‘country’ to the ‘Country’ table (the table to be queried), ‘maximum’ to the MAX SQL function, ‘population count’ to the column ‘Population Count’. And, this is where things start to get challenging.
Mapping NLQ utterances to the right SQL operators, esp., in determining if an utterance corresponds to a table, column, primary / foreign key, SQL operator, in the first place —is non-trivial. For example, without any inherent knowledge of the database schema, it is very difficult for the mapping logic to determine that the ‘count’ in this case refers to the column ‘population count’, and not the SQL function COUNT. The problem gets amplified for complex queries, e.g.,
NLQ2: Which language is spoken by maximum number of countries?
whose SQL translation would involve both the SQL functions: MAX & COUNT.
To summarize, text-to-SQL solutions do not generalize well to new domains, and a very good understanding of the underlying data / metadata is needed for it to work efficiently on complex datasets.
Snowflake’s Cortex Analyst
The goal of a Conversational BI Assistant is to empower business users so that they can query the data repositories themselves, without any dependency on data engineering teams. And, this is where Snowflake has taken a very user-centric view of enabling BI conversations seamlessly with at the following three differentiating features:
领英推荐
-?????? User intent validation and query explanation
-?????? Lightweight semantic model
-?????? Flexibility with respect to the underlying LLM
Cortex Analyst, together with Cortex Copilot, are Snowflake’s latest additions to their Cortex Generative AI platform - to provide the conversational BI capability. While Cortex Copilot is targeted more towards developers, Cortex Analyst is the one we will be talking about in this article focusing on business users.
To elaborate, Cortex Analyst is actually an end-to-end text-to-answer solution as it returns the generated SQL as well, providing the final query response. It is very easy to deploy – available as an API, and Snowflake also provides a simple Streamlit application that can be deployed with a few lines of code. As such, it is very use to deploy and pilot / use.
We all know that LLMs hallucinate, so the first (and best) thing to do is to validate the system’s understanding of a given query with the user – before responding with the final answer. Cortex Analyst enables this by having a dialogue with the user, where for any given query, it first presents its understanding of the query to the user with an explanation of how it generated the SQL query. In addition, it also provides suggestions to make the query more concrete in case of ambiguities – illustrated in the below figure. This user validation helps in significantly improving the correctness and accuracy of the final answer.
Secondly, Cortex Analyst addresses the data repository metadata mapping problem highlighted earlier with the help of a semantic model. The semantic model is the bridge mapping the domain or business specific terms used by users to the database schemas. These additional semantic details, like more descriptive names or synonyms, enable Cortex Analyst to answer natural language queries more reliably.
Overall, the semantic model can be considered as a logical layer providing a simplified view of a physical database table or view. Snowflake makes it very easy to create the semantic model with the help of a generator tool and is stored in an easily accessible YAML file. It is important to note that the semantic model also allows the provision of storing queries. This acts as a verified query repository (VQR), capturing user queries that have worked correctly in the past - providing a query cache that helps in further improving accuracy and trustworthiness of the results.
Finally, let’s delve into the flexibility that Snowflake provides with respect to choosing the underlying LLM powering Cortex Analyst. As we all know, the LLM landscape and leaderboard is continuously evolving with new and (more powerful) LLMs getting released regularly. So choice is critical in such fast-moving technology landscapes. By default, Cortex Analyst leverages Snowflake-hosted Cortex LLMs, which have been heavily fine-tuned for text-to-SQL generation tasks and are one of the most powerful LLMs available today. It is however possible to explicitly opt-in to allow Cortex Analyst to use the latest OpenAI GPT models, hosted by Microsoft Azure, alongside the Snowflake-hosted models. At runtime, Cortex Analyst will select the optimal combination of models to ensure the highest accuracy and performance for each query.
To conclude, Cortex Analyst is a very promising Conversational BI Assistant ready to be used today (in public preview). I highly recommend giving it a try!
#sponsored
Transformational AI Coach | Helping Busy Professionals Skyrocket Productivity & Live Stress-Free Lives with AI | Efficiency & Time Optimization Expert | AI Tools Strategist | Speaker & Mentor | Top AI Voice
1 个月It's great to hear about your experience with Snowflake's new BI Conversational Assistant, Debmalya Biswas. Your insights into how it enables hashtag#Business users to query data repositories themselves are truly valuable. Looking forward to reading your detailed article on Cortex Analyst.
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
1 个月The detailed technical article is now available at AI Advances Conversational BI: Text2SQL https://ai.gopubby.com/conversational-bi-text2sql-3ae203683a2e
Principal Consultant - AI Product Solutions | GenAI, LLM, Python | CV, AR, Sustainability Enthusiast | Technologist | Thinker | Systemic Designer | Social Entrepreneur
1 个月I agree with the Text2SQL has potential for enterprises. Even for many large applications without enprising nature. I tried my hands with two conversational apps to utilize Text2SQL, and in one of them using Snowflake. However, I'm not much impressed with snowflakes genAI features yet...they r more on showoff time being, and promoted campaigns it seems. They are better as core data lakes. My personal experience.
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
1 个月Linking to Snowflake's Cortex Analyst https://www.snowflake.com/en/blog/cortex-analyst-ai-self-service-analytics/