Leveraging LLMs for Intuitive Interactions with Enterprise SQL Databases
Suman Biswas
Engineering Leadership, Emerging Tech & AI - Enterprise Architecture | Digital Strategy | Building Responsible AI Platform
The backbone of any enterprise is its data, and SQL databases have long been the standard for storing this invaluable asset. However, extracting insights from that data typically requires fluency in SQL, which is a barrier for many business users. But what if you could simply ask your database a question in natural language and get the answer? That is where Large Language Models (LLMs) come in, revolutionizing the way we interact with databases.
The Power of LLMs in SQL Interactions: LLMs have the extraordinary ability to understand and generate human-like text, making it possible to interact with SQL databases using natural language. This advancement opens the door for a wide range of users, from business analysts to executives, to engage with their data without the need to understand complex SQL syntax.
LangChain: Bridging Language and Databases: LangChain is at the forefront of this innovation, providing SQL Chains and Agents that can build and execute SQL queries based on natural language prompts. These tools harness the power of LLMs to translate your questions into executable SQL code, making data querying as simple as having a conversation.
Install necessary packages
!pip install -q langchain
!pip install -q openai
!pip install -q langchain_experimental
Set up the environment variable for OpenAI
import os
os.environ['OPENAI_API_KEY'] = 'sk-...'  # replace with your own OpenAI API key
Import required libraries
import pandas as pd
import sqlite3
df = pd.read_csv("sample_data/sales_data_sample.csv", encoding='ISO-8859-1')
df.head(10)
Load the sales data into a SQLite database
# Connect to your SQLite database
conn = sqlite3.connect('llm_sql.sqlite')
# Create a cursor object
c = conn.cursor()
# Create the table (note: df.to_sql below uses if_exists='replace', which
# rebuilds sales_item from the full CSV schema, including columns such as
# PRODUCTLINE and YEAR_ID that are queried later)
c.execute('CREATE TABLE IF NOT EXISTS sales_item (ORDERNUMBER int, QUANTITYORDERED int, PRICEEACH float, ORDERLINENUMBER int, SALES float, STATUS text, CITY text, STATE text, COUNTRY text)')
conn.commit()
df.to_sql('sales_item', conn, if_exists='replace', index=False)
# Define your SQL query
query = '''
SELECT * FROM sales_item LIMIT 100
'''
# Execute the query
c.execute(query)
# Fetch the results (e.g., using fetchall())
results = c.fetchall()
# Close the cursor and the database connection
c.close()
conn.close()
# Now you can work with the 'results' variable, which contains the data
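If you mainly want the results back as a DataFrame, pandas can run the query directly and spare you the cursor bookkeeping. A minimal sketch of this alternative, using a small in-memory database as a stand-in for llm_sql.sqlite (the table and values here are illustrative, not the full Kaggle dataset):

```python
import sqlite3
import pandas as pd

# Build a tiny in-memory stand-in for the sales_item table
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({
    "ORDERNUMBER": [10107, 10121, 10134],
    "SALES": [2871.00, 2765.90, 3884.34],
    "COUNTRY": ["USA", "France", "France"],
})
df.to_sql("sales_item", conn, index=False)

# read_sql_query returns a DataFrame directly, no cursor or fetchall needed
result = pd.read_sql_query(
    "SELECT COUNTRY, SUM(SALES) AS TOTAL_SALES "
    "FROM sales_item GROUP BY COUNTRY", conn)
conn.close()
print(result)
```

Against the real llm_sql.sqlite file you would pass the file-backed connection instead of the in-memory one.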
Define a function to run SQL queries
def sql_query(sql, db):
    conn = sqlite3.connect(db)
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    for row in rows:
        print(row)
    conn.close()
Use the function to display the first 20 entries from 'sales_item' table
import sqlite3
sql_query('SELECT * FROM sales_item LIMIT 20;',
"llm_sql.sqlite")
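The helper above prints rows and interpolates values straight into the SQL string. When any part of the query comes from user input, a parameterized variant is safer. A minimal sketch (my own variant, with an in-memory database standing in for llm_sql.sqlite):

```python
import sqlite3

def run_query(conn, sql, params=()):
    """Run one (optionally parameterized) query on an open connection and return the rows."""
    cur = conn.cursor()
    cur.execute(sql, params)  # sqlite3 substitutes params safely for each ?
    rows = cur.fetchall()
    cur.close()
    return rows

# Demo on an in-memory database with a couple of illustrative rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_item (ORDERNUMBER int, COUNTRY text, SALES float)")
conn.executemany("INSERT INTO sales_item VALUES (?, ?, ?)",
                 [(10107, "USA", 2871.0), (10121, "France", 2765.9)])
rows = run_query(conn, "SELECT SALES FROM sales_item WHERE COUNTRY = ?", ("USA",))
conn.close()
print(rows)  # [(2871.0,)]
```

Returning the rows instead of printing them also makes the helper easier to reuse and test.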
Initialize the Langchain SQLDatabase and OpenAI language model
import os
from langchain.utilities import SQLDatabase
from langchain.llms import OpenAI
from langchain_experimental.sql import SQLDatabaseChain
Create a SQLDatabaseChain to interface with the language model
db = SQLDatabase.from_uri('sqlite:///llm_sql.sqlite')
llm = OpenAI(temperature=0, verbose=True)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
Run natural language queries using the SQLDatabaseChain
db_chain.run("How many ORDERNUMBER are there?")
> Entering new SQLDatabaseChain chain...
How many ORDERNUMBER are there?
SQLQuery:SELECT COUNT(ORDERNUMBER) FROM sales_item;
SQLResult: [(2823,)]
Answer:There are 2823 ORDERNUMBER.
> Finished chain.
There are 2823 ORDERNUMBER.
db_chain.run("Could you provide me with the data on unique countries generating sales?")
> Entering new SQLDatabaseChain chain...
Could you provide me with the data on unique countries generating sales?
SQLQuery:SELECT DISTINCT "COUNTRY" FROM sales_item LIMIT 5;
SQLResult: [('USA',), ('France',), ('Norway',), ('Australia',), ('Finland',)]
Answer:The countries generating sales are USA, France, Norway, Australia, and Finland.
> Finished chain.
The countries generating sales are USA, France, Norway, Australia, and Finland.
Note that the chain appended LIMIT 5 (SQLDatabaseChain caps results at a default top_k of five rows), so this answer lists only the first five distinct countries rather than all of them.
db_chain.run("Please provide the sales figures for motorcycles in the USA for the year 2003?")
> Entering new SQLDatabaseChain chain...
Please provide the sales figures for motorcycles in the USA for the year 2003?
SQLQuery:SELECT SUM(SALES) FROM sales_item WHERE PRODUCTLINE = 'Motorcycles' AND COUNTRY = 'USA' AND YEAR_ID = 2003;
SQLResult: [(178108.95,)]
Answer:The total sales for motorcycles in the USA for the year 2003 is 178108.95.
> Finished chain.
The total sales for motorcycles in the USA for the year 2003 is 178108.95.
Complex Query
db_chain.run("Could you identify the country that had the third-highest motorcycle sales in the year 2004?")
> Entering new SQLDatabaseChain chain...
Could you identify the country that had the third-highest motorcycle sales in the year 2004?
SQLQuery:SELECT COUNTRY, SUM(SALES) AS TOTAL_SALES FROM sales_item WHERE YEAR_ID = 2004 AND PRODUCTLINE = 'Motorcycles' GROUP BY COUNTRY ORDER BY TOTAL_SALES DESC LIMIT 3;
SQLResult: [('USA', 287243.08999999997), ('France', 95415.33999999998), ('Australia', 33123.969999999994)]
Answer:The country with the third-highest motorcycle sales in the year 2004 is Australia.
> Finished chain.
The country with the third-highest motorcycle sales in the year 2004 is Australia.
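One caution before wiring this into production: SQLDatabaseChain executes whatever SQL the model generates. In an enterprise setting it is worth screening generated statements before they reach the database. The guard below is my own illustrative sketch, not a LangChain API; it passes only a single read-only SELECT statement:

```python
import sqlite3

def is_safe_select(sql: str) -> bool:
    """Allow only a single, syntactically complete SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:            # a second statement smuggled in
        return False
    if not stripped.lower().startswith("select"):
        return False
    # sqlite's own check that the text forms one complete statement
    return sqlite3.complete_statement(stripped + ";")

print(is_safe_select("SELECT COUNT(ORDERNUMBER) FROM sales_item;"))  # True
print(is_safe_select("DROP TABLE sales_item;"))                      # False
print(is_safe_select("SELECT 1; DROP TABLE sales_item;"))            # False
```

A real deployment would go further (read-only database credentials, allow-listed tables, query timeouts), but even a simple gate like this keeps an LLM-generated DROP or DELETE from ever executing.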
Dataset: this article uses the Sample Sales Data from Kaggle: https://www.kaggle.com/datasets/kyanyoga/sample-sales-data/