Leveraging LLMs for Intuitive Interactions with Enterprise SQL Databases

The backbone of any enterprise is its data, and SQL databases have long been the standard for storing this invaluable asset. However, extracting insights from this data typically requires expertise in SQL query language, which can be a barrier for many business users. But what if you could just ask your database a question in natural language and get the answer? That's where Large Language Models (LLMs) come in, revolutionizing the way we interact with databases.

The Power of LLMs in SQL Interactions: LLMs have the extraordinary ability to understand and generate human-like text, making it possible to interact with SQL databases using natural language. This advancement opens the door for a wide range of users, from business analysts to executives, to engage with their data without the need to understand complex SQL syntax.

LangChain: Bridging Language and Databases: LangChain is at the forefront of this innovation, providing SQL Chains and Agents that can build and execute SQL queries based on natural language prompts. These tools harness the power of LLMs to translate your questions into executable SQL code, making data querying as simple as having a conversation.

Install necessary packages

!pip install -q langchain
!pip install -q openai
!pip install -q langchain_experimental

Set up the environment variable for OpenAI

import os
os.environ['OPENAI_API_KEY'] = 'sk-...'  # replace with your own OpenAI API key

Import required libraries

import pandas as pd
import sqlite3

df = pd.read_csv("sample_data/sales_data_sample.csv", encoding='ISO-8859-1')
df.head(10)

Read the sales data into a DataFrame

df = pd.read_csv("sample_data/sales_data_sample.csv", encoding='ISO-8859-1')

# Connect to your SQLite database
conn = sqlite3.connect('llm_sql.sqlite')
# Create a cursor object
c = conn.cursor()

c.execute('CREATE TABLE IF NOT EXISTS sales_item (ORDERNUMBER int, QUANTITYORDERED int, PRICEEACH float, ORDERLINENUMBER int, SALES float, STATUS text, CITY text, STATE text, COUNTRY text)')
conn.commit()

# Note: to_sql with if_exists='replace' drops and recreates the table from the
# DataFrame, so every CSV column (including PRODUCTLINE and YEAR_ID, queried
# later) ends up in the table regardless of the CREATE TABLE statement above.
df.to_sql('sales_item', conn, if_exists='replace', index=False)

# Define your SQL query
query = '''
SELECT * FROM sales_item LIMIT 100
'''

# Execute the query
c.execute(query)

# Fetch the results (e.g., using fetchall())
results = c.fetchall()

# Close the cursor and the database connection
c.close()
conn.close()

# Now you can work with the 'results' variable, which contains the data        
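The cursor-based fetch above returns plain tuples without column names. If you prefer a DataFrame, pandas can run the same query directly. A small sketch, using an in-memory database with made-up rows so it runs standalone:

```python
import sqlite3
import pandas as pd

# Build a tiny in-memory table standing in for 'sales_item'
conn = sqlite3.connect(':memory:')
pd.DataFrame({
    'ORDERNUMBER': [10107, 10121, 10134],
    'SALES': [2871.0, 2765.9, 3884.34],
    'COUNTRY': ['USA', 'France', 'France'],
}).to_sql('sales_item', conn, index=False)

# read_sql_query returns the result set as a DataFrame with column labels
results_df = pd.read_sql_query('SELECT * FROM sales_item LIMIT 100', conn)
conn.close()
print(results_df.shape)  # (3, 3)
```

Against the real `llm_sql.sqlite` file you would pass the same query and connection, and skip the in-memory setup.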

Define a function to run SQL queries

def sql_query(sql, db):
    conn = sqlite3.connect(db)
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    for row in rows:
        print(row)
    conn.close()        

Use the function to display the first 20 entries from 'sales_item' table

sql_query('SELECT * FROM sales_item LIMIT 20;', "llm_sql.sqlite")

Initialize the Langchain SQLDatabase and OpenAI language model

import os
from langchain.utilities import SQLDatabase
from langchain.llms import OpenAI
from langchain_experimental.sql import SQLDatabaseChain        

Create a SQLDatabaseChain to interface with the language model

db = SQLDatabase.from_uri('sqlite:///llm_sql.sqlite')
llm = OpenAI(temperature=0, verbose=True)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)        
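SQLDatabaseChain works by sending the LLM the table schema along with your question, so it helps to know exactly what SQLite reports for a table. You can see the stored CREATE statement by querying `sqlite_master` — a stdlib-only sketch with a stand-in table (the real chain inspects `llm_sql.sqlite` the same way):

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # stand-in for llm_sql.sqlite
conn.execute('CREATE TABLE sales_item (ORDERNUMBER int, SALES float, COUNTRY text)')

# sqlite_master keeps the original CREATE TABLE statement for each table
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type='table' AND name='sales_item'"
).fetchone()[0]
print(schema)  # CREATE TABLE sales_item (ORDERNUMBER int, SALES float, COUNTRY text)
conn.close()
```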

Run natural language queries using the SQLDatabaseChain

db_chain.run("How many ORDERNUMBER are there?")

> Entering new SQLDatabaseChain chain...
How many ORDERNUMBER are there?
SQLQuery:SELECT COUNT(ORDERNUMBER) FROM sales_item;
SQLResult: [(2823,)]
Answer:There are 2823 ORDERNUMBER.
> Finished chain.
There are 2823 ORDERNUMBER.

db_chain.run("Could you provide me with the data on unique countries generating sales?")

> Entering new SQLDatabaseChain chain...
Could you provide me with the data on unique countries generating sales?
SQLQuery:SELECT DISTINCT "COUNTRY" FROM sales_item LIMIT 5;
SQLResult: [('USA',), ('France',), ('Norway',), ('Australia',), ('Finland',)]
Answer:The countries generating sales are USA, France, Norway, Australia, and Finland.
> Finished chain.
The countries generating sales are USA, France, Norway, Australia, and Finland.

Note that the chain's default top_k setting appends LIMIT 5 to the generated query, so only five of the unique countries are returned here.

db_chain.run("Please provide the sales figures for motorcycles in the USA for the year 2003?")

> Entering new SQLDatabaseChain chain...
Please provide the sales figures for motorcycles in the USA for the year 2003?
SQLQuery:SELECT SUM(SALES) FROM sales_item WHERE PRODUCTLINE = 'Motorcycles' AND COUNTRY = 'USA' AND YEAR_ID = 2003;
SQLResult: [(178108.95,)]
Answer:The total sales for motorcycles in the USA for the year 2003 is 178108.95.
> Finished chain.
The total sales for motorcycles in the USA for the year 2003 is 178108.95.        

Complex Query

db_chain.run("Could you identify the country that had the third-highest motorcycle sales in the year 2004?")

> Entering new SQLDatabaseChain chain...
Could you identify the country that had the third-highest motorcycle sales in the year 2004?
SQLQuery:SELECT COUNTRY, SUM(SALES) AS TOTAL_SALES FROM sales_item WHERE YEAR_ID = 2004 AND PRODUCTLINE = 'Motorcycles' GROUP BY COUNTRY ORDER BY TOTAL_SALES DESC LIMIT 3;
SQLResult: [('USA', 287243.08999999997), ('France', 95415.33999999998), ('Australia', 33123.969999999994)]
Answer:The country with the third-highest motorcycle sales in the year 2004 is Australia.
> Finished chain.
The country with the third-highest motorcycle sales in the year 2004 is Australia.        
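The chain answered by listing the top three and reading the last row; in plain SQL, the same "third-highest" question can be asked directly with `LIMIT 1 OFFSET 2`. A sketch over a tiny in-memory table with made-up figures mirroring the columns used above:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales_item (COUNTRY text, PRODUCTLINE text, YEAR_ID int, SALES float)')
conn.executemany(
    'INSERT INTO sales_item VALUES (?, ?, ?, ?)',
    [('USA', 'Motorcycles', 2004, 287243.09),
     ('France', 'Motorcycles', 2004, 95415.34),
     ('Australia', 'Motorcycles', 2004, 33123.97),
     ('USA', 'Planes', 2004, 99999.0)],  # filtered out by PRODUCTLINE
)

# OFFSET 2 skips the top two totals and returns the third-highest directly
third = conn.execute(
    """SELECT COUNTRY, SUM(SALES) AS TOTAL_SALES
       FROM sales_item
       WHERE YEAR_ID = 2004 AND PRODUCTLINE = 'Motorcycles'
       GROUP BY COUNTRY
       ORDER BY TOTAL_SALES DESC
       LIMIT 1 OFFSET 2"""
).fetchone()
print(third[0])  # Australia
conn.close()
```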


Comments

Derek Chang (Building Waii's SQL AI Agent):
We just wrote a blog post on the steps and requirements needed to build your own advanced text-to-SQL: https://medium.com/querymind/building-your-own-text-to-sql-steps-and-requirements-ab276826c882

Rupesh Sahu (Data Scientist @ Inxite Out):
Thank you Suman. I am facing a problem connecting to MySQL with mysql.connector to use my own database for this purpose: the imported database is not in SQLDatabase format, so I get an AttributeError saying the database has no dialect attribute. Can you help?

Mwaka Ambrose (Technology Leader | CTO | Fintech Engineer):
SQL has solidified its reputation as a formidable tool for data analysis. Today, large language models are reshaping our interaction with SQL databases, allowing us to converse with our data in natural language. Check out my latest Colab notebook to discover the full potential of conversational data analysis and how it can revolutionize your workflow. #DataAnalysis #naturallanguageprocessing #Innovation #LLM #chatgpt

Pritam Bhattacharya (Senior Product & Program Delivery Lead):
Very useful and informative.

Suman Biswas (author):
Here's the Kaggle data for the article. Let me know if you encounter any code execution issues. https://www.kaggle.com/datasets/kyanyoga/sample-sales-data/
