Chat with SQL: AI-Powered Natural Language to Database Queries

In AI-driven applications, natural language to SQL (NLP-to-SQL) is becoming an essential capability. While not a standalone solution, it serves as a core building block that can be seamlessly integrated into larger AI systems, enabling better data accessibility and automation.

In this article, we’ll explore how to create a powerful system that enables natural language queries on a relational SQL database using LlamaIndex and OpenAI’s GPT-4o-mini.

We’ll walk through the entire architecture, from database setup to query execution and formatted result display.

Introduction

Imagine asking questions about your SQL database in plain English and getting structured answers instantly. This is precisely what we achieve with LlamaIndex, OpenAI, and SQLAlchemy. The system automates SQL generation, execution, and response formatting, making data retrieval intuitive and accessible.


High-Level Architecture

  1. Database Setup: A relational SQLite database is created with tables for customers, products, orders, and shipping.
  2. LlamaIndex Integration: The database schema is mapped into an index, enabling natural language query interpretation.
  3. OpenAI LLM: Queries are processed by GPT-4o-mini to generate accurate SQL commands.
  4. SQL Execution: The generated SQL is executed using SQLAlchemy, retrieving the relevant data.


Setting Up the SQL Database

To enable natural language querying, we first need a structured relational database. We'll use SQLite for simplicity, but this approach works with PostgreSQL, MySQL, or any SQL database supported by SQLAlchemy.

Database Schema

The schema consists of six key tables:

  • customers – Stores customer details such as name, email, and registration date.
  • categories – Defines product categories.
  • products – Stores product details like name, price, stock levels, and category.
  • orders – Tracks transactions made by customers.
  • order_items – Links products to orders.
  • shipping – Stores shipping details for each order.

Creating the Database with SQLAlchemy

We define the schema using SQLAlchemy and create the database.

The function proceeds through the following steps:

  1. Import the required libraries.
  2. Define the table schemas.
  3. Create the tables.
  4. Insert sample data into categories and customers.
  5. Retrieve a category and a customer for foreign-key usage.
  6. Insert sample data into products, then retrieve products for the orders.
  7. Insert sample orders, order items for each order, and shipping info.
  8. Insert one more order for testing pending deliveries.
  9. Close the session.

Finally, create the database by invoking the function.
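The steps above can be condensed into a single sketch. This is a minimal, illustrative implementation: the exact column names, sample rows, and the create_database helper name are assumptions based on the schema described earlier, not the author's original code.

```python
from datetime import datetime

from sqlalchemy import (create_engine, Column, Integer, String, Float,
                        DateTime, ForeignKey)
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    email = Column(String, unique=True)
    registered_at = Column(DateTime, default=datetime.utcnow)

class Category(Base):
    __tablename__ = "categories"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    price = Column(Float, nullable=False)
    stock = Column(Integer, default=0)
    category_id = Column(Integer, ForeignKey("categories.id"))

class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"))
    order_date = Column(DateTime, default=datetime.utcnow)

class OrderItem(Base):
    __tablename__ = "order_items"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("orders.id"))
    product_id = Column(Integer, ForeignKey("products.id"))
    quantity = Column(Integer, default=1)

class Shipping(Base):
    __tablename__ = "shipping"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("orders.id"))
    status = Column(String, default="pending")   # e.g. pending / delivered
    shipped_at = Column(DateTime, nullable=True)

def create_database(url="sqlite:///ecommerce.db"):
    engine = create_engine(url)
    Base.metadata.create_all(engine)        # create all six tables
    session = sessionmaker(bind=engine)()

    # Sample data: one category and one customer first.
    cat = Category(name="Electronics")
    cust = Customer(name="Alice", email="alice@example.com")
    session.add_all([cat, cust])
    session.flush()                          # populate ids for FK usage

    prod = Product(name="Headphones", price=59.99, stock=25,
                   category_id=cat.id)
    session.add(prod)
    session.flush()

    # One order with an item and a pending shipment, for testing
    # "pending deliveries" style questions later.
    order = Order(customer_id=cust.id)
    session.add(order)
    session.flush()
    session.add(OrderItem(order_id=order.id, product_id=prod.id, quantity=2))
    session.add(Shipping(order_id=order.id, status="pending"))

    session.commit()
    session.close()
    return engine

engine = create_database()
```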


Your test database is ready!

Connecting LlamaIndex with Your SQL Database

Now that we have set up the SQLite database, the next step is to connect it to LlamaIndex. This will enable natural language queries on structured SQL tables.

To integrate LlamaIndex with our SQL database, we use the SQLDatabase class. It connects to our database and enables retrieval-augmented generation (RAG) for SQL queries.

1. create_engine("sqlite:///ecommerce.db")

  • Creates a connection to the SQLite database.
  • The engine allows us to interact with the database via SQLAlchemy.

2. SQLDatabase(engine, include_tables=[...])

  • Loads only the necessary tables into LlamaIndex.
  • Improves query efficiency by limiting scope.
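The connection step can be sketched as follows; the import path assumes the llama-index >= 0.10 package layout (older releases expose SQLDatabase from the top-level llama_index package):

```python
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase

# Connect SQLAlchemy to the SQLite file created earlier.
engine = create_engine("sqlite:///ecommerce.db")

# Expose only the tables the LLM should see; limiting scope keeps the
# retrieval step focused and the prompts small.
sql_db = SQLDatabase(
    engine,
    include_tables=[
        "customers", "categories", "products",
        "orders", "order_items", "shipping",
    ],
)
```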

LlamaIndex is now connected to your database!

Defining Table Schema & Initializing the Query Engine

Now that we've connected our SQLite database to LlamaIndex, the next step is to define table schemas. This ensures that the LLM (GPT-4o-mini) correctly understands the database structure and can generate accurate SQL queries.

What Happens in This Step

  1. Table Schema Definition: We define metadata for each SQL table. This metadata includes column names, data types, and relationships. The LLM uses this schema to generate precise SQL queries.
  2. Indexing Tables with LlamaIndex: We map the schema to LlamaIndex. This allows efficient table retrieval for relevant queries.
  3. Creating the Query Engine: The LLM generates SQL from user queries, the queries are executed on the database, and the results are returned in a structured format.

Import the required libraries
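A sketch of the imports this step needs; the module paths assume the llama-index >= 0.10 layout and may differ in other versions:

```python
# Imports for schema definition, object indexing, and the query engine.
from llama_index.core import SQLDatabase, VectorStoreIndex
from llama_index.core.objects import (
    ObjectIndex,
    SQLTableNodeMapping,
    SQLTableSchema,
)
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)
from llama_index.llms.openai import OpenAI
```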


Understanding Table Schema Definition

The SQLTableSchema objects help guide the LLM in generating correct SQL queries.

How It Works

  1. Each table is mapped with a name and detailed instructions.
  2. The context_str provides column descriptions, relationships (foreign keys), and join/query instructions.
  3. This ensures that the LLM does not assume column names but instead follows correct SQL structure.

Example:

For the orders table, the LLM now knows:

  • customer_id is a Foreign Key → must be joined with customers.id.
  • Use order_items to get products in an order.
  • Shipping details are in the shipping table → must join on orders.id = shipping.order_id.
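As a sketch, the orders entry might look like this; the context_str wording and column names are illustrative assumptions based on the description above, not the author's exact text:

```python
from llama_index.core.objects import SQLTableSchema

# Illustrative schema hint for the orders table.
orders_schema = SQLTableSchema(
    table_name="orders",
    context_str=(
        "Customer orders. customer_id is a foreign key referencing "
        "customers.id. Join order_items on order_items.order_id = orders.id "
        "to get the products in an order. Join shipping on "
        "shipping.order_id = orders.id for delivery details."
    ),
)
```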

Without this schema, the LLM might make incorrect SQL assumptions.

Our solution can now understand the table schema definitions!

Creating the Object Index

This is a very important step to understand how the solution works.

Now, we map the table schemas into an object index.

What This Does

  • Creates an index of SQL tables for retrieval.
  • Stores metadata for efficient access.
  • Ensures table structures are correctly represented.

With this index, we can now perform fast and accurate SQL retrieval.
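The mapping step can be sketched like this. The one-line table descriptions are condensed placeholders for the fuller context_str texts discussed above, and building the VectorStoreIndex embeds them, which by default calls the OpenAI embeddings API (OPENAI_API_KEY must be set):

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import (ObjectIndex, SQLTableNodeMapping,
                                      SQLTableSchema)

# Map each SQL table to a retrievable object.
table_node_mapping = SQLTableNodeMapping(sql_db)

table_schemas = [
    SQLTableSchema(table_name=name, context_str=desc)
    for name, desc in [
        ("customers", "Customer details: name, email, registration date."),
        ("categories", "Product categories."),
        ("products", "Products: name, price, stock, category_id FK."),
        ("orders", "Orders: customer_id FK, order date."),
        ("order_items", "Links products to orders via order_id/product_id."),
        ("shipping", "Shipping status per order via order_id FK."),
    ]
]

# Index the schemas so relevant tables can be retrieved semantically.
obj_index = ObjectIndex.from_objects(
    table_schemas,
    table_node_mapping,
    VectorStoreIndex,
)
```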


How the Magic Works

A Vector Database stores high-dimensional representations (embeddings) of data and allows fast similarity-based retrieval. In our case, it helps the LLM retrieve the most relevant tables before generating an SQL query.

Instead of searching by exact keywords (like traditional databases), a VectorDB finds semantically similar matches—making it perfect for AI-driven applications like NLP-to-SQL.


How VectorDB Is Used in NLP-to-SQL

1. Database Schema Representation:

  • Each table schema (columns, relationships, metadata) is converted into embeddings (vector representations).
  • This allows for semantic search when retrieving relevant tables.

2. Query Execution Pipeline:

  • The natural language query is also converted into a vector and compared against indexed table embeddings to retrieve the most relevant tables.

3. LLM Uses Retrieval to Generate SQL:

  • The LLM receives only relevant tables instead of the entire schema.
  • This reduces confusion and improves SQL accuracy.
  • The retrieved schema guides the LLM in query generation.

Our solution can now fetch the right set of tables the LLM needs for creating SQL queries!

Integrating OpenAI LLM for SQL Query Generation

Once the database schema is mapped with LlamaIndex, the next step is to integrate an LLM (Large Language Model) that can interpret natural language queries and generate accurate SQL statements. For this, we use OpenAI’s GPT-4o-mini, which provides a powerful text-to-SQL conversion capability.
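Initializing the model is a one-liner; this sketch assumes the llama-index OpenAI integration package, which reads the API key from the OPENAI_API_KEY environment variable:

```python
from llama_index.llms.openai import OpenAI

# temperature=0 keeps SQL generation deterministic.
llm = OpenAI(model="gpt-4o-mini", temperature=0)
```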


Initializing the Query Engine

Now, we will set up the query engine, which will:

  1. Retrieve relevant tables based on user queries.
  2. Generate SQL queries using OpenAI LLM.
  3. Execute SQL and return results.

How it works:

  • SQLTableRetrieverQueryEngine(sql_db, ...) connects the LLM with the SQL database and generates SQL queries from natural language.
  • obj_index.as_retriever(similarity_top_k=1) retrieves the most relevant tables for each query.
  • llm=llm uses OpenAI GPT-4o-mini to generate SQL commands.
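Putting those pieces together, the initialization can be sketched as (constructor arguments as in the LlamaIndex docs; the llm keyword follows the description above):

```python
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)

# Wire the retriever and the LLM into a text-to-SQL query engine.
query_engine = SQLTableRetrieverQueryEngine(
    sql_db,                                      # the SQLDatabase wrapper
    obj_index.as_retriever(similarity_top_k=1),  # best-matching table schema
    llm=llm,                                     # GPT-4o-mini generates SQL
)
```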




Now, the system is ready to handle natural language queries!

What We Have Achieved So Far: A Recap

✓ We defined SQL table schemas to guide LLM-generated queries.

✓ We created an object index to store table metadata.

✓ We initialized the query engine to process natural language queries into SQL.


Helper Function to Print Outputs

To ensure a clean and structured display, we use a helper function that:

  • Executes the query.
  • Prints the generated SQL query.
  • Displays the query results in Markdown format for better readability.

Helper function to print the SQL query and responses:
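A sketch of such a helper, assuming a notebook environment (IPython's Markdown renderer); the function name print_response is mine, not the author's:

```python
from IPython.display import Markdown, display

def print_response(question: str) -> None:
    """Run a natural-language query, then show the generated SQL and answer."""
    response = query_engine.query(question)
    print(f"Question: {question}")
    # The generated SQL is surfaced in the response metadata.
    print(f"SQL: {response.metadata.get('sql_query', 'n/a')}")
    display(Markdown(str(response)))  # render the answer as Markdown
```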


We now have a way to pretty-print and inspect the solution's outputs!

Moment of Truth!

Next Step: Querying the Database with Natural Language! Shall we?


Querying the Database with Natural Language

Now that we've set up our SQL database, connected it to LlamaIndex, and configured our query engine, it’s time to see it in action!

In this section, we’ll:

  1. Run natural language queries against the database.
  2. Generate the corresponding SQL queries.
  3. Execute the queries and print formatted results.
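With the engine and helper in place, running queries looks like this. The questions below are illustrative examples I chose to fit the sample data; the generated SQL and results depend on the schema and rows actually inserted:

```python
# Each call retrieves relevant tables, generates SQL, executes it,
# and pretty-prints the result.
print_response("Which orders are still pending delivery?")
print_response("What is the total amount spent by each customer?")
print_response("List all products in the Electronics category.")
```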


A Few Test Results

Now, let's put our AI-powered SQL query engine to the test! Below are some real natural language queries, the generated SQL, and the formatted results.

Each query demonstrates how LlamaIndex, OpenAI GPT-4o-mini, and SQLAlchemy work together to translate human language into SQL, execute it, and return structured results.

Here are a few test cases in action:


Summary

✓ Natural language queries are converted into SQL queries.

✓ The generated SQL is executed automatically.

✓ Formatted results are displayed in Markdown format.


Final Thoughts

This system eliminates the need to write SQL manually. It allows non-technical users to query databases naturally while ensuring accuracy, efficiency, and usability.

This approach is particularly useful for:

  • Business analysts who want quick insights from databases.
  • Customer service teams tracking orders without SQL knowledge.
  • E-commerce managers monitoring sales trends.

With LlamaIndex, OpenAI, and SQLAlchemy, we’ve built a scalable, intelligent, and accessible SQL chat interface.


Rohit Sharma

AI/ML Computational Science Manager
