Chat with SQL: AI-Powered Natural Language to Database Queries
In AI-driven applications, natural language to SQL (NLP-to-SQL) is becoming an essential capability. While not a standalone solution, it serves as a core building block that can be seamlessly integrated into larger AI systems, enabling better data accessibility and automation.
In this article, we’ll explore how to create a powerful system that enables natural language queries on a relational SQL database using LlamaIndex and OpenAI’s GPT-4o-mini.
We’ll walk through the entire architecture, from database setup to query execution and formatted result display.
Introduction
Imagine asking questions about your SQL database in plain English and getting structured answers instantly. This is precisely what we achieve with LlamaIndex, OpenAI, and SQLAlchemy. The system automates SQL generation, execution, and response formatting, making data retrieval intuitive and accessible.
???High-Level Architecture
?? Setting Up the SQL Database
To enable natural language querying, we first need a structured relational database. We'll use SQLite for simplicity, but this approach works with PostgreSQL, MySQL, or any SQL database supported by SQLAlchemy.
Database Schema
The schema consists of six key tables:
Creating the Database with SQLAlchemy
We define the schema using SQLAlchemy and create the database.
??Import the required libraries:
??Define Table Schemas (breakdown of the function)
???Create the tables
?Insert sample data into Categories
?Insert Sample data into Customers table:
??Retrieve a category and customer for FK usage
?Insert sample data into Products
??Retrieve products for orders
?Insert sample Orders
?Insert Order Items for each order
?Insert Shipping info
?Insert a new order for testing pending deliveries
?? Close The session
?Create the DB by invoking the function created above
Your test Database is ready!
??Connecting LlamaIndex with Your SQL Database
Now that we have set up the SQLite database, the next step is to connect it to LlamaIndex. This will enable natural language queries on structured SQL tables.
To integrate LlamaIndex with our SQL database, we use the SQLDatabase class. It connects to our database and enables retrieval-augmented generation (RAG) for SQL queries.
1?? create_engine("sqlite:///ecommerce.db")
2?? SQLDatabase(engine, include_tables=[...])
LlamaIndex is connected to your DB now!
??Defining Table Schema & Initializing the Query Engine
Now that we've connected our SQLite database to LlamaIndex, the next step is to define table schemas. This ensures that the LLM (GPT-4o-mini) correctly understands the database structure and can generate accurate SQL queries.
?? What Happens in This Step?
Results are returned in a structured format.
Import the required libraries
?? Understanding Table Schema Definition
The SQLTableSchema objects help guide the LLM in generating correct SQL queries.
?? How It Works?
Example:
For orders table, the LLM now knows:
?? Without this schema, the LLM might make incorrect SQL assumptions.
领英推荐
Our solution now has the way of understanding of the Table Schema Definition!
????Creating the Object Index
This is a very important step to understand how the solution works.
Now, we map the table schemas into an object index.
?? What This Does?
?? With this index, we can now perform fast and accurate SQL retrieval.
??Explanation of How the Magic works:
A Vector Database stores high-dimensional representations (embeddings) of data and allows fast similarity-based retrieval. In our case, it helps the LLM retrieve the most relevant tables before generating an SQL query.
Instead of searching by exact keywords (like traditional databases), a VectorDB finds semantically similar matches—making it perfect for AI-driven applications like NLP-to-SQL.
?? How VectorDB is Used in NLP-to-SQL?
1?? Database Schema Representation:
2?? Query Execution Pipeline:
3?? LLM Uses Retrieval to Generate SQL:
Our solution now has the way of fetching the right set of tables that LLM needs to use for creating the SQL queries!
?Integrating OpenAI LLM for SQL Query Generation
Once the database schema is mapped with LlamaIndex, the next step is to integrate an LLM (Large Language Model) that can interpret natural language queries and generate accurate SQL statements. For this, we use OpenAI’s GPT-4o-mini, which provides a powerful text-to-SQL conversion capability.
??Initializing the Query Engine
Now, we will set up the query engine, which will:
How it works:
?? Now, the system is ready to handle natural language queries!
?? What we have achieved so far? A Recap...
? We defined SQL table schemas to guide LLM-generated queries.
? We created an object index to store table metadata.
? We initialized the query engine to process natural language queries into SQL.
?? Helper Function to Print Outputs
To ensure a clean and structured display, we use a helper function that:
We have a way to pretty print and understand Solution outputs!
??Moment of Truth!
?? Next Step: Querying the Database with Natural Language! ?? Shall we? ??
Querying the Database with Natural Language
Now that we've set up our SQL database, connected it to LlamaIndex, and configured our query engine, it’s time to see it in action!
In this section, we’ll:
?? Few Test Results
Now, let's put our AI-powered SQL query engine to the test! Below are some real natural language queries, the generated SQL, and the formatted results.
Each query demonstrates how LlamaIndex, OpenAI GPT-4o-mini, and SQLAlchemy work together to translate human language into SQL, execute it, and return structured results.
?? Here are a few test cases in action: ??
?? Summary
? Natural language queries are converted into SQL queries.
? The generated SQL is executed automatically.
? Formatted results are displayed in Markdown format.
?? Final Thoughts
This system eliminates the need to write SQL manually. It allows non-technical users to query databases naturally while ensuring accuracy, efficiency, and usability.
This approach is particularly useful for:
With LlamaIndex, OpenAI, and SQLAlchemy, we’ve built a scalable, intelligent, and accessible SQL chat interface.
AI/ML Computational Science Manager
3 周Update: I've made some updates to the code snippets to enhance accuracy and efficiency. Check out the revised sections on table retrieval and query execution.