Chat with SQL: AI-Powered Natural Language to Database Queries

In AI-driven applications, natural language to SQL (NLP-to-SQL) is becoming an essential capability. While not a standalone solution, it serves as a core building block that can be seamlessly integrated into larger AI systems, enabling better data accessibility and automation.

In this article, we’ll explore how to create a powerful system that enables natural language queries on a relational SQL database using LlamaIndex and OpenAI’s GPT-4o-mini.

We’ll walk through the entire architecture, from database setup to query execution and formatted result display.

Introduction

Imagine asking questions about your SQL database in plain English and getting structured answers instantly. This is precisely what we achieve with LlamaIndex, OpenAI, and SQLAlchemy. The system automates SQL generation, execution, and response formatting, making data retrieval intuitive and accessible.


High-Level Architecture

  1. Database Setup: A relational SQLite database is created with tables for customers, products, orders, and shipping.
  2. LlamaIndex Integration: The database schema is mapped into an index, enabling natural language query interpretation.
  3. OpenAI LLM: Queries are processed by GPT-4o-mini to generate accurate SQL commands.
  4. SQL Execution: The generated SQL is executed using SQLAlchemy, retrieving the relevant data.


Setting Up the SQL Database

To enable natural language querying, we first need a structured relational database. We'll use SQLite for simplicity, but this approach works with PostgreSQL, MySQL, or any SQL database supported by SQLAlchemy.

Database Schema

The schema consists of six key tables:

  • customers – Stores customer details such as name, email, and registration date.
  • categories – Defines product categories.
  • products – Stores product details like name, price, stock levels, and category.
  • orders – Tracks transactions made by customers.
  • order_items – Links products to orders.
  • shipping – Stores shipping details for each order.

Creating the Database with SQLAlchemy

We define the schema using SQLAlchemy and create the database.

The function proceeds through the following steps:

  1. Import the required libraries.
  2. Define the table schemas.
  3. Create the tables.
  4. Insert sample data into categories and customers.
  5. Retrieve a category and a customer for foreign-key usage.
  6. Insert sample data into products, then retrieve products for the orders.
  7. Insert sample orders, order items for each order, and shipping info.
  8. Insert one more order for testing pending deliveries.
  9. Close the session.

Finally, create the database by invoking the function.
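The steps above can be condensed into a single sketch. This is a minimal, illustrative implementation: the exact column names, sample rows, and the create_database helper name are assumptions based on the schema described earlier, not the author's original code.

```python
from datetime import datetime

from sqlalchemy import (create_engine, Column, Integer, String, Float,
                        DateTime, ForeignKey)
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    email = Column(String, unique=True)
    registered_at = Column(DateTime, default=datetime.utcnow)

class Category(Base):
    __tablename__ = "categories"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)

class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    price = Column(Float, nullable=False)
    stock = Column(Integer, default=0)
    category_id = Column(Integer, ForeignKey("categories.id"))

class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"))
    order_date = Column(DateTime, default=datetime.utcnow)

class OrderItem(Base):
    __tablename__ = "order_items"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("orders.id"))
    product_id = Column(Integer, ForeignKey("products.id"))
    quantity = Column(Integer, default=1)

class Shipping(Base):
    __tablename__ = "shipping"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("orders.id"))
    status = Column(String, default="pending")   # e.g. pending / delivered
    shipped_at = Column(DateTime, nullable=True)

def create_database(url="sqlite:///ecommerce.db"):
    engine = create_engine(url)
    Base.metadata.create_all(engine)        # create all six tables
    session = sessionmaker(bind=engine)()

    # Sample data: one category and one customer first.
    cat = Category(name="Electronics")
    cust = Customer(name="Alice", email="alice@example.com")
    session.add_all([cat, cust])
    session.flush()                          # populate ids for FK usage

    prod = Product(name="Headphones", price=59.99, stock=25,
                   category_id=cat.id)
    session.add(prod)
    session.flush()

    # One order with an item and a pending shipment, for testing
    # "pending deliveries" style questions later.
    order = Order(customer_id=cust.id)
    session.add(order)
    session.flush()
    session.add(OrderItem(order_id=order.id, product_id=prod.id, quantity=2))
    session.add(Shipping(order_id=order.id, status="pending"))

    session.commit()
    session.close()
    return engine

engine = create_database()
```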


Your test database is ready!

Connecting LlamaIndex with Your SQL Database

Now that we have set up the SQLite database, the next step is to connect it to LlamaIndex. This will enable natural language queries on structured SQL tables.

To integrate LlamaIndex with our SQL database, we use the SQLDatabase class. It connects to our database and enables retrieval-augmented generation (RAG) for SQL queries.

1. create_engine("sqlite:///ecommerce.db")

  • Creates a connection to the SQLite database.
  • The engine allows us to interact with the database via SQLAlchemy.

2. SQLDatabase(engine, include_tables=[...])

  • Loads only the necessary tables into LlamaIndex.
  • Improves query efficiency by limiting scope.
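The connection step can be sketched as follows; the import path assumes the llama-index >= 0.10 package layout (older releases expose SQLDatabase from the top-level llama_index package):

```python
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase

# Connect SQLAlchemy to the SQLite file created earlier.
engine = create_engine("sqlite:///ecommerce.db")

# Expose only the tables the LLM should see; limiting scope keeps the
# retrieval step focused and the prompts small.
sql_db = SQLDatabase(
    engine,
    include_tables=[
        "customers", "categories", "products",
        "orders", "order_items", "shipping",
    ],
)
```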

LlamaIndex is now connected to your database!

Defining Table Schema & Initializing the Query Engine

Now that we've connected our SQLite database to LlamaIndex, the next step is to define table schemas. This ensures that the LLM (GPT-4o-mini) correctly understands the database structure and can generate accurate SQL queries.

What Happens in This Step

  1. Table Schema Definition: We define metadata for each SQL table. This metadata includes column names, data types, and relationships. The LLM uses this schema to generate precise SQL queries.
  2. Indexing Tables with LlamaIndex: We map the schema to LlamaIndex. This allows efficient table retrieval for relevant queries.
  3. Creating the Query Engine: The LLM generates SQL from user queries, the queries are executed on the database, and the results are returned in a structured format.

Import the required libraries
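A sketch of the imports this step needs; the module paths assume the llama-index >= 0.10 layout and may differ in other versions:

```python
# Imports for schema definition, object indexing, and the query engine.
from llama_index.core import SQLDatabase, VectorStoreIndex
from llama_index.core.objects import (
    ObjectIndex,
    SQLTableNodeMapping,
    SQLTableSchema,
)
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)
from llama_index.llms.openai import OpenAI
```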


Understanding Table Schema Definition

The SQLTableSchema objects help guide the LLM in generating correct SQL queries.

How It Works

  1. Each table is mapped with a name and detailed instructions.
  2. The context_str provides column descriptions, relationships (foreign keys), and join/query instructions.
  3. This ensures that the LLM does not assume column names but instead follows correct SQL structure.

Example:

For the orders table, the LLM now knows:

  • customer_id is a Foreign Key → must be joined with customers.id.
  • Use order_items to get products in an order.
  • Shipping details are in the shipping table → must join on orders.id = shipping.order_id.
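As a sketch, the orders entry might look like this; the context_str wording and column names are illustrative assumptions based on the description above, not the author's exact text:

```python
from llama_index.core.objects import SQLTableSchema

# Illustrative schema hint for the orders table.
orders_schema = SQLTableSchema(
    table_name="orders",
    context_str=(
        "Customer orders. customer_id is a foreign key referencing "
        "customers.id. Join order_items on order_items.order_id = orders.id "
        "to get the products in an order. Join shipping on "
        "shipping.order_id = orders.id for delivery details."
    ),
)
```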

Without this schema, the LLM might make incorrect SQL assumptions.

Our solution can now understand the table schema definitions!

Creating the Object Index

This is a very important step to understand how the solution works.

Now, we map the table schemas into an object index.

What This Does

  • Creates an index of SQL tables for retrieval.
  • Stores metadata for efficient access.
  • Ensures table structures are correctly represented.

With this index, we can now perform fast and accurate SQL retrieval.
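The mapping step can be sketched like this. The one-line table descriptions are condensed placeholders for the fuller context_str texts discussed above, and building the VectorStoreIndex embeds them, which by default calls the OpenAI embeddings API (OPENAI_API_KEY must be set):

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import (ObjectIndex, SQLTableNodeMapping,
                                      SQLTableSchema)

# Map each SQL table to a retrievable object.
table_node_mapping = SQLTableNodeMapping(sql_db)

table_schemas = [
    SQLTableSchema(table_name=name, context_str=desc)
    for name, desc in [
        ("customers", "Customer details: name, email, registration date."),
        ("categories", "Product categories."),
        ("products", "Products: name, price, stock, category_id FK."),
        ("orders", "Orders: customer_id FK, order date."),
        ("order_items", "Links products to orders via order_id/product_id."),
        ("shipping", "Shipping status per order via order_id FK."),
    ]
]

# Index the schemas so relevant tables can be retrieved semantically.
obj_index = ObjectIndex.from_objects(
    table_schemas,
    table_node_mapping,
    VectorStoreIndex,
)
```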


How the Magic Works

A Vector Database stores high-dimensional representations (embeddings) of data and allows fast similarity-based retrieval. In our case, it helps the LLM retrieve the most relevant tables before generating an SQL query.

Instead of searching by exact keywords (like traditional databases), a VectorDB finds semantically similar matches—making it perfect for AI-driven applications like NLP-to-SQL.


How VectorDB Is Used in NLP-to-SQL

1. Database Schema Representation:

  • Each table schema (columns, relationships, metadata) is converted into embeddings (vector representations).
  • This allows for semantic search when retrieving relevant tables.

2. Query Execution Pipeline:

  • The natural language query is also converted into a vector and compared against indexed table embeddings to retrieve the most relevant tables.

3. LLM Uses Retrieval to Generate SQL:

  • The LLM receives only relevant tables instead of the entire schema.
  • This reduces confusion and improves SQL accuracy.
  • The retrieved schema guides the LLM in query generation.

Our solution can now fetch the right set of tables the LLM needs for creating SQL queries!

Integrating OpenAI LLM for SQL Query Generation

Once the database schema is mapped with LlamaIndex, the next step is to integrate an LLM (Large Language Model) that can interpret natural language queries and generate accurate SQL statements. For this, we use OpenAI’s GPT-4o-mini, which provides a powerful text-to-SQL conversion capability.
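Initializing the model is a one-liner; this sketch assumes the llama-index OpenAI integration package, which reads the API key from the OPENAI_API_KEY environment variable:

```python
from llama_index.llms.openai import OpenAI

# temperature=0 keeps SQL generation deterministic.
llm = OpenAI(model="gpt-4o-mini", temperature=0)
```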


Initializing the Query Engine

Now, we will set up the query engine, which will:

  1. Retrieve relevant tables based on user queries.
  2. Generate SQL queries using OpenAI LLM.
  3. Execute SQL and return results.

How it works:

  • SQLTableRetrieverQueryEngine(sql_db, ...) connects the LLM with the SQL database and generates SQL queries from natural language.
  • obj_index.as_retriever(similarity_top_k=1) retrieves the most relevant tables for each query.
  • llm=llm uses OpenAI GPT-4o-mini to generate SQL commands.
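Putting those pieces together, the initialization can be sketched as (constructor arguments as in the LlamaIndex docs; the llm keyword follows the description above):

```python
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)

# Wire the retriever and the LLM into a text-to-SQL query engine.
query_engine = SQLTableRetrieverQueryEngine(
    sql_db,                                      # the SQLDatabase wrapper
    obj_index.as_retriever(similarity_top_k=1),  # best-matching table schema
    llm=llm,                                     # GPT-4o-mini generates SQL
)
```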




Now, the system is ready to handle natural language queries!

What We Have Achieved So Far: A Recap

✓ We defined SQL table schemas to guide LLM-generated queries.

✓ We created an object index to store table metadata.

✓ We initialized the query engine to process natural language queries into SQL.


Helper Function to Print Outputs

To ensure a clean and structured display, we use a helper function that:

  • Executes the query.
  • Prints the generated SQL query.
  • Displays the query results in Markdown format for better readability.

Helper function to print the SQL query and responses:
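A sketch of such a helper, assuming a notebook environment (IPython's Markdown renderer); the function name print_response is mine, not the author's:

```python
from IPython.display import Markdown, display

def print_response(question: str) -> None:
    """Run a natural-language query, then show the generated SQL and answer."""
    response = query_engine.query(question)
    print(f"Question: {question}")
    # The generated SQL is surfaced in the response metadata.
    print(f"SQL: {response.metadata.get('sql_query', 'n/a')}")
    display(Markdown(str(response)))  # render the answer as Markdown
```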


We now have a way to pretty-print and inspect the solution's outputs!

Moment of Truth!

Next Step: Querying the Database with Natural Language! Shall we?


Querying the Database with Natural Language

Now that we've set up our SQL database, connected it to LlamaIndex, and configured our query engine, it’s time to see it in action!

In this section, we’ll:

  1. Run natural language queries against the database.
  2. Generate the corresponding SQL queries.
  3. Execute the queries and print formatted results.
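With the engine and helper in place, running queries looks like this. The questions below are illustrative examples I chose to fit the sample data; the generated SQL and results depend on the schema and rows actually inserted:

```python
# Each call retrieves relevant tables, generates SQL, executes it,
# and pretty-prints the result.
print_response("Which orders are still pending delivery?")
print_response("What is the total amount spent by each customer?")
print_response("List all products in the Electronics category.")
```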


A Few Test Results

Now, let's put our AI-powered SQL query engine to the test! Below are some real natural language queries, the generated SQL, and the formatted results.

Each query demonstrates how LlamaIndex, OpenAI GPT-4o-mini, and SQLAlchemy work together to translate human language into SQL, execute it, and return structured results.

Here are a few test cases in action:


Summary

✓ Natural language queries are converted into SQL queries.

✓ The generated SQL is executed automatically.

✓ Formatted results are displayed in Markdown format.


Final Thoughts

This system eliminates the need to write SQL manually. It allows non-technical users to query databases naturally while ensuring accuracy, efficiency, and usability.

This approach is particularly useful for:

  • Business analysts who want quick insights from databases.
  • Customer service teams tracking orders without SQL knowledge.
  • E-commerce managers monitoring sales trends.

With LlamaIndex, OpenAI, and SQLAlchemy, we’ve built a scalable, intelligent, and accessible SQL chat interface.


Rohit Sharma

AI/ML Computational Science Manager
