ChatGPT + LangChain = Natural Language Queries to your data
Maxim Pustovoi
Passionate software engineer with extensive experience developing robust code for growing businesses, from insurance and banking to space technologies.
Hey LinkedIn!
I've recently delved into the world of database querying using natural language, and let me tell you, it's a lot of fun!
What's the Buzz?
I've been exploring the LangChain Python library, and it's blowing my mind! This powerful tool lets you interact with databases in your native language. Imagine asking complex questions, fetching data, and making updates, all with the simplicity of everyday language.
Why NLQ?
Natural language querying (NLQ) lowers the barrier to your data: anyone who can phrase a question can fetch and aggregate records, no SQL knowledge required.
My Experience So Far:
What impressed me most is how little it takes to get working. I started with a few lines of code from the examples:
import os
from langchain.chat_models import ChatOpenAI
from langchain.sql_database import SQLDatabase
from langchain.agents import AgentType, create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

# chat model that turns natural-language questions into SQL
model = ChatOpenAI(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-3.5-turbo-16k-0613",
    temperature=0.3,
)

# wrap the database connection and expose it to the agent as a toolkit
db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
toolkit = SQLDatabaseToolkit(db=db, llm=model)

agent = create_sql_agent(
    llm=model,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    handle_parsing_errors=True,
)

response = agent.run("what is the oldest car in the country")
print(response)
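With verbose=True you can watch the agent work: it lists the tables, inspects the schema, and only then writes and executes the SQL. Before involving the LLM at all, it's worth sanity-checking the connection with the same SQLDatabase object; a minimal check (the table it queries is defined later in this post):

# no LLM involved: verify the connection and peek at the data directly
db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
print(db.get_usable_table_names())
print(db.run("select count(*) from vehicle_owner_data"))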
After a few manual tests I was so excited by the results that I wanted to demo it to everybody, so I created a simple microservice in FastAPI and an SPA in VueJS.
import os
from fastapi import FastAPI
from fastapi.responses import PlainTextResponse, RedirectResponse
from fastapi.staticfiles import StaticFiles
from langchain.chat_models import ChatOpenAI
from langchain.sql_database import SQLDatabase
from langchain.agents import AgentType, create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

app = FastAPI()
# serve the VueJS SPA from the static folder
app.mount("/static", StaticFiles(directory="static", html=True), name="static")

@app.get('/')
def home():
    return RedirectResponse("/static/index.html")

@app.get('/api/prompt', response_class=PlainTextResponse)
def prompt(text: str):
    # a fresh model and agent are built per request; fine for a demo
    model = ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-3.5-turbo-16k-0613",
        temperature=0.3,
    )
    db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
    toolkit = SQLDatabaseToolkit(db=db, llm=model)
    agent = create_sql_agent(
        llm=model,
        toolkit=toolkit,
        verbose=True,
        agent_type=AgentType.OPENAI_FUNCTIONS,
        handle_parsing_errors=True,
    )
    return agent.run(text)
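With the service running (for example via uvicorn on its default port 8000 — adjust to your setup), the endpoint can be smoke-tested with a couple of lines of Python; the question text here is just an example:

import requests

# assumes the app is served at localhost:8000 (uvicorn's default)
resp = requests.get(
    "http://localhost:8000/api/prompt",
    params={"text": "how many cars does each owner have"},
)
print(resp.text)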
As for the database, I chose ClickHouse and populated it with artificial data.
-- DDL
create or replace table vehicle_owner_data(
    id int,
    name varchar(255),
    address varchar(255),
    city varchar(64),
    state varchar(2),
    zip varchar(8),
    vin varchar(255),
    model varchar(255),
    make varchar(255),
    year smallint
) engine = MergeTree order by (name, address, city, state, zip, model, make);
-- DML
-- one row per vehicle, so owners with several cars (ids 5 and 6) appear in multiple rows
insert into vehicle_owner_data
    (id, name, address, city, state, zip, vin, model, make, year)
values
    (1, 'Charles K. Neff', '2756 Haymond Rocks Road', 'Dalles', 'OR', '97058', '1GNSKJKC4FR503624', 'RX', 'Lexus', 2007),
    (2, 'Jeffrey C. Dileo', '2290 Nelm Street', 'Jackson', 'OR', '63755', '1FDKF37G2VEB88179', 'Veloster', 'Hyundai', 2015),
    (3, 'John L. Gibson', '2619 School Street', 'Danbury', 'CT', '06810', '2C1MR5294T6721849', 'Matrix', 'Toyota', 2005),
    (4, 'Gabriel L. Hernandez', '4215 Christie Way', 'Springfield', 'MA', '01109', 'KNDJB723415016964', 'Impreza', 'Subaru', 1995),
    (5, 'Stephen J. Knouse', '725 Upton Avenue', 'Windham', 'MA', '04062', '1FDKF37G2TEA10902', 'Malibu', 'Chevrolet', 1995),
    (5, 'Stephen J. Knouse', '725 Upton Avenue', 'Windham', 'MA', '04062', 'JTJJM7FX0E5079716', 'Nubira', 'Daewoo', 2009),
    (6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', '3VW1K7AJ4CM316353', 'M3', 'BMW', 2004),
    (6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', '1GKLRLED3AJ108466', 'M5', 'BMW', 2002),
    (6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', 'WBAEV53465KM40462', '607', 'Peugeot', 2003);
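For reference, SQLDatabase.from_uri goes through SQLAlchemy, so talking to ClickHouse needs a ClickHouse dialect installed. A sketch of what the DATABASE_URI could look like, assuming the clickhouse-sqlalchemy package (host, credentials and database name are placeholders):

# pip install clickhouse-sqlalchemy
# native protocol on port 9000; user, password, host and database are placeholders
db = SQLDatabase.from_uri("clickhouse+native://default:password@localhost:9000/default")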
Conclusions:
Despite the strong first impression, during testing I noticed that the model does not always produce the desired output and sometimes hallucinates, producing pseudocode instead of the expected aggregations.
If this were a real product, this issue would probably be mitigated by tuning the model context further and filtering the output before displaying results to the end user.
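Such a filter could be as simple as refusing answers that look like code rather than data. A naive sketch, with marker strings that are purely my guesses at typical tells:

# naive guard: reject answers that look like code or pseudocode instead of data
SUSPICIOUS_MARKERS = ("def ", "select ", "```", "pseudocode")

def looks_like_code(answer: str) -> bool:
    lowered = answer.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)

def safe_answer(answer: str) -> str:
    if looks_like_code(answer):
        return "Sorry, I could not produce a reliable answer for that question."
    return answer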
As the data structure becomes more complex (I added a couple of extra tables with relations between them), the quality of the queries the model generates from simple prompts degrades sharply. Prompts have to be far more detailed, which in turn demands more skill from the person writing them, effectively forcing them to know the internal structure of the database.
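One mitigation I would try is narrowing what the agent sees: SQLDatabase accepts include_tables and custom_table_info, so irrelevant tables can be hidden and the important ones described explicitly. A sketch against my single demo table:

# restrict the agent to the relevant tables and describe the tricky parts explicitly
db = SQLDatabase.from_uri(
    os.getenv("DATABASE_URI"),
    include_tables=["vehicle_owner_data"],  # hide everything else
    custom_table_info={
        "vehicle_owner_data": (
            "One row per vehicle; owners with several cars appear in multiple rows."
        )
    },
)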
My final conclusion: I am now less stressed about LLMs taking my job. At the current stage an LLM is nothing more than a Swiss Army knife that can help me do my job faster and more efficiently.
PS: This small research barely scratches the surface of what LLMs are capable of and what we are going to see in the future. I did it as a pet project to get more familiar with the LangChain library and the OpenAI API.
Link to github: https://github.com/mlivirov/pet-langchain