ChatGPT + LangChain = Natural Language Queries to your data
Maxim Pustovoi
Passionate software engineer with extensive experience developing robust code for growing businesses, from insurance and banking to space technologies.
Hey LinkedIn!
I've recently delved into the world of database querying using natural language, and let me tell you, it's a lot of fun!
What's the Buzz?
I've been exploring the LangChain Python library, and it's blowing my mind! This powerful tool lets you interact with databases in your native language. Imagine asking complex questions, fetching data, and making updates, all with the simplicity of everyday language.
Why NLQ?
Natural language querying (NLQ) lowers the barrier to your data: anyone who can phrase a question can fetch and aggregate records, no SQL knowledge required.
My Experience So Far:
What impressed me most is how little it takes to get working. I started with a few lines of code from the examples:
import os
from langchain.chat_models import ChatOpenAI
from langchain.sql_database import SQLDatabase
from langchain.agents import AgentType, create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

# chat model that turns natural-language questions into SQL
model = ChatOpenAI(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-3.5-turbo-16k-0613",
    temperature=0.3,
)

# wrap the database connection and expose it to the agent as a toolkit
db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
toolkit = SQLDatabaseToolkit(db=db, llm=model)

agent = create_sql_agent(
    llm=model,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    handle_parsing_errors=True,
)

response = agent.run("what is the oldest car in the country")
print(response)
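With verbose=True you can watch the agent work: it lists the tables, inspects the schema, and only then writes and executes the SQL. Before involving the LLM at all, it's worth sanity-checking the connection with the same SQLDatabase object; a minimal check (the table it queries is defined later in this post):

# no LLM involved: verify the connection and peek at the data directly
db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
print(db.get_usable_table_names())
print(db.run("select count(*) from vehicle_owner_data"))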
After a few manual tests I was so excited by the results that I wanted to demo it to everybody, so I created a simple microservice in FastAPI and an SPA in VueJS.
import os
from fastapi import FastAPI
from fastapi.responses import PlainTextResponse, RedirectResponse
from fastapi.staticfiles import StaticFiles
from langchain.chat_models import ChatOpenAI
from langchain.sql_database import SQLDatabase
from langchain.agents import AgentType, create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

app = FastAPI()
# serve the VueJS SPA from the static folder
app.mount("/static", StaticFiles(directory="static", html=True), name="static")

@app.get('/')
def home():
    return RedirectResponse("/static/index.html")

@app.get('/api/prompt', response_class=PlainTextResponse)
def prompt(text: str):
    # a fresh model and agent are built per request; fine for a demo
    model = ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-3.5-turbo-16k-0613",
        temperature=0.3,
    )
    db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
    toolkit = SQLDatabaseToolkit(db=db, llm=model)
    agent = create_sql_agent(
        llm=model,
        toolkit=toolkit,
        verbose=True,
        agent_type=AgentType.OPENAI_FUNCTIONS,
        handle_parsing_errors=True,
    )
    return agent.run(text)
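With the service running (for example via uvicorn on its default port 8000 — adjust to your setup), the endpoint can be smoke-tested with a couple of lines of Python; the question text here is just an example:

import requests

# assumes the app is served at localhost:8000 (uvicorn's default)
resp = requests.get(
    "http://localhost:8000/api/prompt",
    params={"text": "how many cars does each owner have"},
)
print(resp.text)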
As for the database, I chose ClickHouse and populated it with artificial data.
-- DDL
create or replace table vehicle_owner_data(
    id int,
    name varchar(255),
    address varchar(255),
    city varchar(64),
    state varchar(2),
    zip varchar(8),
    vin varchar(255),
    model varchar(255),
    make varchar(255),
    year smallint
) engine = MergeTree order by (name, address, city, state, zip, model, make);
-- DML
-- one row per vehicle, so owners with several cars (ids 5 and 6) appear in multiple rows
insert into vehicle_owner_data
    (id, name, address, city, state, zip, vin, model, make, year)
values
    (1, 'Charles K. Neff', '2756 Haymond Rocks Road', 'Dalles', 'OR', '97058', '1GNSKJKC4FR503624', 'RX', 'Lexus', 2007),
    (2, 'Jeffrey C. Dileo', '2290 Nelm Street', 'Jackson', 'OR', '63755', '1FDKF37G2VEB88179', 'Veloster', 'Hyundai', 2015),
    (3, 'John L. Gibson', '2619 School Street', 'Danbury', 'CT', '06810', '2C1MR5294T6721849', 'Matrix', 'Toyota', 2005),
    (4, 'Gabriel L. Hernandez', '4215 Christie Way', 'Springfield', 'MA', '01109', 'KNDJB723415016964', 'Impreza', 'Subaru', 1995),
    (5, 'Stephen J. Knouse', '725 Upton Avenue', 'Windham', 'MA', '04062', '1FDKF37G2TEA10902', 'Malibu', 'Chevrolet', 1995),
    (5, 'Stephen J. Knouse', '725 Upton Avenue', 'Windham', 'MA', '04062', 'JTJJM7FX0E5079716', 'Nubira', 'Daewoo', 2009),
    (6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', '3VW1K7AJ4CM316353', 'M3', 'BMW', 2004),
    (6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', '1GKLRLED3AJ108466', 'M5', 'BMW', 2002),
    (6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', 'WBAEV53465KM40462', '607', 'Peugeot', 2003);
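For reference, SQLDatabase.from_uri goes through SQLAlchemy, so talking to ClickHouse needs a ClickHouse dialect installed. A sketch of what the DATABASE_URI could look like, assuming the clickhouse-sqlalchemy package (host, credentials and database name are placeholders):

# pip install clickhouse-sqlalchemy
# native protocol on port 9000; user, password, host and database are placeholders
db = SQLDatabase.from_uri("clickhouse+native://default:password@localhost:9000/default")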
Conclusions:
Despite the strong first impression, during testing I noticed that the model does not always produce the desired output and sometimes hallucinates, producing pseudocode instead of the expected aggregations.
If this were a real product, this issue would probably be mitigated by tuning the model context further and filtering the output before displaying results to the end user.
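Such a filter could be as simple as refusing answers that look like code rather than data. A naive sketch, with marker strings that are purely my guesses at typical tells:

# naive guard: reject answers that look like code or pseudocode instead of data
SUSPICIOUS_MARKERS = ("def ", "select ", "```", "pseudocode")

def looks_like_code(answer: str) -> bool:
    lowered = answer.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)

def safe_answer(answer: str) -> str:
    if looks_like_code(answer):
        return "Sorry, I could not produce a reliable answer for that question."
    return answer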
As the data structure becomes more complex (I added a couple of extra tables with relations between them), the quality of the queries the model generates from simple prompts degrades sharply. Prompts have to be far more detailed, which in turn demands more skill from the person writing them, effectively forcing them to know the internal structure of the database.
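One mitigation I would try is narrowing what the agent sees: SQLDatabase accepts include_tables and custom_table_info, so irrelevant tables can be hidden and the important ones described explicitly. A sketch against my single demo table:

# restrict the agent to the relevant tables and describe the tricky parts explicitly
db = SQLDatabase.from_uri(
    os.getenv("DATABASE_URI"),
    include_tables=["vehicle_owner_data"],  # hide everything else
    custom_table_info={
        "vehicle_owner_data": (
            "One row per vehicle; owners with several cars appear in multiple rows."
        )
    },
)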
My final conclusion: I am now less stressed about LLMs taking my job. At the current stage an LLM is nothing more than a Swiss Army knife that can help me do my job faster and more efficiently.
PS: This small research barely scratches the surface of what LLMs are capable of and what we are going to see in the future. I did it as a pet project to get more familiar with the LangChain library and the OpenAI API.
Link to github: https://github.com/mlivirov/pet-langchain