ChatGPT + LangChain = Natural Language Queries to your data

ChatGPT + LangChain = Natural Language Queries to your data

Hey LinkedIn! ????

I've recently delved into the world of database querying using natural language, and let me tell you, it's a a lot of fun! ??

?? What's the Buzz?

I've been exploring the LangChain Python library, and it's blowing my mind! ???? This powerful tool allows you to interact with databases using your native language. Imagine asking complex questions, fetching data, and making updates, all with the simplicity of everyday language. ?????

Why NLQ?

  1. Simplicity: No need to master complex query languages. Just speak or write your query in the language you're comfortable with.
  2. Efficiency: Streamline your workflow by cutting down the time spent crafting intricate queries. Get the data you need, faster.
  3. Accessibility: Make database interactions accessible to non-technical stakeholders. Break down communication barriers and empower everyone to harness the power of data.

?? My Experience So Far:

The most I am impressed by how little it takes to get it working. I've started with a few lines of code from example:

# import ...
 
model = ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-3.5-turbo-16k-0613",
        temperature=0.3,
    )

db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
toolkit = SQLDatabaseToolkit(db=db, llm=model)

agent = create_sql_agent(
    llm=model,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    handle_parsing_errors=True,
)

response = agent.run("what is the oldest car in the country")
print(response)        

After a few manual tests I was so excited with results so I wanted to demo it to everybody and therefor I've created a simple microservice in FastAPI and SPA in VueJS.

# import ...

app = FastAPI()

app.mount("/static", StaticFiles(directory="static", html=True), name="static")


@app.get('/')
def home():
    return RedirectResponse("/static/index.html")


@app.get('/api/prompt', response_class=PlainTextResponse)
def prompt(text: str):
    model = ChatOpenAI(
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-3.5-turbo-16k-0613",
        temperature=0.3,
    )

    db = SQLDatabase.from_uri(os.getenv("DATABASE_URI"))
    toolkit = SQLDatabaseToolkit(db=db, llm=model)

    agent = create_sql_agent(
        llm=model,
        toolkit=toolkit,
        verbose=True,
        agent_type=AgentType.OPENAI_FUNCTIONS,
        handle_parsing_errors=True,
    )

    return agent.run(text)        

As for database I've chosen a clickhouse and populated it with artifitial data.

-- DDL
create or replace table vehicle_owner_data(
    id int,
    name varchar(255),
    address varchar(255),
    city varchar(64),
    state varchar(2),
    zip varchar(8),
    vin varchar(255),
    model varchar(255),
    make varchar(255),
    year smallint
) engine MergeTree order by (name, address, city, state, zip, model, make);

-- DML
insert into vehicle_owner_data
    (id, name, address, city, state, zip, vin, model, make, year)
values
(1, 'Charles K. Neff', '2756 Haymond Rocks Road', 'Dalles', 'OR', '97058', '1GNSKJKC4FR503624', 'RX', 'Lexus', 2007)
(2, 'Jeffrey C. Dileo', '2290 Nelm Street', 'Jackson', 'OR', '63755', '1FDKF37G2VEB88179', 'Veloster', 'Hyundai', 2015),
(3, 'John L. Gibson', '2619 School Street', 'Danbury', 'CT', '06810', '2C1MR5294T6721849', 'Matrix', 'Toyota', 2005),
(4, 'Gabriel L. Hernandez', '4215 Christie Way', 'Springfield', 'MA', '01109', 'KNDJB723415016964', 'Impreza', 'Subaru', 1995),
(5, 'Stephen J. Knouse', '725 Upton Avenue', 'Windham', 'MA', '04062', '1FDKF37G2TEA10902', 'Malibu', 'Chevrolet', 1995),
(5, 'Stephen J. Knouse', '725 Upton Avenue', 'Windham', 'MA', '04062', 'JTJJM7FX0E5079716', 'Nubira', 'Daewoo', 2009),
(6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', '3VW1K7AJ4CM316353', 'M3', 'BMW', 2004),
(6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', '1GKLRLED3AJ108466', 'M5', 'BMW', 2002),
(6, 'Louis M. Clifford', '3599 Public Works Drive', 'Johnson', 'MA', '37601', 'WBAEV53465KM40462', '607', 'Peugeot', 2003);        


Conclusions: ??

Despite the first impression, during testing I noticed that the model does not always produce a desired output and sometimes hallucinates by producing pseudocode instead of the expected aggregations.

If this was a real product this bug probably would be compensated by more fine-tuning of the model context and adding filtering before displaying the results to the end user.

As the data structure becomes more complex (a couple of additional tables and connections between them were added), the quality of queries generated by the model from simple promts degrades greatly. The requirements for detailing prompts are increasing, as well as increasing requirements to the skills of person that writes these prompts forcing them to know internal structure of the database.

Final conclusion is now I am less stessed about LLMs taking my job, at current phase it's nothing else but a swiss knife that can help me to do my job faster and more efficient.

PS: This small research barely scratching the surface of what LLMs are capable of doing and what we are going to see in the future. I did it as pet project in order to get myself more familiar with langchain library and OpenAI API.

Link to github: https://github.com/mlivirov/pet-langchain

Koenraad Block

Founder @ Bridge2IT +32 471 26 11 22 | Business Analyst @ Carrefour Finance

10 个月

The possibilities with AI seem endless. Thank you for sharing these intriguing developments! ????

回复

要查看或添加评论,请登录

社区洞察

其他会员也浏览了