Using ChatGPT to Explore Claims Databases

Incorporating ChatGPT into products and workflows has become almost indispensable. Of course, the irony is that the same technology enhancing our capabilities may, in the future, automate certain aspects of our roles. In the spirit of this paradox, I present an agent powered by ChatGPT for the analysis of medical claims data.

20 Years of Medicare Part B Claims Data

Each year, the Centers for Medicare & Medicaid Services (CMS) releases aggregate data detailing the procedures performed by healthcare providers and the corresponding reimbursements. This trove of information arrives in the form of Excel or CSV files with varying formats. For analytical purposes, I've diligently converted and stored these datasets in a comprehensive SQL database, augmented by metadata for each code. This database powers a dashboard that offers high-level insights. This month marks a significant milestone, as the database now encapsulates two decades' worth of data.

Sometimes I get a question the dashboard can't answer, and then I have to query the tables directly. For obvious security reasons, I haven't opened up raw SQL access to users of the dashboard. To make the data accessible to people who don't think in SQL or don't know how I have structured my tables, while remaining flexible enough to accept any query, I tried out OpenAI's ChatGPT API. Here's what I am learning.

Building an Agent for Analyzing Medical Claims Data

To keep it simple, I created a command-line interface where I ask a question and get an answer. I am using Python with the OpenAI packages and LlamaIndex's SQLDatabase service to query structured data. For this, I am using GPT-4, and I can say categorically that GPT-3.5 did not produce usable results. For the curious, I'll post the basic code below.

First, just for a use case, you can see how this works from a command line, where I am asking a question in natural language, and behind the scenes this is being turned into SQL that can reasonably be executed on the tables that I've given ChatGPT access to.
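A minimal version of that command-line loop might look like the sketch below. The `repl` function and the stub answer function are illustrative stand-ins; the real query engine that turns a question into SQL is shown later in this post.

```python
def repl(answer, ask=input, show=print):
    """Minimal question-and-answer loop. `answer` maps a natural-language
    question to a result string; here it stands in for the query engine."""
    while True:
        question = ask("Ask a question (or 'quit'): ")
        if question.strip().lower() == "quit":
            break
        show(answer(question))

# Demonstration with a stub in place of the real query engine:
questions = iter(["How many codes?", "quit"])
repl(lambda q: f"(would run SQL for: {q})",
     ask=lambda prompt: next(questions))
# prints "(would run SQL for: How many codes?)" and exits on 'quit'
```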

OpenAI is not ingesting my data in any way and is not processing the data directly. It only generates SQL, which is then executed on my own server. The code I wrote also returns the SQL that ChatGPT comes up with, so I can evaluate whether its approach to the data makes sense.

A Command Line Chat Interface for Natural Language Queries on Medical Claims Databases

Looking at the Results

A few observations from my initial alpha:

  1. Maturity: As untuned, simple code, this is not production quality, but it is quite close.
  2. Proven Use Case: It provides answers that the dashboard's tables and graphs cannot surface without the user needing to know either SQL or my schema.
  3. Schema Intuition: It seems to know what the tables are for and how they are organized, even though I never provided a schema. I find this impressive. I did try adding some context for the tables, but it didn't have a substantial impact and sometimes produced inferior results.
  4. Latency: It's not fast. An experienced SQL developer with good knowledge of the CPT codes and how I've implemented my tables will usually get a result faster.
  5. Prompt Complexity: You can also ask more complex questions involving math (comparisons, growth rates), which requires looking at more than just one code and characterizing the different procedures.
  6. Wrong Without Warning: It can give a confidently inaccurate result. By looking at the SQL you can see that the approach, and therefore the result, is not correct, but this is not helpful to someone who cannot interpret the code.
  7. Poor Error Handling: It gives up if SQL returns nothing or an error. This means that users still need some knowledge of CPT codes and possibly their standard descriptions if they want to formulate a clean prompt. Again, you would need to interpret the SQL it shows you to see if you can improve your prompt. An improvement might be to detect when there are no results and ask the user for help improving the prompt.
  8. Token Utilization: Queries seem to cost an average of 280 tokens.
  9. Prompt Optimization: The closer the natural language query is to SQL and the data tables, the better the results.
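The improvement suggested in point 7 — detecting empty results and asking the user to refine the prompt rather than giving up — could be a thin wrapper around the query call. This is a sketch only; the function names and the assumed `(answer_text, rows)` return shape are illustrative, not part of the actual code.

```python
def answer_with_fallback(run_query, question):
    """Run the natural-language query; if the generated SQL returned no
    rows, ask the user to refine the prompt instead of silently giving up.
    `run_query` is assumed to return (answer_text, rows)."""
    answer, rows = run_query(question)
    if not rows:
        return ("No rows matched. Try naming a specific HCPCS/CPT code "
                "or its standard description, then ask again.")
    return answer

# Stub demonstration:
print(answer_with_fallback(lambda q: ("42 unique codes", [(42,)]),
                           "How many codes in 2022?"))  # 42 unique codes
```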


Here are examples of questions that got good answers:

  • "Compare the unique number of codes in 2002 versus 2022."
  • "In which year were imaging procedures the most expensive type of procedure?"
  • "What was the sum of services with a code starting with J performed in 2021?"
  • "What are the possible codes for psychotherapy and their descriptions?"
  • "Which laboratory test code had the largest percent gain between 2018 and 2022?"
  • "What are the most frequently used codes for telemedicine?"
  • "What was the difference in aflibercept administrations between 2018 and 2020?"
  • "Did reimbursement for code 86701 decline from 2018 to 2022?"
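To make the first question concrete, here is the general shape of SQL the model produces for it, run against an in-memory SQLite stand-in for my CMSB table. The sample rows are invented for illustration, and the SQL is a reconstruction of the approach, not verbatim model output.

```python
import sqlite3

# In-memory stand-in for the CMSB table; the rows are invented sample data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CMSB (year INT, hcpcs_code TEXT, no_of_services INT)")
con.executemany("INSERT INTO CMSB VALUES (?, ?, ?)",
                [(2002, "99213", 100), (2002, "99214", 50),
                 (2022, "99213", 120), (2022, "99214", 80),
                 (2022, "J0178", 30)])

# Roughly the SQL GPT-4 generates for "Compare the unique number of
# codes in 2002 versus 2022":
sql = """SELECT year, COUNT(DISTINCT hcpcs_code) AS unique_codes
         FROM CMSB WHERE year IN (2002, 2022) GROUP BY year"""
print(con.execute(sql).fetchall())  # [(2002, 2), (2022, 3)]
```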

Technical Background

I'm sure you are curious how this is put together. So here's a simplified version of the code that I'm using.

import os

import openai
from sqlalchemy import create_engine
from llama_index import ServiceContext, SQLDatabase
from llama_index.indices.struct_store import NLSQLTableQueryEngine
from llama_index.llms import OpenAI

os.environ['OPENAI_API_KEY'] = '<open-ai-key>'
openai.api_key = os.environ.get('OPENAI_API_KEY')

connection_uri = (
    f"mysql+pymysql://{db_user}:{db_password}@{db_host}/{db_name}"
)

engine = create_engine(connection_uri)
llm = OpenAI(temperature=0.5, model="gpt-4")

# Restrict the query engine to the two aggregate tables
query_engine = NLSQLTableQueryEngine(
    sql_database=SQLDatabase(engine),
    tables=["CMSB", "hcpcs_codes"],
    synthesize_response=True,
    service_context=ServiceContext.from_defaults(llm=llm),
)

try:
    response = query_engine.query(question)
    print("\033[91mANSWER:\n\033[0m" + response.response + "\n")
    print("\033[91mSQL:\n\033[0m" + str(response.metadata) + "\n")

except Exception as ex:
    print("Error " + str(ex))

In terms of the database, here is the schema. There are quite a few more tables for finding provider-level data, but in the interest of incremental gains, I pointed ChatGPT only at the aggregate data:

mysql> desc CMSB;
+--------------------+-------------+------+-----+---------+-------+
| Field              | Type        | Null | Key | Default | Extra |
+--------------------+-------------+------+-----+---------+-------+
| year               | int         | NO   | MUL | NULL    |       |
| hcpcs_code         | varchar(8)  | YES  | MUL | NULL    |       |
| modifier           | varchar(15) | NO   |     | NULL    |       |
| no_of_services     | int         | YES  |     | NULL    |       |
| allowable_payments | int         | YES  |     | NULL    |       |
| total_payments     | int         | YES  |     | NULL    |       |
+--------------------+-------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

mysql> desc hcpcs_codes;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| hcpcs         | varchar(8)   | YES  |     | NULL    |       |
| description   | varchar(250) | YES  |     | NULL    |       |
| drug_ind      | char(1)      | YES  |     | NULL    |       |
| activity_type | varchar(45)  | YES  |     | NULL    |       |
| patient_type  | varchar(45)  | YES  |     | NULL    |       |
+---------------+--------------+------+-----+---------+-------+
5 rows in set (0.01 sec)        
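Notice that CMSB carries no descriptions, so any question phrased in terms of a drug or procedure name forces the model to infer a join with hcpcs_codes. A sketch of that inference, again using an in-memory SQLite mock with invented sample rows and reconstructed (not verbatim) SQL:

```python
import sqlite3

# Tiny SQLite mock of the two tables above; rows are invented sample data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CMSB (year INT, hcpcs_code TEXT, no_of_services INT)")
con.execute("CREATE TABLE hcpcs_codes (hcpcs TEXT, description TEXT)")
con.executemany("INSERT INTO CMSB VALUES (?, ?, ?)",
                [(2018, "J0178", 500), (2020, "J0178", 800)])
con.execute("INSERT INTO hcpcs_codes VALUES ('J0178', 'Aflibercept injection')")

# "What was the difference in aflibercept administrations between 2018
# and 2020?" requires joining the tables to resolve the drug name:
sql = """SELECT c.year, c.no_of_services
         FROM CMSB c JOIN hcpcs_codes h ON c.hcpcs_code = h.hcpcs
         WHERE h.description LIKE '%Aflibercept%' ORDER BY c.year"""
rows = con.execute(sql).fetchall()
print(rows[1][1] - rows[0][1])  # 300
```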

One thing to consider is adding some preamble context. I did try using SQLTableRetrieverQueryEngine instead of NLSQLTableQueryEngine, but that increased query time substantially without gains in accuracy.

I also added some preamble directly to the question, e.g. "You are a data analyst providing information on medical claims using two tables, ..." and this might have made some improvements, but it was hard to tell.

Overall, this is one of many experiments I am doing with GPT-4 and both LangChain and LlamaIndex. My results with GPT-3.5 a few months ago were lackluster, particularly because I couldn't translate my instinct for specific clinical contexts into prompts it could use.

I have kept my dashboard unchanged for now, but ...

What do you think? Is chat the right kind of interface for getting insights from medical claims databases?


*Note that CPT codes are copyright 2023 by the American Medical Association and a license is required to sell or distribute codes.

