How to extract semi-structured data from text using ChatGPT API
Bunyamin Ergen
Artificial Intelligence Engineer @eTa??n | Conducting R&D on Advanced Multi-Agent AI Systems, LLM, State-of-the-Art Technologies, Multi-Modal Learning, Speech-to-Text, Computer Vision and Adversarial Machine Learning
I am very excited.?
Currently, I am working with a startup company researching rare diseases.?
We are using real patient data and hospital reports, but unfortunately these reports are kept in a very disorganized.?
Most of the information is in word documents or in a single column of a dataframe as raw text.?
My task is to extract data from this raw text and I think I did a good job but I was not satisfied because the text is so messy and the patterns in it are hard to decipher within it.
On the other hand, I was wondering if our job is actually to work with this messy raw text rather than working with machine learning and deep learning algorithms. Although it's part of our job, let's admit it, nobody likes data cleaning.
My team members, who are doctors and mathematicians that really good at their job , are can't waiting for the data to be ready and to work on it.?
Then, last night, I received an invitation.?
A friend of mine would like to organize a session on Chat GPT and we had a video meeting with a team for discussion about Chat GPT.?
After the meeting, we decided to hold a session and I decided to delve deeper into Chat GPT to prepare for the session.
Today, I woke up around 1 am, Turkey time, as usual, and started coding.
I started to examine the Chat GPT API.?
At the same time, I was writing codes related to my job.?
And then...
An idea came to my mind, "Can I directly send this text to Chat GPT and at least get it back in a dictionary format?"?
Because I could convert it later this dictionary format to a dataframe with just one line of code.?
And the result...
Yes ! It works!
But I couldn't get efficient results from the website, so I decided to do it with the Chat GPT API.?
By the way, I should say that this data gave me a project idea.?
The question was simple,?
领英推荐
"Is it possible to extract structured data from raw text data?
Or at least semi-structured data."?
Of course, using algorithms like LTSM, etc."?
Now I will share with you the Chat GPT API that make this possible.?
If you have raw text and there is data inside, you can convert it to a dictionary format.?
I think this is amazing.
I don't even care suspend my project because we already do with thanks to Chat GPT
I already have other projects in my mind... :)
If you want to collaborate on this idea "extract semi-structured data from text" , just contact me.
Github repo and other resources link below.
Note: Chat GPT does not work with PYthon 3.11. But 3.10 is works
Github Repo
How to get Chat GPT API
https://beta.openai.com/docs/api-reference
https://beta.openai.com/account/api-keys
https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety