How to extract semi-structured data from text using ChatGPT API
Cyborg reading book

How to extract semi-structured data from text using ChatGPT API

I am very excited.?


Currently, I am working with a startup company researching rare diseases.?


We are using real patient data and hospital reports, but unfortunately these reports are kept in a very disorganized.?


Most of the information is in word documents or in a single column of a dataframe as raw text.?


My task is to extract data from this raw text and I think I did a good job but I was not satisfied because the text is so messy and the patterns in it are hard to decipher within it.


On the other hand, I was wondering if our job is actually to work with this messy raw text rather than working with machine learning and deep learning algorithms. Although it's part of our job, let's admit it, nobody likes data cleaning.


My team members, who are doctors and mathematicians that really good at their job , are can't waiting for the data to be ready and to work on it.?


Then, last night, I received an invitation.?


A friend of mine would like to organize a session on Chat GPT and we had a video meeting with a team for discussion about Chat GPT.?


After the meeting, we decided to hold a session and I decided to delve deeper into Chat GPT to prepare for the session.


Today, I woke up around 1 am, Turkey time, as usual, and started coding.


I started to examine the Chat GPT API.?


At the same time, I was writing codes related to my job.?


And then...


An idea came to my mind, "Can I directly send this text to Chat GPT and at least get it back in a dictionary format?"?


Because I could convert it later this dictionary format to a dataframe with just one line of code.?


And the result...


Yes ! It works!


But I couldn't get efficient results from the website, so I decided to do it with the Chat GPT API.?


By the way, I should say that this data gave me a project idea.?


The question was simple,?

"Is it possible to extract structured data from raw text data?

Or at least semi-structured data."?


Of course, using algorithms like LTSM, etc."?


Now I will share with you the Chat GPT API that make this possible.?


If you have raw text and there is data inside, you can convert it to a dictionary format.?


I think this is amazing.


I don't even care suspend my project because we already do with thanks to Chat GPT


I already have other projects in my mind... :)


If you want to collaborate on this idea "extract semi-structured data from text" , just contact me.


Github repo and other resources link below.



No alt text provided for this image
Library


No alt text provided for this image
Open AI - API


No alt text provided for this image
Raw text data


No alt text provided for this image
Results


No alt text provided for this image
Dictionary


No alt text provided for this image
Dataframe


Note: Chat GPT does not work with PYthon 3.11. But 3.10 is works


Github Repo


How to get Chat GPT API

https://beta.openai.com/docs/api-reference

https://beta.openai.com/account/api-keys

https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety

要查看或添加评论,请登录

Bunyamin Ergen的更多文章

社区洞察

其他会员也浏览了