Contextualizing ChatGPT with Health Care Data Streams, Kafka and TML: Analyse and Summarize Data Faster For Faster Understanding of Disease Trends
Sebastian Maurice, Ph.D.
Global AI and Machine Learning Leader | Teacher | Inventor | Author | Blogger | Coder
What can I say? In just a few months (weeks?), Generative AI and ChatGPT have fundamentally changed the AI natural language landscape. Our world will never be the same. This opens up so many opportunities for humans to leverage large language models (LLMs) to decipher textual information and synthesize it in a matter of seconds. While machines have been learning effectively from data for decades, the accuracy of, and trust in, the machine's output has dramatically improved, and this is now a game changer.
I still remember my university days, when we had to mathematically compute the estimated parameters of an OLS model using matrix (vectorized) math. Those were fun days - oh how things have changed.
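For readers who never did those hand computations: in the simple one-regressor case, the matrix solution (X'X)⁻¹X'y reduces to the familiar closed-form slope and intercept. A minimal sketch (with made-up data, not anything from the article):

```python
# Simple OLS fit of y = b0 + b1*x, minimizing squared error.
# With one regressor, the matrix solution (X'X)^{-1} X'y reduces to:
#   b1 = cov(x, y) / var(x),  b0 = mean(y) - b1 * mean(x)
def ols_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Data generated from y = 2 + 3x, so the fit recovers (2.0, 3.0).
b0, b1 = ols_fit([0, 1, 2, 3, 4], [2, 5, 8, 11, 14])
print(b0, b1)  # 2.0 3.0
```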
Back to the blog. The reason for this blog is to discuss how ChatGPT can be contextualized with specific data, such as Health Care data, and its amazing ability to analyse and summarize these data streams for patterns and trends. I want to thank Michael Folino for his blog, which inspired mine. Isn't it great when people share their knowledge and expertise? It makes the world go round.
By automating the analysis and summary of health data, we reduce the human time and cost needed for analysis. This can dramatically accelerate the insights extracted from data streams for faster, less costly decision making. The architecture is shown below with ChatGPT integration. Steps 1-7 show the flow of data streams: ingesting the data, processing the data, doing machine learning, then using ChatGPT to analyse and summarize the data with Q & A (Step 6), and finally visualizing the data. All of this happens in-memory and can be containerized and deployed with Kubernetes, resulting in a very low-cost but high-value solution.
The solution architecture above is general and can be applied to any type of data streams: IoT data, Financial data, manufacturing data, cybersecurity data, etc. Step 6 is the integration of ChatGPT. Our use case was to use ChatGPT to analyse and summarize FHIR data streams that have been processed by TML to determine if there is a disease trend in the US population.
Our process, as shown in the architecture above, was the following:
- Ingest raw FHIR patient data stream
- Preprocess patient data using the Linux TML binary MAADS-Viper for occurrence of high blood pressure, high body temperature, oxygen deficiencies, high heart rate symptoms
- Once data is processed for every patient data stream, we write the preprocessed results to a Kafka topic
- Using the MAADSTML Python library function viperconsumefromtopic, we consume the preprocessed data, format it for ChatGPT, and send it to ChatGPT to answer our questions about the data: Is there a health issue in the population? If so, what are the issues? What city do the majority of the patients live in? How many males with issues? How many females with issues? What languages do they speak? Is there a disease trend? How many total patients were analysed? Is the trend increasing or decreasing?
- The responses from ChatGPT are then produced to another Kafka topic
- MAADS-Viperviz then consumes from the ChatGPT topic, streams the results to a client browser, and automatically dashboards the results
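The "format it for ChatGPT" step above can be sketched as a small prompt builder. This is a hedged illustration only: the field names (patient_id, symptom, city) and record layout are assumptions for the example, not the actual TML/FHIR output schema, and the Kafka consume/produce calls are left out.

```python
# Hypothetical sketch: flatten preprocessed patient records (as they might
# arrive from a Kafka topic) into a single prompt, then append the questions
# we want ChatGPT to answer about the population.
QUESTIONS = (
    "Is there a health issue in the population? If so, what are the issues? "
    "What city do the majority of the patients live in? How many males with "
    "issues? How many females with issues? What languages do they speak? "
    "Is there a disease trend? How many total patients were analysed? "
    "Is the trend increasing or decreasing?"
)

def build_prompt(records):
    """Serialize preprocessed records into one LLM prompt with our questions."""
    lines = [
        f"patient={r['patient_id']} symptom={r['symptom']} city={r['city']}"
        for r in records
    ]
    return ("Preprocessed patient data:\n" + "\n".join(lines) +
            "\n\nQuestions: " + QUESTIONS)

# Illustrative records, not real patient data.
records = [
    {"patient_id": "p1", "symptom": "Diastolic-Blood-Pressure", "city": "Miami"},
    {"patient_id": "p2", "symptom": "Body-temperature", "city": "Miami"},
]
prompt = build_prompt(records)
```

In the real pipeline, the records would come from viperconsumefromtopic and the resulting prompt would be sent to ChatGPT, with the answer produced back to a Kafka topic.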
For example, we sent ChatGPT 29,430 lines of text from the preprocessed topic.
ChatGPT's response to our questions, using the text-davinci-002 language model with Temperature=0.7, was:
There is a health issue in the population. The issues are Body-Weight, Body-temperature, and Diastolic-Blood-Pressure. The majority of the patients live in Miami. There are more males with issues than females. The languages spoken are English, Spanish, and French. There is a disease trend. The total number of patients analysed is 30. The trend is increasing.
Not a bad summary of 29,430 lines of text in less than 3 seconds.
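For reference, a call like the one above could be assembled as follows. This is a sketch of the legacy OpenAI Completions API (openai < 1.0), through which text-davinci-002 was served; we only build the request parameters here, since the actual call (openai.Completion.create(**params)) needs an API key, and the prompt text is a placeholder.

```python
# Hedged sketch: assemble parameters for a legacy OpenAI Completions call
# with the model and temperature quoted in the article.
def completion_params(prompt, max_tokens=256):
    return {
        "model": "text-davinci-002",  # model used in the article
        "prompt": prompt,
        "temperature": 0.7,           # setting used in the article
        "max_tokens": max_tokens,
    }

params = completion_params("Summarize the preprocessed patient stream...")
```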
We also integrated the FHIR data streams, TML, Kafka, and ChatGPT in a real-time streaming dashboard, shown below:
There are a few details not covered here, of course, but you can see the value and potential of integrating ChatGPT to analyse and summarize real-time data streams, at scale, in a few seconds, giving humans or other machines richer, deeper insights into the data.
Our world is changing very fast, and I am very excited about the positive impacts this will have on humans and our abilities to create and innovate further. It will (must) generate newer and more efficient processes and solutions to improve our lives that can benefit the less fortunate and our planet.
Till next time...