Real-Time Predictions of Black Swan Events using ChatGPT, Transactional Machine Learning (TML), and Apache Kafka
https://spectrum.ieee.org/weather-predicting-black-swan

A wet and dark afternoon here in Toronto makes it a perfect time to write a blog. I was recently asked a question by Logan Mintz: how do you predict the "unpredictable" without knowing what you want to predict? This was an intriguing question (thanks, Logan), and I could not stop thinking about it.

There has been a lot of work on predicting black swan events. Many of us have probably read Nassim Taleb's book The Black Swan, which made us aware of the unpredictable nature of events like earthquakes, wildfires, and pandemics that are extremely rare but highly impactful when they occur: what we call "tail events" (of some distribution). Machine learning (ML) is not good at predicting black swan events, for the simple reason that ML relies on historical data, on events that have already occurred, to predict the future.

There are other recent methods for black swan prediction, such as work from Stanford University researchers and recent research from US scientists that uses Bayesian methods to predict the unpredictable. For me, these methods advance our thinking on how AI and unsupervised methods can be used to predict the unpredictable. One issue I have with these methods is that they lack a real-time component and are complex to implement. Specifically, the question I asked myself was: "Is there a simpler way to determine or predict unpredictable events in real time, as the data are being generated, using TML, Kafka, and ChatGPT?"

The answer is maybe. One possible way to gain further insight into unpredictable events in real time is to use TML, Kafka, and ChatGPT. The method is fairly simple and runs in real time. Here is how. Let's take a healthcare use case: data from healthcare organizations on patients' symptoms, specifically blood pressure, heart rate, body temperature, oxygen levels, and body weight. WE DO NOT KNOW WHAT WE ARE LOOKING FOR IN THE DATA, SO WE CANNOT USE MACHINE LEARNING TO FIND THE ANSWER (whatever that is). It may be that some disease is trending in the population. The process is as follows:

  1. Stream every patient's data into a Kafka topic (call it TOPIC A) from real-time patient visits. We will use the FHIR data standard to capture and process the patient data.
  2. Use TML to process the data for EACH patient in real time. We will use the following TML methods to preprocess the patient data streams in Kafka, in real time and in-memory, for each patient symptom: AVERAGE and TREND. (TML offers over 30 preprocessing methods via the MAADSTML Python library.)
  3. Stream all of the preprocessed data into another Kafka topic: TOPIC B. Note that TML is preprocessing data in real time: every second or minute.
  4. Assign a unique ID to each patient: 1, 2, 3, 4...N, where N can be any number of patients.
  5. Consume, in real time, the preprocessed data for each patient and group the patients into a population. Here we want the aggregate (group) preprocessed data on patients, regardless of where they live, to determine the likelihood of any unpredictable events.
  6. Send the consumed data, in real time, to ChatGPT and get it to play the "analyst" role. The question you can pose to ChatGPT is: "Are there any interesting events happening in the data?" Stream the resulting answer to another Kafka topic: TOPIC C.
  7. Visualize the data in TOPIC C in a real-time dashboard using Viperviz.
  8. Done!
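The actual TML preprocessing is done by the MAADSTML library, whose API I won't reproduce here. As a minimal, illustrative stand-in for step 2, the sketch below computes the AVERAGE and TREND of one vital sign for one patient, using a least-squares slope as the trend; in production, the readings would be consumed from TOPIC A and the results produced to TOPIC B.

```python
import statistics

def preprocess_patient(readings):
    """Stand-in for TML's AVERAGE and TREND preprocessing of one
    vital sign for one patient. readings: list of (timestamp, value)."""
    values = [v for _, v in readings]
    avg = statistics.mean(values)
    # TREND as the least-squares slope of value over time
    ts = [t for t, _ in readings]
    t_mean = statistics.mean(ts)
    num = sum((t - t_mean) * (v - avg) for t, v in readings)
    den = sum((t - t_mean) ** 2 for t in ts)
    slope = num / den if den else 0.0
    return {"average": avg, "trend": slope}

# Heart-rate samples for one patient: (timestamp in seconds, bpm)
hr = [(0, 72), (60, 74), (120, 78), (180, 83)]
print(preprocess_patient(hr))  # average 76.75, positive trend
```

A positive trend on a symptom stream for many patients at once is exactly the kind of signal we want ChatGPT to notice in step 6.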

The solution architecture is shown below; the actual implementation is shown in the dashboard further down. While this example uses healthcare data, exactly the same methodology can be applied to almost anything: sports, finance, gambling, IoT, manufacturing, etc.

(Figure: Solution architecture. By author.)

The dashboard below implements the architecture. We streamed and processed data in STEPS 1-5, then sent all of the processed data to ChatGPT to analyse. We asked ChatGPT (STEP 6): "Is there a health issue in the population? If so, what are the issues? In which city do the majority of the patients live? How many males with issues? How many females with issues? What languages do they speak? Is there a disease trend? How many total patients were analysed? Is the trend increasing or decreasing?"
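As a sketch of how STEP 6 could be wired up, the helper below (the function name and example data are my own, not from the MAADSTML library) attaches the grouped, preprocessed data from STEP 5 to the analyst questions; the commented-out call assumes the official `openai` Python package, and the model name is an assumption:

```python
import json

def build_analyst_prompt(grouped_data):
    """Attach the grouped preprocessed patient data (STEP 5) to the
    analyst questions (STEP 6) in a single prompt string."""
    questions = (
        "Is there a health issue in the population? If so, what are the "
        "issues? In which city do the majority of the patients live? "
        "How many males with issues? How many females with issues? "
        "What languages do they speak? Is there a disease trend? "
        "How many total patients were analysed? "
        "Is the trend increasing or decreasing?"
    )
    return questions + "\n\nData:\n" + json.dumps(grouped_data, indent=2)

prompt = build_analyst_prompt(
    {"patient_1": {"heart_rate": {"average": 76.75, "trend": 0.06}}}
)

# Sending the prompt (requires an API key; not run here):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": prompt}],
# )
# answer = resp.choices[0].message.content  # produce this to TOPIC C
```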

(Figure: Real-time dashboard. By author.)

CHATGPT RESPONSE: Based on the information given, it appears that there is a health issue in the population being monitored. The issues seem to be related to respiratory rate and diastolic blood pressure. The majority of the patients appear to live in Miami, and there seem to be more males with issues than females. The languages spoken by the patients include English, Spanish, and French. There does not seem to be a disease trend, but rather a general health issue in the population. It is not stated how many total patients are being analysed, but the trend does not seem to be increasing or decreasing.

Pretty COOL!

Now, this is not meant to solve, or to oversimplify, the very complex quest of predicting black swan events; on the contrary. We are taking the complex problem of predicting black swan events and breaking it down into smaller components (patients) by: 1. working in real time, 2. preprocessing the data from individual patients, 3. grouping the preprocessed patients' data, 4. sending the grouped data to AI (ChatGPT) to further analyse the preprocessed data, and 5. visualizing the results and repeating the process.

NOTE: The responses from ChatGPT can be grouped into a "SUPER" group, and this Super Group of responses can be re-sent to ChatGPT to summarize.
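A minimal sketch of that re-grouping, assuming the individual ChatGPT responses have been collected as strings (the function name and batch size are my own choices):

```python
def build_super_group_prompts(responses, batch_size=10):
    """Group individual ChatGPT responses into a "SUPER" group and
    build one summarization prompt per batch, ready to re-send."""
    prompts = []
    for i in range(0, len(responses), batch_size):
        batch = responses[i:i + batch_size]
        prompts.append(
            "Summarize the key health findings across these analyses:\n\n"
            + "\n---\n".join(batch)
        )
    return prompts

# 25 earlier responses collapse into 3 summarization prompts, each of
# which can be re-sent to ChatGPT and streamed to TOPIC C.
responses = [f"Analysis {i}: elevated diastolic pressure." for i in range(25)]
print(len(build_super_group_prompts(responses)))  # → 3
```

Batching keeps each follow-up prompt within the model's context window while still condensing the whole population's results.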

Advances in the real-time processing of streaming data with Kafka, TML, and generative AI are opening up tremendous potential for large-scale, in-memory data processing that is low-cost and high-value, for scenarios and problems that have been VERY hard to solve, let alone gain further insight into.

Till next time...

Comment from Choon-Wei Tay, Principal Project & Product Management (7 months ago):

"Hi Sebastian, interesting concept here. How would you propose to mitigate hallucinations?"
