Real-Time Predictions of Black Swan Events using ChatGPT, Transactional Machine Learning (TML), and Apache Kafka
https://spectrum.ieee.org/weather-predicting-black-swan

A wet and dark afternoon here in Toronto makes it a perfect time to write a blog. I was recently asked a question by Logan Mintz: how do you predict the "unpredictable" without knowing what you want to predict? This was an intriguing question (thanks, Logan), and I could not stop thinking about it.

There has been a lot of work on predicting black swan events. Many of us have probably read Nassim Taleb's book The Black Swan, which made us aware of the unpredictable nature of events like earthquakes, wildfires, and pandemics that are extremely rare but highly impactful when they occur: what we call "tail events" (of some distribution). Machine learning (ML) is not good at predicting black swan events, for the simple reason that ML relies on historical data, on events that have already occurred, to predict the future.

There are other recent methods for black swan prediction, such as work from Stanford University researchers and recent research from US scientists that uses Bayesian methods to predict the unpredictable. For me, these methods advance our thinking on how AI and unsupervised methods can be used to predict the unpredictable. One issue I have with these methods is that they lack a real-time component and are complex to implement. Specifically, the question I asked myself was: "Is there a simpler way to determine or predict unpredictable events in real time, as the data are being generated, using TML, Kafka, and ChatGPT?"

The answer is maybe. One possible way to gain further insight into unpredictable events in real time is to use TML, Kafka, and ChatGPT. The method is fairly simple and runs in real time. Here is how. Let's take a healthcare use case: data from healthcare organizations on patients' symptoms, specifically blood pressure, heart rate, body temperature, oxygen levels, and body weight. WE DO NOT KNOW WHAT WE ARE LOOKING FOR IN THE DATA, SO WE CANNOT USE MACHINE LEARNING TO FIND THE ANSWER (whatever that is). It may be that some disease is trending in the population. The process is as follows:

  1. Stream every patient's data into a Kafka topic (call it TOPIC A) from real-time patient visits. We will use the FHIR data standard to capture and process the patient data.
  2. Use TML to process the data for EACH patient in real time. We will use the following TML methods to preprocess the patient data streams in Kafka, in real time and in-memory, for each patient symptom: AVERAGE and TREND. (TML offers over 30 preprocessing methods via the MAADSTML Python library.)
  3. Stream all of the preprocessed data into another Kafka topic: TOPIC B. Note that TML is preprocessing data in real time: every second or minute.
  4. Assign a unique ID to each patient: 1, 2, 3, 4...N, where N can be any number of patients.
  5. Consume, in real time, the preprocessed data for each patient and group the patients into a population. Here we want the aggregate (group) preprocessed data on patients, regardless of where they live, to determine the likelihood of any unpredictable events.
  6. Send the consumed data, in real time, to ChatGPT and get it to play the "analyst" role. The question you can pose to ChatGPT is: "Are there any interesting events happening in the data?" Stream the resulting answer to another Kafka topic: TOPIC C.
  7. Visualize the data in TOPIC C in a real-time dashboard using Viperviz.
  8. Done!
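The actual TML preprocessing is done by the MAADSTML library, whose API I won't reproduce here. As a minimal, illustrative stand-in for step 2, the sketch below computes the AVERAGE and TREND of one vital sign for one patient, using a least-squares slope as the trend; in production, the readings would be consumed from TOPIC A and the results produced to TOPIC B.

```python
import statistics

def preprocess_patient(readings):
    """Stand-in for TML's AVERAGE and TREND preprocessing of one
    vital sign for one patient. readings: list of (timestamp, value)."""
    values = [v for _, v in readings]
    avg = statistics.mean(values)
    # TREND as the least-squares slope of value over time
    ts = [t for t, _ in readings]
    t_mean = statistics.mean(ts)
    num = sum((t - t_mean) * (v - avg) for t, v in readings)
    den = sum((t - t_mean) ** 2 for t in ts)
    slope = num / den if den else 0.0
    return {"average": avg, "trend": slope}

# Heart-rate samples for one patient: (timestamp in seconds, bpm)
hr = [(0, 72), (60, 74), (120, 78), (180, 83)]
print(preprocess_patient(hr))  # average 76.75, positive trend
```

A positive trend on a symptom stream for many patients at once is exactly the kind of signal we want ChatGPT to notice in step 6.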

The solution architecture is shown below; the actual implementation is shown in the dashboard further down. While this example uses healthcare data, exactly the same methodology can be applied to almost anything: sports, finance, gambling, IoT, manufacturing, etc.

(Figure: Solution architecture. By author.)

The dashboard below implements the architecture. We streamed and processed data in STEPS 1-5, then sent all of the processed data to ChatGPT to analyse. We asked ChatGPT (STEP 6): "Is there a health issue in the population? If so, what are the issues? In which city do the majority of the patients live? How many males with issues? How many females with issues? What languages do they speak? Is there a disease trend? How many total patients were analysed? Is the trend increasing or decreasing?"
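As a sketch of how STEP 6 could be wired up, the helper below (the function name and example data are my own, not from the MAADSTML library) attaches the grouped, preprocessed data from STEP 5 to the analyst questions; the commented-out call assumes the official `openai` Python package, and the model name is an assumption:

```python
import json

def build_analyst_prompt(grouped_data):
    """Attach the grouped preprocessed patient data (STEP 5) to the
    analyst questions (STEP 6) in a single prompt string."""
    questions = (
        "Is there a health issue in the population? If so, what are the "
        "issues? In which city do the majority of the patients live? "
        "How many males with issues? How many females with issues? "
        "What languages do they speak? Is there a disease trend? "
        "How many total patients were analysed? "
        "Is the trend increasing or decreasing?"
    )
    return questions + "\n\nData:\n" + json.dumps(grouped_data, indent=2)

prompt = build_analyst_prompt(
    {"patient_1": {"heart_rate": {"average": 76.75, "trend": 0.06}}}
)

# Sending the prompt (requires an API key; not run here):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": prompt}],
# )
# answer = resp.choices[0].message.content  # produce this to TOPIC C
```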

(Figure: Real-time dashboard. By author.)

CHATGPT RESPONSE: Based on the information given, it appears that there is a health issue in the population being monitored. The issues seem to be related to respiratory rate and diastolic blood pressure. The majority of the patients appear to live in Miami, and there seem to be more males with issues than females. The languages spoken by the patients include English, Spanish, and French. There does not seem to be a disease trend, but rather a general health issue in the population. It is not stated how many total patients are being analysed, but the trend does not seem to be increasing or decreasing.

Pretty COOL!

Now, this is not meant to solve, or to oversimplify, the very complex quest of predicting black swan events; on the contrary. We are taking the complex problem of predicting black swan events and breaking it down into smaller components (patients) by: 1. working in real time, 2. preprocessing the data from individual patients, 3. grouping the preprocessed patients' data, 4. sending the grouped data to AI (ChatGPT) to further analyse the preprocessed data, and 5. visualizing the results and repeating the process.

NOTE: The responses from ChatGPT can be grouped into a "SUPER" group, and this Super Group of responses can be re-sent to ChatGPT to summarize.
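A minimal sketch of that re-grouping, assuming the individual ChatGPT responses have been collected as strings (the function name and batch size are my own choices):

```python
def build_super_group_prompts(responses, batch_size=10):
    """Group individual ChatGPT responses into a "SUPER" group and
    build one summarization prompt per batch, ready to re-send."""
    prompts = []
    for i in range(0, len(responses), batch_size):
        batch = responses[i:i + batch_size]
        prompts.append(
            "Summarize the key health findings across these analyses:\n\n"
            + "\n---\n".join(batch)
        )
    return prompts

# 25 earlier responses collapse into 3 summarization prompts, each of
# which can be re-sent to ChatGPT and streamed to TOPIC C.
responses = [f"Analysis {i}: elevated diastolic pressure." for i in range(25)]
print(len(build_super_group_prompts(responses)))  # → 3
```

Batching keeps each follow-up prompt within the model's context window while still condensing the whole population's results.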

Advances in the real-time processing of streaming data with Kafka, TML, and generative AI are opening up tremendous potential for large-scale, in-memory data processing that is low-cost and high-value, for scenarios and problems that have been VERY hard to solve, let alone gain further insight into.

Till next time...

Comment from Choon-Wei Tay, Principal Project & Product Management (7 months ago):

"Hi Sebastian, interesting concept here. How would you propose to mitigate hallucinations?"
