Cybersecurity and Real-Time Data Processing using privateGPT, Kafka, TML, Qdrant VectorDB, Docker

Sebastian Maurice, Ph.D.

Global AI and Machine Learning Leader | Teacher | Inventor | Author | Blogger | Coder

Published: December 29, 2023

In the past few months, I have been discussing with some cybersecurity experts how real-time data is being used to protect corporate networks. (I also teach cloud/network security at Seneca Polytechnic.) There are a few key insights that have become apparent to me:

Many cybersecurity companies still do not understand how to extract deep insights from real-time data that, in many cases, is contained in log files.
Many cybersecurity companies claim to be using AI but have no clue how best to leverage AI and machine learning to extract insights from data.
Many cybersecurity companies are overspending on real-time data infrastructure.
Many cybersecurity companies claim to have the "silver bullet" for protecting corporate networks, yet we still keep getting major network breaches.
It is very clear that cybersecurity companies are ALL-IN with AI and machine learning.

While no one has the silver bullet to protect corporate networks, other than unplugging all machines from the Internet, there are ways to protect corporate networks with real-time data, AI and machine learning that can be effective approaches.

This blog discusses how to process real-time data from a corporate network using Qdrant, TML, Docker, Kafka, and privateGPT. Let me walk through this use case and the technology stack for analysing network data in real time.

Problem Statement: How can we process all real-time data from host machines (desktops, laptops, etc.) locally, determine in real time whether a host machine is being hacked, alert humans for further investigation, and allow humans to use privateGPT (integrated with Qdrant) to query localized information about machines in real time, all while keeping the TCO of the solution as low as possible?

Approach: We will use:

Apache Kafka: Real-time data streaming platform
Transactional Machine Learning (TML) and binaries
Qdrant: Vector DB for localized GPT
PrivateGPT: 100% Secure, Local, and Free GPT
Docker: Containerize TML solutions
maadstml Python Library
Real-Time Dashboard

Solution Architecture

Let's walk through the above solution architecture and discuss how real-time data processing works.

Hackers are trying to get into corporate networks. Now, hackers are very creative people and, if truly motivated, I am sure they will find some way to breach a network. But let's assume they are trying to digitally access a corporate network from the internet.
To analyse a corporate network made up of routers, switches and host machines, we use an SDN (software-defined network) to extract data from the corporate network in real time (every second or so).
We use the maadstml Python library to extract network data from the SDN.
The maadstml Python library also connects to the TML binaries to produce this raw data to Apache Kafka. So now we have streaming raw data to process in real time.
TML binaries process real-time data in parallel. Think of these TML binaries as microservices. TML processes real-time data in memory using sliding time windows. Within these sliding time windows, TML can perform advanced processing of data, build machine learning models for each host machine, and predict a risk level indicating whether a machine is being hacked.
privateGPT is used to further analyse the TML-processed data: prompts are posed to privateGPT using locally embedded data stored in the Qdrant vector database. By integrating GPT and Qdrant, we can localize the solution further and leverage the advanced generative AI capabilities of privateGPT free of charge!
As data is being analysed by TML and privateGPT, the output of risk level for each host machine can be dashboarded.
The entire solution can be containerized with Docker and scaled with Kubernetes.
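
To make the idea of per-host sliding time windows more concrete, here is a minimal sketch in plain Python. It is not the TML or maadstml API. The class name, the 60-second window, and the simple deviation-based score are all assumptions chosen for illustration; TML builds real machine learning models inside each window, whereas this sketch only measures how far the latest reading sits from the recent history of the same host.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

WINDOW_SECONDS = 60  # hypothetical window length, not a TML default


class HostWindow:
    """Holds the most recent (timestamp, value) readings for one host."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.readings = deque()

    def add(self, timestamp, value):
        """Append a reading and evict everything older than the window."""
        self.readings.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        while self.readings and self.readings[0][0] <= cutoff:
            self.readings.popleft()

    def risk_score(self):
        """Deviation of the latest value from earlier values in the window."""
        values = [value for _, value in self.readings]
        if len(values) < 3:
            return 0.0  # too little history to judge
        history, latest = values[:-1], values[-1]
        spread = max(pstdev(history), 1e-9)  # floor avoids division by zero
        return abs(latest - mean(history)) / spread


def score_stream(events, window_seconds=WINDOW_SECONDS):
    """Consume (host, timestamp, value) events; return latest score per host."""
    windows = defaultdict(lambda: HostWindow(window_seconds))
    scores = {}
    for host, timestamp, value in events:
        windows[host].add(timestamp, value)
        scores[host] = windows[host].risk_score()
    return scores


if __name__ == "__main__":
    # Made-up packet counts for two hypothetical hosts.
    sample = [("host-a", t, v) for t, v in enumerate([10, 11, 9, 10, 50])]
    sample += [("host-b", t, 5) for t in range(5)]
    for host, score in score_stream(sample).items():
        print(host, round(score, 2))
```

Each host gets its own window, so a spike on one machine does not affect the score of another. That is the entity-level idea described above, in a much simplified form.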

The solution discussed above is taught at Seneca Polytechnic. A YouTube video of this solution is here: https://www.youtube.com/watch?v=dVM2yz7wdQA&t=377s

Key Conclusions:

Real-time data is growing in importance, and companies are trying to get fast insights from it. The above solution seamlessly connects an LLM to real-time data, thereby expanding what it can draw on from just the data it was trained on to real-world data being generated second by second, while keeping costs very low!
The above solution can be run on-premise or in the cloud.
Apache Kafka is a powerful streaming platform for managing and processing large volumes of real-time data.
TML enables in-memory, advanced processing and machine learning, at the entity level, to analyse every host machine individually.
Using privateGPT and Qdrant, we can use generative AI that is 100% secure, local and free to provide deeper AI analysis of the network data.
The total cost of ownership of the above solution is drastically reduced because all data is processed in memory. Also, privateGPT and Qdrant are run locally (so 100% free and secure). The entire solution can also be run on-premise (no cloud needed).
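
For readers wondering what a vector database contributes here, the following is a rough sketch of the retrieval idea: stored text is matched to a question by vector similarity, and the best matches are placed in the prompt as local context. This is not how Qdrant or privateGPT are implemented or called. The two-dimensional vectors, the sample texts, and the function names are made up for illustration; a real setup would use an embedding model and Qdrant's own index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, stored, k=2):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, hits):
    """Combine the retrieved texts and the question into one prompt string."""
    context = "\n".join(hit["text"] for hit in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy records with hand-written vectors (assumptions, not real embeddings).
    stored = [
        {"text": "Host 5.100 shows a spike in inbound packets.", "vector": [1.0, 0.0]},
        {"text": "Host 6.18 traffic looks normal.", "vector": [0.0, 1.0]},
    ]
    hits = top_k([0.9, 0.1], stored, k=1)
    print(build_prompt("Which host looks suspicious?", hits))
```

Because both the stored vectors and the model run locally, the question and the retrieved context never have to leave the machine.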

Users can run the above solution with privateGPT, Qdrant and TML using the following steps for Linux/Amd64 machines (if you have a Mac or a Linux/Arm64 machine, change amd64 to arm64 and set PGPTIP="host.docker.internal"):

Pull and Run the Qdrant docker container: docker run -d -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
Pull and Run the privateGPT container: docker run -d -p 8001:8001 --net=host --env PORT=8001 --env GPU=0 --env COLLECTION=tml --env WEB_CONCURRENCY=2 --env CUDA_VISIBLE_DEVICES=0 maadsdocker/tml-privategpt-no-gpu-amd64
Pull and Run the TML Cybersecurity solution container: docker run -d --env VIPERVIZPORT=8080 --net=host --env RUNTYPE=2 --env PGPTIP="https://127.0.0.1" --env PGPTPORT=8001 --env KAFKAEMBEDDINGSFOLDER=kafkaembeddings --env DOCFOLDER="" --env USEEMBEDDINGS=0 --env DELETEKAFKAEMBEDDINGSHOURS=10 --env PGPTROLLBACK=4 --env BROKERHOSTPORT=127.0.0.1:9092 --env KAFKAPRODUCETOPIC=cisco-network-mainstream --env HACKEDHOSTS=5.100-i,6.18-i,5.18-i --env CLOUDUSERNAME= --env CLOUDPASSWORD= maadsdocker/tml-cisco-network-cyberthreats-privategpt-amd64

Once all the containers are running, access their dashboards:
Access the Qdrant UI in a browser by entering the URL: https://127.0.0.1:6333/dashboard
Access the privateGPT dashboard: https://localhost:8001
Access the TML Cybersecurity dashboard: https://localhost:8080/tml-cisco-network-privategpt-monitor.html?topic=cisco-network-preprocess,cisco-network-privategpt&offset=-1&groupid=&rollbackoffset=150&topictype=prediction&append=0&secure=1
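
Before opening the dashboards, it can help to confirm that the three containers are actually listening. The short script below is one possible way to do that with only the Python standard library. The port numbers are taken from the docker commands above; the function names and the one-second timeout are my own choices for this sketch. It only checks that a TCP connection can be opened, not that the service behind the port is healthy.

```python
import socket

# Ports from the docker commands above; the labels are only used for printing.
SERVICES = {"Qdrant": 6333, "privateGPT": 8001, "TML dashboard": 8080}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="127.0.0.1", services=SERVICES):
    """Map each service label to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```

If a service shows as not reachable, docker ps and docker logs for the corresponding container are the natural next places to look.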

NOTE: The business value of the above solution is not only that it can be run securely and locally with minimal cost, but also using privateGPT and Qdrant we can give users the ability to query local information about the output from TML for additional insights.

2024 will be an exciting year with (more) massive growth in the use of AI and machine learning. However, we must keep in mind that both the good guys AND the bad guys have access to the same technologies. So, the deciding factor is no longer technology know-how but rather creative innovations that combine these technologies in ways that are both cost-effective and effective in preventing cyber attacks.

Till next time...
