Cybersecurity and Real-Time Data Processing using privateGPT, Kafka, TML, Qdrant VectorDB, Docker

Over the past few months, I have been discussing with cybersecurity experts how real-time data is being used to protect corporate networks (I also teach cloud and network security at Seneca Polytechnic). A few key insights have become apparent to me:

  1. Many cybersecurity companies still do not understand how to extract deep insights from real-time data, which in many cases is contained in log files.
  2. Many cybersecurity companies claim to be using AI but have little idea how best to leverage AI and machine learning to extract insights from data.
  3. Many cybersecurity companies are overspending on real-time data infrastructure.
  4. Many cybersecurity companies claim to have the "silver bullet" for protecting corporate networks, yet major network breaches keep happening.
  5. It is very clear that cybersecurity companies are ALL-IN on AI and machine learning.

No one has a silver bullet for protecting corporate networks, short of unplugging every machine from the Internet. However, combining real-time data, AI and machine learning can be an effective way to protect them.

This blog discusses how to process real-time data from a corporate network using Qdrant, AI, TML, Docker, Kafka and privateGPT. Let me walk through this use case and the technology stack for analysing network data in real time.

Problem Statement: How can we take real-time data from host machines (desktops, laptops, etc.), process all of it locally, determine in real time whether a host machine is being hacked, alert humans for further investigation, and let humans use privateGPT (integrated with Qdrant) to query localized information about the machines in real time, all while keeping the TCO (total cost of ownership) of the solution as low as possible?

Approach: We will use privateGPT, Apache Kafka, TML, the Qdrant vector database and Docker.

Solution Architecture

[Figure: solution architecture diagram, by the author]

Let's walk through the solution architecture above and discuss how the real-time data processing works.

  1. Hackers are trying to get into corporate networks. Hackers are very creative people; if they are truly motivated, I am sure they will find some way to breach a network. For this use case, let's assume they are trying to access a corporate network digitally from the Internet.
  2. To analyse a corporate network made up of routers, switches and host machines, we use an SDN (software-defined network) to extract data from the network in real time (every second or so).
  3. We use the maadstml Python library to extract network data from the SDN.
  4. The maadstml library also connects to the TML binaries, which produce this raw data to Apache Kafka. We now have a stream of raw data ready to be processed in real time.
  5. The TML binaries process real-time data in parallel; think of them as microservices. TML processes the data in memory using sliding time windows. Within these windows, TML can perform advanced processing of the data, build a machine learning model for each host machine, and predict the risk level that a machine is being hacked.
  6. privateGPT further analyses the TML-processed data: prompts are posed to privateGPT using locally embedded data stored in the Qdrant vector database. By integrating privateGPT and Qdrant, we can localize the solution further and leverage the advanced generative AI capabilities of privateGPT free of charge!
  7. As data is analysed by TML and privateGPT, the risk level for each host machine can be displayed on a dashboard.
  8. The entire solution can be containerized with Docker and scaled with Kubernetes.
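
To make the sliding-window, per-host idea in steps 5 and 7 concrete, here is a minimal Python sketch of entity-level risk scoring. It is not TML's implementation or API: TML builds a machine learning model per host, whereas this sketch swaps in a simple z-score of each new reading against that host's recent readings. The `Reading` and `SlidingWindowRisk` names, the metric, the 60-second window, the minimum of 5 samples and the alert threshold of 3 are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
import statistics


@dataclass
class Reading:
    host: str     # host identifier (hypothetical format)
    ts: float     # timestamp in seconds
    value: float  # hypothetical metric, e.g. inbound packets per second


class SlidingWindowRisk:
    """Hypothetical stand-in for per-host scoring: the z-score of the newest
    reading against the same host's readings from the last `window_s` seconds."""

    def __init__(self, window_s=60.0, threshold=3.0, min_samples=5):
        self.window_s = window_s
        self.threshold = threshold
        self.min_samples = min_samples
        # One independent window per host (entity-level analysis).
        self.windows = defaultdict(deque)

    def update(self, r):
        """Score reading r against its host's window, then add it.
        Returns (score, alert)."""
        window = self.windows[r.host]
        # Slide the window: drop readings older than window_s seconds.
        while window and r.ts - window[0].ts > self.window_s:
            window.popleft()
        score = 0.0
        # Only score once the host has enough history to form a baseline.
        if len(window) >= self.min_samples:
            values = [x.value for x in window]
            mean = statistics.fmean(values)
            sd = statistics.pstdev(values)
            if sd > 0:
                score = abs(r.value - mean) / sd
        window.append(r)
        return score, score >= self.threshold


if __name__ == "__main__":
    scorer = SlidingWindowRisk()
    # Ten ordinary readings for one host, then a sudden spike.
    for t, v in enumerate([10, 11, 9, 10, 11, 9, 10, 11, 9, 10]):
        scorer.update(Reading("5.100-i", float(t), float(v)))
    score, alert = scorer.update(Reading("5.100-i", 10.0, 100.0))
    print(f"score={score:.1f} alert={alert}")
```

Keeping a separate window per host means one noisy machine does not distort the baseline of the others, which mirrors the entity-level analysis described above.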

The solution discussed above is taught at Seneca Polytechnic. A YouTube video of this solution is here: https://www.youtube.com/watch?v=dVM2yz7wdQA&t=377s

Key Conclusions:

  1. Real-time data is growing in importance, and companies are trying to get fast insights from it. The solution above connects an LLM to real-time data, extending its reach beyond the data it was trained on to real-world data generated second by second, while keeping costs very low!
  2. The above solution can be run on-premise or in the cloud.
  3. Apache Kafka is a powerful streaming platform for managing and processing very large volumes of real-time data.
  4. TML enables in-memory advanced processing and machine learning at the entity level, analysing every host machine individually.
  5. Using privateGPT and Qdrant, we can apply generative AI that is 100% secure, local and free to provide deeper AI analysis of the network data.
  6. The total cost of ownership of the above solution is drastically reduced because all data is processed in memory. Also, privateGPT and Qdrant run locally (so they are 100% free and secure). The entire solution can also be run on-premise (no cloud needed).
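
privateGPT answers prompts by first retrieving relevant embedded records from Qdrant and passing them to the LLM as context. The retrieval step can be sketched in plain Python: rank stored text chunks by cosine similarity to the query's embedding and place the best matches in the prompt. This is a toy illustration, not the Qdrant or privateGPT API: the three-dimensional vectors stand in for the output of a real embedding model, and the record texts, host names (`host-a` etc.), function names and prompt wording are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, store, k=2):
    """store: list of (text, vector) pairs. Return the k most similar texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Hypothetical prompt template: retrieved chunks first, then the question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"


if __name__ == "__main__":
    # Toy "embedded" TML output records (hypothetical text and vectors).
    records = [
        ("host-a: risk level high", [0.9, 0.1, 0.0]),
        ("host-b: risk level high", [0.8, 0.2, 0.1]),
        ("host-c: risk level low", [0.0, 0.1, 0.9]),
    ]
    query_vec = [1.0, 0.0, 0.0]  # stand-in embedding of the question
    print(build_prompt("Which hosts are at high risk?", top_k(query_vec, records)))
```

In the real stack, Qdrant performs this nearest-neighbour search over the stored embeddings (typically with an approximate index rather than a full scan), so the LLM only sees the handful of TML records relevant to the question.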

Users can run the above solution with privateGPT, Qdrant and TML using the following steps on Linux/amd64 machines (on a Mac or Linux/arm64, change amd64 to arm64 in the image names and set PGPTIP="host.docker.internal"):

  1. Pull and Run the Qdrant docker container: docker run -d -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
  2. Pull and Run the privateGPT container: docker run -d -p 8001:8001 --net=host --env PORT=8001 --env GPU=0 --env COLLECTION=tml --env WEB_CONCURRENCY=2 --env CUDA_VISIBLE_DEVICES=0 maadsdocker/tml-privategpt-no-gpu-amd64
  3. Pull and Run the TML Cybersecurity solution container: docker run -d --env VIPERVIZPORT=8080 --net=host --env RUNTYPE=2 --env PGPTIP="http://127.0.0.1" --env PGPTPORT=8001 --env KAFKAEMBEDDINGSFOLDER=kafkaembeddings --env DOCFOLDER="" --env USEEMBEDDINGS=0 --env DELETEKAFKAEMBEDDINGSHOURS=10 --env PGPTROLLBACK=4 --env BROKERHOSTPORT=127.0.0.1:9092 --env KAFKAPRODUCETOPIC=cisco-network-mainstream --env HACKEDHOSTS=5.100-i,6.18-i,5.18-i --env CLOUDUSERNAME= --env CLOUDPASSWORD= maadsdocker/tml-cisco-network-cyberthreats-privategpt-amd64
     Once all the containers are running, access their dashboards:
  4. Access the Qdrant UI in your browser at: http://127.0.0.1:6333/dashboard
  5. Access the privateGPT dashboard: http://localhost:8001
  6. Access the TML Cybersecurity dashboard: http://localhost:8080/tml-cisco-network-privategpt-monitor.html?topic=cisco-network-preprocess,cisco-network-privategpt&offset=-1&groupid=&rollbackoffset=150&topictype=prediction&append=0&secure=1
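
Before opening the dashboards, a small Python helper can confirm that each service is listening and pick the image architecture according to the Mac/arm64 rule stated above. This is a convenience sketch, not part of TML: the ports come from the steps above (6333 for Qdrant, 8001 for privateGPT, 8080 for the TML dashboard via VIPERVIZPORT), and the helper names `image_settings` and `port_open` are hypothetical. It only checks that a TCP connection succeeds, not that the web page itself loads.

```python
import platform
import socket

# Ports from the setup steps: Qdrant, privateGPT, and the TML dashboard (VIPERVIZPORT).
DASHBOARD_PORTS = {"qdrant": 6333, "privategpt": 8001, "tml": 8080}


def image_settings(machine, system):
    """Follow the rule exactly as stated above: Mac or Linux/arm64 -> arm64 images
    and PGPTIP="host.docker.internal"; otherwise amd64 images, with PGPTIP left
    as given in step 3 (returned as None)."""
    is_arm = machine.lower() in ("arm64", "aarch64")
    if system == "Darwin" or is_arm:
        return {"arch": "arm64", "pgptip": "host.docker.internal"}
    return {"arch": "amd64", "pgptip": None}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    settings = image_settings(platform.machine(), platform.system())
    print("Use the image variants ending in", settings["arch"])
    if settings["pgptip"] is not None:
        print('Set --env PGPTIP="%s"' % settings["pgptip"])
    for name, port in DASHBOARD_PORTS.items():
        status = "listening" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name}: port {port} {status}")
```

A "not reachable" result usually means the container is still starting or exited; `docker ps` and `docker logs` are the next places to look.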

NOTE: The business value of the above solution is not only that it can be run securely and locally at minimal cost, but also that, using privateGPT and Qdrant, users can query local information about TML's output for additional insights.

2024 will be an exciting year, with (more) massive growth in the use of AI and machine learning. However, we must keep in mind that both the good guys AND the bad guys have access to the same technologies. So the differentiating factor is no longer technical know-how but creative innovation: combining these technologies in ways that are both cost-effective and effective at preventing cyber attacks.

Till next time...



Sebastian Maurice, Ph.D.
