Unleashing the Power of IoT Sensor Data with Kafka, InfluxDB, and Grafana

Amit Gawate

Engineering leader | Building scalable systems in Cloud | Atlassian | Autodesk

Published: June 8, 2024

In today's connected world, the Internet of Things (IoT) generates massive amounts of data, providing valuable insights for applications ranging from smart cities to industrial automation. To harness this data, you need a robust and scalable architecture that can handle high volumes of streaming data and provide real-time analytics. In this post, we will explore how to build an end-to-end solution for streaming IoT sensor data using Apache Kafka, InfluxDB, and Grafana, with horizontal scaling and real-time visualization.

The Architecture

Let's start by understanding the architecture that powers this solution. The diagram below provides a high-level overview of the system:

Visualize the data in InfluxDB

Here's the link to the InfluxDB time-series database, which shows the data being updated in real time.

https://35.170.56.172:8086/orgs/94808787ae90dbae/data-explorer?bucket=my-bucket

username : my-user

password : my-password

The Code

The Terraform code, along with the producer, consumer, and Raspberry Pi client code, can be found in this repo: https://github.com/amitgawate/IOT_Kafka

This architecture is designed to ensure seamless data ingestion, processing, and visualization, providing a scalable and reliable solution for IoT data streams.

The architecture consists of the following components:

Client Devices: These are the IoT sensors and devices that generate data. They could be anything from temperature sensors in a factory to traffic cameras in a smart city. Here we use a DHT11 sensor that sends temperature and humidity values from inside my office room.
AWS Gateway: The AWS Gateway serves as the entry point for all incoming data. It handles requests from client devices, ensuring secure and reliable data transmission. We use an API that receives the Raspberry Pi client requests containing the sensor data and passes that data to the Kafka producer. This is a low-latency operation, since Kafka records the data with high throughput. For this demo we used a kafka.t3.small instance; in production, a larger instance may be needed.
Amazon MSK (Managed Streaming for Apache Kafka): We use MSK because it is easy to set up and deploy within a VPC and private subnets using Terraform.
AWS Lambda: To keep the producers and consumers scalable, we run them on AWS Lambda. The producer receives the sensor data and adds it to MSK on the home-sensors topic. Topics on MSK must be created from within the VPC, so we have another Lambda function just for creating Kafka topics.
CloudWatch Events: We use a CloudWatch Events rule to trigger the Kafka consumer at a 5-minute interval, so that the consumer polls MSK every 5 minutes, checks whether unprocessed data is available in the stream, and processes it.
InfluxDB: For storing time-series data, we run InfluxDB as a Docker app hosted on an EC2 instance. The time-series data can be visualized in InfluxDB itself as well, though dashboards and alerting are handled in Grafana.
Grafana: We use another EC2 instance to host the Grafana Docker app, which reads from the InfluxDB time-series database, queries the results, and adds alerts (e.g., if the temperature drops below a certain threshold, alert the customer).
CloudWatch Logs: AWS CloudWatch is used for monitoring and logging. It collects and tracks metrics, logs, and events, providing insights into the application’s performance and operational health.
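
As a rough illustration of the first hop in the list above, here is a minimal sketch of how a Raspberry Pi client might package a DHT11 reading and post it to the gateway. The endpoint URL parameter, the field names (`device_id`, `temperature_c`, `humidity_pct`, `ts`), and the function names are assumptions made for this example; the actual client in the repo may look different.

```python
import json
import time
import urllib.request


def build_payload(device_id, temperature_c, humidity_pct, ts=None):
    """Package one sensor reading as a JSON-serializable dict (assumed schema)."""
    return {
        "device_id": device_id,
        "temperature_c": float(temperature_c),
        "humidity_pct": float(humidity_pct),
        # Unix epoch seconds; defaults to the time of the call.
        "ts": int(time.time()) if ts is None else int(ts),
    }


def post_reading(endpoint_url, payload, timeout=5):
    """POST the reading as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical values; a real client would read these from the DHT11.
    reading = build_payload("office-pi", 24.0, 41.0)
    print(json.dumps(reading))
```

Keeping the payload builder separate from the network call makes the client easy to test without a sensor or a live endpoint.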

Data Ingestion with Kafka

Apache Kafka is at the heart of this architecture. Kafka's distributed nature and high throughput make it ideal for handling large volumes of IoT data. Here's how Kafka fits into the architecture:

Producers: The IoT devices send their readings through the gateway to a Lambda function, which acts as the Kafka producer and writes the data to Kafka topics.
Brokers: Kafka brokers handle the data streams and ensure they are stored reliably.
Consumers: Different components of the architecture, such as data processing applications and storage systems, act as Kafka consumers.
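
To make the producer role more concrete, here is a minimal sketch of a Lambda-style handler that validates an incoming reading and turns it into a key/value pair for the home-sensors topic. It assumes the event carries the reading as a JSON string in `body`, and it takes a `send` callable as a stand-in for a real Kafka client's send method; the field names and the `send` signature are hypothetical, not taken from the repo.

```python
import json

TOPIC = "home-sensors"  # topic name mentioned in the article
# Assumed schema for a reading; the real payload may differ.
REQUIRED_FIELDS = ("device_id", "temperature_c", "humidity_pct", "ts")


def to_kafka_record(reading):
    """Return (key, value) as bytes for one reading.

    Using the device id as the key means records from the same device
    map to the same partition, which preserves their relative order.
    """
    missing = [f for f in REQUIRED_FIELDS if f not in reading]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    key = str(reading["device_id"]).encode("utf-8")
    value = json.dumps(reading, sort_keys=True).encode("utf-8")
    return key, value


def handler(event, send):
    """Lambda-style entry point; `send(topic, key, value)` is injected for illustration."""
    try:
        reading = json.loads(event["body"])
        key, value = to_kafka_record(reading)
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, a missing body, or missing fields all end up here.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    send(TOPIC, key, value)
    return {"statusCode": 202, "body": json.dumps({"queued": True})}
```

In a real deployment, `send` would wrap the producer of whichever Kafka client library the Lambda bundles.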

Time-Series Data Storage with InfluxDB

InfluxDB is a high-performance time-series database that is optimized for fast, high-availability storage and retrieval of time-series data. It is well-suited for IoT data, which is inherently time-series in nature. In this architecture, InfluxDB stores the processed data from Kafka, making it available for real-time analytics.
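
For a sense of what the consumer hands to InfluxDB, here is a small sketch that formats one reading as an InfluxDB line protocol string (measurement, tags, fields, and a nanosecond timestamp). The measurement name `home_sensors` and the tag and field names are assumptions for illustration; the repo's consumer may use a client library and different names.

```python
def _escape(text):
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return str(text).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement, tags, fields, ts_seconds):
    """Format one point as: measurement,tag=value field=value timestamp_ns."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    # Measurement names only need commas and spaces escaped.
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items())
    )
    # All fields are written as floats here, which suits temperature and humidity.
    field_part = ",".join(
        f"{_escape(k)}={float(v)}" for k, v in sorted(fields.items())
    )
    timestamp_ns = int(ts_seconds) * 1_000_000_000
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical reading from the office sensor.
    print(to_line_protocol(
        "home_sensors",
        {"device": "office-pi"},
        {"temperature": 24.0, "humidity": 41.0},
        1717800000,
    ))
```

Storing the device as a tag and the readings as fields keeps the device indexed for filtering while the numeric values remain queryable over time.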

Real-Time Visualization with Grafana

Grafana is an open-source platform for monitoring and observability. It provides powerful visualization capabilities that allow users to create dynamic dashboards and graphs. With Grafana, you can visualize the data stored in InfluxDB, providing real-time insights into the IoT data streams.
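
As an illustration of what a Grafana panel might ask InfluxDB for, here is a sketch that builds a Flux query string for a temperature graph. It assumes InfluxDB 2.x queried through Grafana's InfluxDB data source in Flux mode (which supplies `v.timeRangeStart`, `v.timeRangeStop`, and `v.windowPeriod`); the bucket default comes from the link earlier in the post, while the measurement and field names are hypothetical.

```python
def temperature_panel_query(bucket="my-bucket",
                            measurement="home_sensors",
                            field="temperature"):
    """Build a Flux query that averages one field over the dashboard's time range."""
    return "\n".join([
        f'from(bucket: "{bucket}")',
        # Grafana substitutes the dashboard's selected time range here.
        "  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")',
        f'  |> filter(fn: (r) => r._field == "{field}")',
        # Downsample to one mean value per window so the graph stays readable.
        "  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)",
    ])


if __name__ == "__main__":
    print(temperature_panel_query())
```

The printed query can be pasted into a panel's query editor and adjusted to match the actual measurement and field names.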

Unlimited Scaling

The combination of Kafka, InfluxDB, and Grafana, deployed on AWS, provides a highly scalable solution. Kafka's distributed architecture allows it to handle increasing amounts of data by simply adding more brokers. InfluxDB's high-performance design ensures that it can store and retrieve large volumes of time-series data efficiently. AWS's scalable infrastructure ensures that the entire system can grow seamlessly with the increasing data load.
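
Adding brokers helps because a topic's partitions are spread across them, and keyed records are assigned to partitions by hashing the key. The sketch below illustrates that idea with a simple hash-modulo scheme. Note that it uses CRC32 purely for illustration; Kafka's default partitioner uses a murmur2 hash, so the partition numbers here will not match a real cluster. The device names are made up.

```python
import zlib
from collections import Counter


def pick_partition(key, num_partitions):
    """Map a record key to a partition with hash-modulo (CRC32 stand-in, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread(keys, num_partitions):
    """Count how many keys land on each partition."""
    return Counter(pick_partition(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Hypothetical fleet of 1000 devices.
    devices = [f"sensor-{i}" for i in range(1000)]
    for partitions in (3, 6, 12):
        counts = spread(devices, partitions)
        print(partitions, "partitions, busiest holds", max(counts.values()), "devices")
```

With more partitions, each one carries a smaller share of the devices, which is what lets additional brokers and consumers absorb a growing load.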

Conclusion

By integrating Kafka, InfluxDB, and Grafana into a unified architecture, we can efficiently stream, store, and visualize high volumes of IoT sensor data. This solution not only provides real-time insights but also scales horizontally as data volumes grow, making it a good fit for a wide range of IoT applications. Whether you are dealing with industrial sensors or smart city infrastructure, this architecture will help you harness the full potential of your IoT data.

Feel free to reach out if you have any questions or need further assistance in setting up your IoT data streaming and analytics infrastructure. Happy data streaming!
