Handling Streaming Data with Amazon Kinesis Data Streams


What is streaming data?

Streaming data is information that is generated continuously and in real time, usually in large volumes. It needs to be processed with minimal delay to provide timely insights. This type of data can come from various sources, such as devices, applications, or sensors. Businesses use it to improve decision-making and to gain real-time visibility into their operations.

Consider a smartwatch you might be wearing throughout the day. It tracks real-time metrics such as footsteps, heart rate, and body temperature. The data is sent instantly to a server that processes it and returns information like your total kilometers walked throughout the day or your total calories burned.

Now that you have a better idea of what streaming data is, here are three key characteristics you should be aware of:

- Chronologically significant: Streaming data should be processed promptly and in the order it is received, because its value diminishes with delay. For example, a restaurant recommendation app relies on real-time user location. If location data isn’t processed immediately, it quickly becomes irrelevant.

- Continuously flowing: Unlike traditional data sets, streaming data doesn’t have a defined beginning or end. It’s constantly generated and collected for as long as necessary. Server logs, for instance, accumulate continuously as long as the server operates.

- Unique: Re-processing streaming data is difficult due to its real-time nature and sensitivity. Once it’s missed or delayed, it can’t easily be retrieved for accurate real-time analysis.

How can you process streaming data?

The image below represents the basic components of real-time data streaming.

Streaming data processing workflow

Real-time data streaming begins with the source, which could be devices or applications generating high-velocity data. This data is ingested from potentially thousands of sources and collected in real-time. Once ingested, the data is stored in sequence, ensuring that it can be replayed or accessed for a specified period. During processing, the data records are analyzed or transformed in the order they were received, supporting real-time analytics or ETL processes. Finally, the processed data can be sent to a destination such as a data lake, data warehouse, or database for further use.

Source devices or applications that generate real-time data are known as producers, and the destination systems that receive this data at the final step are called consumers. The data flowing between them is referred to as a stream.

Data flowing from producers to consumers
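
The workflow above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not how any real streaming service is implemented: `Record`, `InMemoryStream`, `process`, and the device names are all hypothetical. It shows producers appending records, the stream keeping them in arrival order so they can be replayed, and a consumer transforming them in that same order.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Record:
    sequence_number: int  # position in the stream, assigned at ingestion
    source: str           # which producer sent the record
    payload: dict


@dataclass
class InMemoryStream:
    """Toy stand-in for the ingestion and storage steps (hypothetical)."""
    records: List[Record] = field(default_factory=list)

    def ingest(self, source: str, payload: dict) -> Record:
        # Append-only: each record receives the next sequence number.
        record = Record(len(self.records), source, payload)
        self.records.append(record)
        return record

    def read_from(self, sequence_number: int = 0) -> List[Record]:
        # Stored records can be replayed from any earlier position.
        return self.records[sequence_number:]


def process(stream: InMemoryStream, transform: Callable[[Record], object],
            start: int = 0) -> list:
    """Apply `transform` to records in the order they were ingested."""
    return [transform(record) for record in stream.read_from(start)]


stream = InMemoryStream()

# Producers: two hypothetical smartwatches sending step counts.
stream.ingest("watch-1", {"steps": 120})
stream.ingest("watch-2", {"steps": 80})
stream.ingest("watch-1", {"steps": 200})

# Consumer: transform each record and collect the results as the "destination".
destination = process(stream, lambda r: (r.source, r.payload["steps"]))
print(destination)  # [('watch-1', 120), ('watch-2', 80), ('watch-1', 200)]
```

Because the records stay in the stream after being read, a second consumer could call `process(stream, ..., start=1)` to replay everything from the second record onward.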

Ingesting real-time data in AWS: Amazon Kinesis Data Streams

Amazon Kinesis Data Streams is a serverless service designed to handle the real-time ingestion of large volumes of data. You don’t need to manage the underlying infrastructure. In on-demand capacity mode, the stream scales automatically with the amount of data being ingested; in provisioned mode, you choose the number of shards yourself. Kinesis Data Streams handles the stream ingestion part of the streaming data processing workflow.
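
As a rough sketch of what ingestion looks like from the producer side, the helper below sends one record through a boto3-style client using the `put_record` call with its `StreamName`, `Data`, and `PartitionKey` parameters. Everything else is an assumption made for illustration: the `send_reading` helper, the `wearable-metrics` stream name, the choice of the device ID as partition key, and the `FakeKinesisClient` stand-in that lets the example run without an AWS account.

```python
import json


def send_reading(client, stream_name: str, device_id: str, reading: dict) -> dict:
    """Send one reading to a stream through a boto3-style Kinesis client."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(reading).encode("utf-8"),  # payload is sent as bytes
        PartitionKey=device_id,  # assumption: group records by device
    )


class FakeKinesisClient:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    def __init__(self):
        self.sent = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.sent.append((StreamName, Data, PartitionKey))
        # Mimics the shape of a real response, with made-up values.
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.sent)),
        }


client = FakeKinesisClient()
response = send_reading(client, "wearable-metrics", "watch-1", {"heart_rate": 72})
print(response["ShardId"], response["SequenceNumber"])
```

Against a real stream, the fake client would be replaced with `boto3.client("kinesis")`, assuming boto3 is installed and credentials are configured; the helper itself would not need to change.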

Key concepts of Amazon Kinesis Data Streams:

- Stream: A stream represents all the incoming data from your sources, which flows in real time. This stream is broken down into shards—the basic units that store and manage data records.

- Shards: A shard is a uniquely identifiable sequence of data records within a stream. It continuously ingests and outputs data, supporting up to 1 MB of data per second for writes and 2 MB per second for reads. If your throughput needs exceed these limits, you can add more shards to handle the load.

- Retention: Kinesis Data Streams is designed for temporary data storage, holding data for 24 hours by default. This can be extended to up to 365 days if needed, though the service is primarily meant for immediate, real-time processing.

- Latency: Kinesis Data Streams makes data available to consumers almost immediately after ingestion, typically on the order of 200 milliseconds.
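
The per-shard limits above lend themselves to a quick back-of-the-envelope calculation. The sketch below estimates a shard count from the 1 MB/s write and 2 MB/s read figures only; the `shards_needed` function and the example throughput values are hypothetical, and a real sizing exercise would also need to account for other limits and for traffic spikes.

```python
import math

WRITE_MB_PER_SHARD = 1.0  # per-shard write limit stated above
READ_MB_PER_SHARD = 2.0   # per-shard read limit stated above


def shards_needed(write_mb_per_s: float, read_mb_per_s: float) -> int:
    """Estimate the shard count from throughput alone (a rough sketch)."""
    by_write = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_read = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # The tighter of the two limits decides; a stream has at least one shard.
    return max(1, by_write, by_read)


# Hypothetical workload: 4.5 MB/s written, 6 MB/s read.
print(shards_needed(4.5, 6.0))  # 5 (writes are the bottleneck)
```

In this example the write side needs five shards while the read side needs only three, so the write limit determines the estimate.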

Streaming data ingestion with Amazon Kinesis Data Streams

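One key point: Amazon Kinesis Data Streams only ingests data and holds it for the configured retention period. It does not process the data or act as long-term storage. To process data in real time, you pair it with other AWS services, such as Amazon Kinesis Data Analytics (since renamed Amazon Managed Service for Apache Flink), which read from Kinesis Data Streams.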

Conclusion

Amazon Kinesis Data Streams is built for businesses that need to capture data as soon as it is generated so that it can be processed and analyzed right away. With a serverless architecture that can scale automatically, it provides high-volume, real-time data ingestion with minimal infrastructure management, while downstream services handle the processing. Whether you’re handling data from wearable devices, sensors, or applications, Kinesis Data Streams helps you act on insights in real time.


