Which one to choose between Kinesis Data Streams and Firehose?

Confused between Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose? Let's clear up the doubts by answering a few open questions in this article.

Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose are both AWS services for handling streaming data, but they serve different purposes and have distinct functionality. Here are the key differences between the two:

Amazon Kinesis Data Streams

Purpose: Kinesis Data Streams is designed for real-time data streaming, allowing applications to continuously capture and process gigabytes of data per second from hundreds of thousands of sources.

Image credit: AWS


Key Features:

- Real-Time Processing: Allows for real-time data processing with low latency.

- Custom Processing: Users can build custom applications (consumers) using AWS SDKs to process data streams.

- Shard-Based Architecture: Data streams are divided into shards, each providing a fixed throughput (up to 1 MB/s or 1,000 records per second for writes, and up to 2 MB/s for reads).

- Retention Period: Data is retained in the stream for 24 hours by default and can be kept for up to 365 days, allowing applications to reprocess and analyze data.

- Data Replay: Ability to replay data multiple times during the retention period.

- Scaling: In provisioned mode, you add or remove shards manually to scale the data stream; on-demand mode adjusts capacity automatically.
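To make the shard-based sizing concrete, here is a minimal sketch of how a shard count could be estimated for a provisioned stream. The per-shard limits are the published AWS quotas at the time of writing (check the current quotas before relying on them); the function name and the sample workload numbers are assumptions made up for illustration.

```python
import math

# Per-shard limits in provisioned mode (published quotas at the time of writing).
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
# Read limit shared by all standard (non enhanced fan-out) consumers of a shard.
SHARD_READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    by_write_bytes = math.ceil(write_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_write_count = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / SHARD_READ_MB_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


if __name__ == "__main__":
    # Made-up workload: 5 MB/s in, 3,000 records/s, 10 MB/s read by consumers.
    print(shards_needed(5, 3000, 10))
```

Whichever limit is hit first determines the shard count, so a stream of many small records can need more shards than its byte volume alone suggests.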

Use Cases:

- Real-time analytics and monitoring

- Log and event data collection

- Real-time data ingestion for machine learning applications

- Real-time dashboards

Amazon Kinesis Data Firehose

Purpose: Kinesis Data Firehose (renamed Amazon Data Firehose in 2024) is designed for loading streaming data into data lakes, data stores, and analytics services. It provides a fully managed service for data delivery with automatic scaling and error handling.

Image source: AWS


Key Features:

- Fully Managed Service: Automatically scales to match the throughput of your data and handles all the management tasks.

- Data Transformation: Supports data transformation using AWS Lambda before data is delivered.

- Data Delivery: Delivers data to destinations like Amazon S3, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), and Splunk.

- Buffering: Buffers incoming data before delivery based on size or time intervals to optimize delivery performance and cost.

- Automatic Retry: Automatically retries failed data delivery.

- Data Format Conversion: Can convert JSON input records to columnar formats such as Apache Parquet and Apache ORC before delivery.
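The Lambda-based transformation mentioned above follows a simple contract: Firehose passes a batch of base64-encoded records, and the function returns each record with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the (possibly modified) base64 `data`. The sketch below illustrates that contract; the `level` field, the rule that drops DEBUG events, and the `processed` flag are hypothetical examples, not anything Firehose requires.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform a batch of Firehose records and return them in the expected shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except (ValueError, TypeError):
            # Unparseable input: return it unchanged and flag the failure.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        if payload.get("level") == "DEBUG":
            # Hypothetical rule: discard debug events before delivery.
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
            continue

        payload["processed"] = True  # hypothetical enrichment field
        # Newline-delimit the JSON so delivered objects are easy to split later.
        encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded.decode("utf-8"),
        })
    return {"records": output}
```

Every incoming `recordId` must appear exactly once in the response, which is why the failure and drop branches still append an entry.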

Use Cases:

- Loading streaming data into Amazon S3 for data lake creation

- Real-time log and event data streaming to Amazon Redshift for analytics

- Stream processing and delivery to Amazon OpenSearch Service (formerly Amazon Elasticsearch Service) for real-time monitoring

- Integration with third-party services like Splunk for log analysis

Summary of Differences

In summary, if you need real-time data processing with custom logic, Kinesis Data Streams is the better choice. If you want to stream data to other AWS services or third-party services with minimal management overhead, Kinesis Data Firehose is more suitable.
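The difference also shows up on the producer side. The following is a minimal sketch, assuming the clients are boto3 clients (or anything exposing the same `put_record` method); the helper names are hypothetical. A Data Streams write needs a partition key, which decides the target shard, while a Firehose write only names the delivery stream.

```python
import json


def send_to_data_stream(kinesis_client, stream_name, event, partition_key):
    """Write one event to a Kinesis data stream.

    The partition key decides which shard receives the record.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )


def send_to_firehose(firehose_client, delivery_stream_name, event):
    """Write one event to a Firehose delivery stream.

    No partition key is involved: Firehose buffers the record and
    delivers it to the configured destination.
    """
    return firehose_client.put_record(
        DeliveryStreamName=delivery_stream_name,
        # Newline-delimit records so they can be split after delivery.
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )
```

With boto3 installed and credentials configured, the clients would come from `boto3.client("kinesis")` and `boto3.client("firehose")`. With Data Streams you still have to write a consumer to read the records back, whereas Firehose handles delivery itself.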


More articles by Sajid Mohammed
