Which one to choose between Kinesis Data Streams and Kinesis Data Firehose?
Sajid Mohammed
EX-Lead Architect - Deloitte Consulting | 10x AWS Certified | AWS Cloud Architecture & Sol Design | Technology Strategy & Transformation | Amazon Connect | AWS Authorized Instructor | Cloud Security | DevOps | FinOps
Confused between Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose? Let's clear up the doubts by answering a few open questions in this article.
Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose are both services provided by AWS for processing and analyzing streaming data, but they serve different purposes and have distinct functionalities. Here are the key differences between Kinesis Data Streams and Kinesis Data Firehose:
Amazon Kinesis Data Streams
Purpose: Kinesis Data Streams is designed for real-time data streaming, allowing applications to continuously capture and process gigabytes of data per second from hundreds of thousands of sources.
Key Features:
- Real-Time Processing: Allows for real-time data processing with low latency.
- Custom Processing: Users can build custom applications (consumers) using AWS SDKs to process data streams.
- Shard-Based Architecture: Data streams are divided into shards, each providing a fixed throughput: up to 1 MB/s or 1,000 records per second for writes, and up to 2 MB/s for reads.
- Retention Period: Data is retained in the stream for 24 hours by default, and retention can be extended up to 365 days, allowing applications to reprocess and analyze data.
- Data Replay: Ability to replay data multiple times during the retention period.
- Scaling: In provisioned mode, you add or remove shards yourself to scale the data stream; in on-demand mode, capacity adjusts automatically.
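To make the shard-based model concrete, here is a minimal sketch of how you might estimate a shard count and see which shard a record lands on. It assumes the documented per-shard limits for provisioned mode (1 MB/s or 1,000 records/s in, 2 MB/s out) and the fact that Kinesis maps the MD5 hash of a partition key onto a 128-bit hash-key space. The function names `shards_needed` and `shard_for_key` are hypothetical helpers, not part of any AWS SDK, and the even split of the hash space is an assumption that holds for a freshly created stream but not necessarily after resharding.

```python
import hashlib
import math

# Per-shard limits in provisioned mode (assumed from the AWS documentation).
WRITE_MB_PER_SHARD = 1.0        # MB/s ingested per shard
WRITE_RECORDS_PER_SHARD = 1000  # records/s ingested per shard
READ_MB_PER_SHARD = 2.0         # MB/s read per shard

HASH_SPACE = 2 ** 128           # MD5 produces a 128-bit hash key


def shards_needed(write_mb_s, records_s, read_mb_s):
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


def shard_for_key(partition_key, shard_count):
    """Index of the shard whose hash-key range contains MD5(partition_key).

    Assumption: the hash space is split evenly across shard_count shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


# Hypothetical workload: 4.5 MB/s in, 3,000 records/s, 6 MB/s out.
count = shards_needed(4.5, 3000, 6.0)
for key in ("device-1", "device-2", "device-3"):
    print(key, "-> shard", shard_for_key(key, count))
```

Because the same partition key always hashes to the same shard, records for one key stay ordered, but a single very busy key can overload its shard. That is why the choice of partition key matters as much as the shard count.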
Use Cases:
- Real-time analytics and monitoring
- Log and event data collection
- Real-time data ingestion for machine learning applications
- Real-time dashboards
Amazon Kinesis Data Firehose
Purpose: Kinesis Data Firehose is designed for loading streaming data into data lakes, data stores, and analytics services. It provides a fully managed service for data delivery with automatic scaling and error handling.
Key Features:
- Fully Managed Service: Automatically scales to match the throughput of your data and handles all the management tasks.
- Data Transformation: Supports data transformation using AWS Lambda before data is delivered.
- Data Delivery: Delivers data to destinations like Amazon S3, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), and Splunk.
- Buffering: Buffers incoming data before delivery based on size or time intervals to optimize delivery performance and cost.
- Automatic Retry: Automatically retries failed data delivery.
- Data Format Conversion: Supports converting JSON input records to columnar formats like Apache Parquet and Apache ORC before delivery to Amazon S3.
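The Lambda-based data transformation listed above follows a simple contract: Firehose passes the function a batch of base64-encoded records, and the function returns one entry per `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The handler below is a minimal sketch of that contract. The `level` field and the upper-casing logic are made up for illustration, and the incoming records are assumed to be JSON objects.

```python
import base64
import json


def lambda_handler(event, context):
    """Sketch of a Firehose transformation: normalise a hypothetical 'level' field."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["level"] = str(payload.get("level", "info")).upper()
            # Re-encode, adding a newline so delivered objects are line-delimited.
            line = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line).decode("utf-8"),
            })
        except (ValueError, TypeError, AttributeError):
            # Not valid base64/JSON, or not a JSON object: hand it back as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every incoming `recordId` has to appear exactly once in the response, which is why the failure branch still emits an entry instead of skipping the record.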
Use Cases:
- Loading streaming data into Amazon S3 for data lake creation
- Real-time log and event data streaming to Amazon Redshift for analytics
- Stream processing and delivery to Amazon OpenSearch Service (formerly Amazon Elasticsearch Service) for real-time monitoring
- Integration with third-party services like Splunk for log analysis
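The buffering feature described under Key Features is the main reason Firehose delivers data in batches instead of record by record. The sketch below imitates that behaviour in plain Python: records accumulate until either a size threshold or a time threshold is reached, and then the whole batch is delivered. It is an illustration of the idea only, not how Firehose is implemented. `DeliveryBuffer` and its parameters are hypothetical names, and the 5 MB and 300 second thresholds are example values.

```python
import time


class DeliveryBuffer:
    """Collects records and flushes when the batch is big enough or old enough."""

    def __init__(self, max_bytes, max_seconds, deliver, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver      # callback that receives one joined batch
        self.clock = clock          # injectable so the timing can be tested
        self.records = []
        self.size = 0
        self.opened_at = None       # time the first record of the batch arrived

    def put(self, record):
        if self.opened_at is None:
            self.opened_at = self.clock()
        self.records.append(record)
        self.size += len(record)
        self.flush_if_due()

    def flush_if_due(self):
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.opened_at >= self.max_seconds
        if too_big or too_old:
            self.deliver(b"".join(self.records))
            self.records, self.size, self.opened_at = [], 0, None


# Example: flush at 5 MB or 300 seconds, whichever comes first.
buffer = DeliveryBuffer(5 * 1024 * 1024, 300, lambda batch: print(len(batch), "bytes delivered"))
buffer.put(b'{"event": "login"}\n')
```

Larger thresholds mean fewer, bigger objects at the destination and lower request cost. Smaller thresholds reduce the delay before data becomes visible.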
Summary of Differences
In summary, if you need real-time data processing with custom logic, Kinesis Data Streams is the better choice. If you want to stream data to other AWS services or third-party services with minimal management overhead, Kinesis Data Firehose is more suitable.