Understanding Kinesis Data Streams Consumers: Classic vs. Enhanced Fan-Out

Imagine you’re building a real-time analytics platform that ingests data from thousands of IoT devices, processes it, and then triggers actions like sending alerts or storing results in a database. To manage this massive flow of data, you choose AWS Kinesis Data Streams for its scalability and ability to handle real-time streaming data. Now, you need to decide how your consumers will process this data efficiently. Should you use classic fan-out or enhanced fan-out? What about integrating Lambda for serverless data processing?

In this article, we’ll explore the various consumer options for Kinesis Data Streams, compare classic and enhanced fan-out, and dive into use cases that demonstrate how these consumption models work.


What Are Kinesis Data Streams Consumers?

In AWS Kinesis, consumers are the applications or services that process data from the stream. Once data is ingested by a Kinesis stream, consumers read and process the records based on business requirements. Common types of consumers include:

  • AWS Lambda functions, which process data serverlessly.
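  • Kinesis Data Firehose (now Amazon Data Firehose), which delivers data to destinations like Amazon S3, Amazon Redshift, or Amazon OpenSearch Service (formerly Amazon Elasticsearch Service).
  • Kinesis Data Analytics (now Amazon Managed Service for Apache Flink), which performs real-time analytics on data streams.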
  • Custom consumers, which use the AWS SDK or the Kinesis Client Library (KCL) for more complex processing needs.

Example Use Case: Real-Time Stock Market Data

Imagine a stock trading platform that needs to process and analyze stock price changes in real time. Data from multiple stock exchanges is ingested into a Kinesis Data Stream. Several consumers, such as Lambda functions, an analytics platform, and a data archiving service, need to read this stream of stock prices simultaneously. How do you ensure every consumer gets the data it needs without bottlenecks or delays? That's where the choice of fan-out method comes in.


Classic vs. Enhanced Fan-Out: What’s the Difference?

1. Classic (Shared Throughput) Consumer

In the classic fan-out mode, multiple consumers share a shard's read capacity: up to 2 MB per second and five GetRecords calls per second, both divided among all consumers of that shard.

  • Pull Model: Classic consumers call the GetRecords API to pull data, actively polling each shard. Together, all consumers of a shard can read at most 2 MB per second from it.
  • Limited Throughput: Consumers split the available bandwidth. With three consumers reading a shard that provides 2 MB per second, each gets roughly 667 KB per second at best.
  • Latency: Average propagation delay is around 200 milliseconds with a single consumer, and it grows as more consumers poll the same shard, because they all compete for the same five GetRecords calls per second.
  • Low Cost: Classic fan-out is cost-effective, especially for applications with a limited number of consumers and moderate throughput requirements.
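To make the pull model concrete, here is a minimal Python sketch of one classic consumer reading one shard. It assumes `client` behaves like a boto3 Kinesis client (`get_shard_iterator` and `get_records` are real boto3 calls); the `poll_shard` helper, the one-second poll interval, and the `handle` callback are illustrative choices, not AWS defaults. Production consumers usually rely on the KCL, which also handles checkpointing, shard discovery, and load balancing.

```python
import time
from typing import Any, Callable, Dict, List


def poll_shard(
    client: Any,  # assumed to behave like boto3.client("kinesis")
    stream_name: str,
    shard_id: str,
    handle: Callable[[List[Dict[str, Any]]], None],  # hypothetical record callback
    max_polls: int = 10,
    poll_interval_s: float = 1.0,  # illustrative back-off, not an AWS default
) -> int:
    """Pull records from one shard with GetRecords (classic, shared throughput).

    Returns the number of records passed to `handle`.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest retained record
    )["ShardIterator"]

    handled = 0
    for _ in range(max_polls):
        if iterator is None:  # shard was closed (e.g. after resharding) and fully read
            break
        response = client.get_records(ShardIterator=iterator, Limit=1000)
        records = response["Records"]
        if records:
            handle(records)
            handled += len(records)
        iterator = response.get("NextShardIterator")
        # All consumers of this shard share its 5 reads/s and 2 MB/s,
        # so wait between calls instead of polling in a tight loop.
        time.sleep(poll_interval_s)
    return handled
```

Against a real stream you would pass `boto3.client("kinesis")`; injecting the client simply keeps the sketch testable without AWS credentials.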

2. Enhanced Fan-Out Consumer

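In enhanced fan-out, each consumer gets dedicated throughput from the stream. Instead of sharing a shard's read capacity, every registered consumer receives its own 2 MB per second per shard. Consumers must first be registered with the stream (RegisterStreamConsumer), and a stream supports up to 20 registered consumers.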

  • Push Model: Enhanced consumers call the SubscribeToShard API, and Kinesis pushes records to them over a long-lived HTTP/2 connection instead of waiting to be polled. Each subscription lasts up to five minutes and must then be renewed.
  • Higher Throughput: Each consumer gets a full 2 MB per second per shard. This means if you have three consumers, you effectively have 6 MB per second of throughput for that shard.
  • Low Latency: Enhanced fan-out has much lower latency—around 70 milliseconds—because the data is pushed to the consumers.
  • Higher Cost: Enhanced fan-out is billed on top of the stream itself, per consumer-shard hour and per GB of data retrieved. It pays off for use cases with multiple consumers that need to process data in near real time.
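The enhanced fan-out flow has two steps: register a named consumer once, then open SubscribeToShard sessions into which Kinesis pushes records. This sketch assumes a boto3-style Kinesis client (`register_stream_consumer`, `describe_stream_consumer`, and `subscribe_to_shard` exist in boto3); the helper names, the wait loop while the consumer is being created, and `LATEST` as the starting position are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def register_consumer(client: Any, stream_arn: str, consumer_name: str,
                      wait_s: float = 1.0, max_checks: int = 30) -> str:
    """Register an enhanced fan-out consumer and wait until it is ACTIVE."""
    consumer = client.register_stream_consumer(
        StreamARN=stream_arn, ConsumerName=consumer_name
    )["Consumer"]
    arn = consumer["ConsumerARN"]
    for _ in range(max_checks):
        status = client.describe_stream_consumer(ConsumerARN=arn)[
            "ConsumerDescription"]["ConsumerStatus"]
        if status == "ACTIVE":
            return arn
        time.sleep(wait_s)  # still CREATING; check again shortly
    raise TimeoutError(f"consumer {consumer_name} did not become ACTIVE")


def read_subscription(client: Any, consumer_arn: str, shard_id: str,
                      handle: Callable[[List[Dict[str, Any]]], None]) -> Optional[str]:
    """Drain one SubscribeToShard session and return the last continuation sequence number."""
    response = client.subscribe_to_shard(
        ConsumerARN=consumer_arn,
        ShardId=shard_id,
        StartingPosition={"Type": "LATEST"},  # only records written from now on
    )
    last_seq = None
    for event in response["EventStream"]:  # records are pushed, not polled
        shard_event = event.get("SubscribeToShardEvent")
        if shard_event is None:
            continue
        if shard_event["Records"]:
            handle(shard_event["Records"])
        last_seq = shard_event.get("ContinuationSequenceNumber")
    # The session ends after about five minutes; a long-running consumer would
    # resubscribe with {"Type": "AFTER_SEQUENCE_NUMBER", "SequenceNumber": last_seq}.
    return last_seq
```

A complete consumer would run `read_subscription` in a loop per shard and handle `ResourceInUseException` if the consumer name is already registered.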


[Figure: Shared vs. enhanced fan-out in Kinesis Data Streams]

Summary of Differences:

Feature          | Classic fan-out                      | Enhanced fan-out
-----------------+--------------------------------------+------------------------------------------------
Throughput       | 2 MB/s per shard (shared)            | 2 MB/s per consumer, per shard
API used         | GetRecords (pull)                    | SubscribeToShard (push)
Latency          | ~200 ms                              | ~70 ms
Cost             | Lower                                | Higher
Ideal use case   | Few consumers, moderate throughput   | Multiple consumers, low latency, high throughput
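The throughput row reduces to simple arithmetic. This sketch assumes reads are split evenly among classic consumers, which real polling only approximates, and it ignores the separate five-reads-per-second limit; `per_consumer_read_mb_s` is a hypothetical helper.

```python
SHARD_READ_MB_PER_S = 2.0  # per-shard read limit quoted in the table


def per_consumer_read_mb_s(consumers: int, shards: int, enhanced: bool) -> float:
    """Best-case read throughput for one consumer across the whole stream."""
    if consumers < 1 or shards < 1:
        raise ValueError("need at least one consumer and one shard")
    # Classic: the 2 MB/s pipe is divided; enhanced: each consumer gets its own.
    per_shard = SHARD_READ_MB_PER_S if enhanced else SHARD_READ_MB_PER_S / consumers
    return per_shard * shards
```

With three consumers on one shard, classic gives each about 0.67 MB/s, while enhanced gives each the full 2 MB/s, or 6 MB/s delivered in aggregate.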


Using Lambda as a Kinesis Consumer

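AWS Lambda is a popular choice for processing Kinesis streams in a serverless architecture. Lambda can consume a Kinesis stream in either mode: by default it reads the stream with shared (classic) throughput, and it uses enhanced fan-out when you point it at a registered stream consumer instead of the stream itself.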

How Lambda Works with Kinesis

  • Lambda reads records from Kinesis in batches. You can configure the batch size and batch window to control how many records are processed at once.
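  • Lambda scales with the number of shards in your stream. By default it processes one batch per shard at a time, so a stream with 5 shards gets up to 5 concurrent invocations. A parallelization factor (1 to 10) lets Lambda process up to 10 batches per shard concurrently while still preserving order per partition key.
  • By default, if the function returns an error, Lambda retries the whole batch until it succeeds or the records expire from the stream, and that shard makes no progress in the meantime. You can bound this with a maximum retry count, a maximum record age, batch bisection, and an on-failure destination.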
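Batch size, batching window, parallelism, and retry behavior are all settings on the Lambda event source mapping. The sketch below uses boto3's real `create_event_source_mapping` call on a Lambda client; the `create_kinesis_trigger` helper and the specific values (batches of 100, a 5-second window, three retries, a one-hour maximum record age) are illustrative assumptions, not recommendations. Passing a registered consumer ARN instead of the stream ARN as `EventSourceArn` makes Lambda read through enhanced fan-out.

```python
from typing import Any, Dict, Optional


def create_kinesis_trigger(lambda_client: Any, function_name: str, source_arn: str,
                           on_failure_arn: Optional[str] = None) -> Dict[str, Any]:
    """Attach a Lambda function to a Kinesis stream or an enhanced fan-out consumer."""
    params: Dict[str, Any] = {
        "FunctionName": function_name,
        # Stream ARN -> shared throughput; consumer ARN -> enhanced fan-out.
        "EventSourceArn": source_arn,
        "StartingPosition": "LATEST",
        "BatchSize": 100,                     # max records per invocation (illustrative)
        "MaximumBatchingWindowInSeconds": 5,  # wait up to 5 s to fill a batch
        "ParallelizationFactor": 1,           # concurrent batches per shard (1-10)
        # Bound the default "retry until the records expire" behavior:
        "MaximumRetryAttempts": 3,
        "MaximumRecordAgeInSeconds": 3600,
        "BisectBatchOnFunctionError": True,   # split a failing batch to isolate bad records
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }
    if on_failure_arn:
        # Metadata about discarded batches goes to an SQS queue or SNS topic.
        params["DestinationConfig"] = {"OnFailure": {"Destination": on_failure_arn}}
    return lambda_client.create_event_source_mapping(**params)
```

In practice you would pass `boto3.client("lambda")`; the helper only assembles parameters, so it can be exercised with a stand-in client.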

Example: Processing E-Commerce Orders with Lambda

Imagine you have an e-commerce platform where every order placed is sent to a Kinesis Data Stream. You can use Lambda to process these orders in real time and store the data in DynamoDB for later analysis.

  1. Ingestion: Customer order data is ingested into a Kinesis Data Stream with 5 shards.
  2. Processing with Lambda: Lambda reads batches of records from each shard and invokes the function, which calculates order totals and writes them to a DynamoDB table.
  3. Serverless Scalability: As order volume grows, Lambda keeps processing every shard in parallel. If the stream uses on-demand capacity mode, Kinesis adds shards automatically; in provisioned mode, you increase the shard count yourself (for example with UpdateShardCount), and Lambda picks up the new shards.
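Step 2 could look like the following Lambda handler. It assumes each record carries a JSON order shaped like `{"order_id": ..., "items": [{"price": ..., "quantity": ...}]}`, a DynamoDB table named `orders` keyed on `order_id`, and an event source mapping with `ReportBatchItemFailures` enabled; all three are hypothetical. Prices are parsed as `Decimal` because boto3's DynamoDB layer rejects Python floats.

```python
import base64
import json
from decimal import Decimal
from typing import Any, Dict, List


def process_orders(event: Dict[str, Any], table: Any) -> Dict[str, List[Dict[str, str]]]:
    """Decode Kinesis records, total each order, and write it to DynamoDB.

    Returns the partial-batch response understood by ReportBatchItemFailures.
    """
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            # Lambda delivers each Kinesis payload base64-encoded.
            order = json.loads(base64.b64decode(kinesis["data"]), parse_float=Decimal)
            total = sum(Decimal(str(i["price"])) * int(i["quantity"]) for i in order["items"])
            table.put_item(Item={"order_id": order["order_id"], "total": total})
        except Exception:
            # Lambda resumes the shard from the lowest failed sequence number,
            # so report this record and stop; later records will be re-delivered.
            return {"batchItemFailures": [{"itemIdentifier": kinesis["sequenceNumber"]}]}
    return {"batchItemFailures": []}


_table = None  # cached across warm invocations


def handler(event, context):
    global _table
    if _table is None:
        import boto3  # bundled with the Lambda Python runtime
        _table = boto3.resource("dynamodb").Table("orders")  # hypothetical table name
    return process_orders(event, _table)
```

Because `put_item` overwrites by key, reprocessing a re-delivered record is harmless, which is what makes the "stop at the first failure" strategy safe here.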

With Lambda, you don’t need to manage servers or worry about scaling the infrastructure. It’s an ideal choice for event-driven, real-time processing tasks.


Key Considerations When Choosing a Consumer Model

When deciding whether to use classic or enhanced fan-out for your Kinesis consumers, consider the following:

1. Number of Consumers

If you have only a few consumers, classic fan-out is likely sufficient and helps reduce costs, because each shard's shared 2 MB per second and five reads per second go further when split fewer ways. If many applications need the same data in real time, enhanced fan-out gives each consumer the dedicated throughput it requires.

2. Throughput Requirements

If your use case involves high throughput—for example, processing large amounts of IoT telemetry or financial data streams—enhanced fan-out is better suited, as it allows each consumer to receive full bandwidth from the stream.

3. Latency Sensitivity

For applications where latency is critical (e.g., fraud detection systems or live analytics), enhanced fan-out is the best option due to its low-latency, push-based model.

4. Cost Considerations

While enhanced fan-out offers significant performance benefits, it comes with a higher cost. For applications with budget constraints, the classic fan-out mode may be a better fit, especially when throughput and latency are less of a concern.
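The considerations above can be folded into a rough rule of thumb. This is a hypothetical helper, not AWS guidance: it uses only the approximate figures quoted in this article (2 MB per second per shard, about 200 ms for classic polling) and treats cost simply as the reason to prefer classic when nothing else forces enhanced fan-out.

```python
from dataclasses import dataclass

SHARD_READ_MB_PER_S = 2.0         # per-shard read limit, shared in classic mode
CLASSIC_TYPICAL_LATENCY_MS = 200  # approximate classic figure quoted in this article


@dataclass
class Workload:
    consumers: int             # applications reading the same stream
    mb_per_s_per_shard: float  # data each consumer must read from each shard
    latency_budget_ms: float   # end-to-end delay the use case can tolerate


def choose_fan_out(w: Workload) -> str:
    """Hypothetical heuristic: returns "classic" or "enhanced"."""
    # Latency: classic polling rarely beats ~200 ms, so tighter budgets need push.
    if w.latency_budget_ms < CLASSIC_TYPICAL_LATENCY_MS:
        return "enhanced"
    # Throughput: classic consumers all split one 2 MB/s pipe per shard.
    if w.consumers * w.mb_per_s_per_shard > SHARD_READ_MB_PER_S:
        return "enhanced"
    # Otherwise classic suffices and avoids enhanced fan-out charges.
    return "classic"
```

For example, two consumers that each need 0.5 MB/s per shard with a one-second budget fit comfortably in classic mode, while a fraud detector needing 100 ms would tip the decision to enhanced fan-out.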


Conclusion: Choosing the Right Consumer for Kinesis Data Streams

AWS Kinesis Data Streams provides a flexible and scalable way to manage real-time data streams, but selecting the right consumer model is crucial for optimizing performance and cost.

  • Classic fan-out is ideal for applications with fewer consumers and moderate throughput requirements, providing a cost-effective solution for real-time data processing.
  • Enhanced fan-out shines when you have multiple consumers, high throughput needs, or low-latency requirements, giving each consumer its own dedicated pipeline to the data stream.

Whether you’re building a real-time analytics system, an IoT data processor, or an event-driven serverless architecture, understanding these consumption models will help you design a more efficient and scalable system.


Choosing the right consumer strategy for your AWS Kinesis streams is critical to building a high-performance, scalable, and cost-efficient real-time data processing system. Whether you use Lambda, Kinesis Data Firehose, or custom consumers, Kinesis offers a consumption model to match your throughput, latency, and budget requirements.
