Overview of Amazon Kinesis
Amazon Kinesis is designed to make it simple to collect, process, and analyze streaming data in real time. This includes any data generated at high speed, such as:
- Application logs
- Metrics
- Website clickstreams
- IoT telemetry data
If data is produced continuously and needs to be analyzed as it arrives, it qualifies as real-time streaming data, making Kinesis an ideal solution.
Core Components in Kinesis
Amazon Kinesis comprises four services, each designed for specific streaming data needs:
- Kinesis Data Streams: Captures, processes, and stores data streams. Allows real-time data streaming with reliable throughput and data persistence.
- Kinesis Data Firehose: Loads data streams into AWS storage and analytics services, or even external systems. Simplifies the process of delivering streaming data to destinations like Amazon S3, Redshift, Elasticsearch, or third-party solutions.
- Kinesis Data Analytics: Analyzes data streams using SQL or Apache Flink. Enables real-time data analysis for monitoring, reporting, and quick insights, using familiar languages and tools.
- Kinesis Video Streams: Captures, processes, and stores video streams.
Kinesis Data Streams
Overview of Kinesis Data Streams
Kinesis Data Streams is designed for real-time, big data streaming within AWS. It enables continuous ingestion of data from various sources and provides a flexible, scalable way to process data in near real time.
Core Components
- A Kinesis Data Stream comprises multiple shards, each identified by a unique number (Shard 1, Shard 2, and so on).
- When creating a Kinesis Data Stream, we specify the number of shards, which determines our stream's capacity in terms of ingestion and consumption rates.
- Shards can be scaled up or down based on demand.
- Producers are responsible for sending data to Kinesis Data Streams.
- Examples of producers include applications, desktop and mobile clients, and AWS tooling such as the Kinesis Producer Library (KPL) and the Kinesis Agent, which streams application logs into Kinesis Data Streams.
- Each record produced consists of:
  - Partition Key: determines which shard the record is routed to.
  - Data Blob: the actual data, up to 1 MB in size.
- Producers can send data at a rate of 1 MB/sec or 1,000 messages per second per shard. Therefore, if we have 6 shards, our stream's capacity is 6 MB/sec or 6,000 messages per second.
- Consumers retrieve data from Kinesis Data Streams and can take many forms (see the producer/consumer sketch after this list):
  - Applications using the AWS SDK or the Kinesis Client Library (KCL)
  - AWS Lambda functions, for serverless processing
  - Kinesis Data Firehose or Kinesis Data Analytics
- When a consumer reads a record, it receives the partition key, sequence number (indicating its position in the shard), and data blob.
- Enhanced Fan-Out mode allows each consumer to have a throughput of 2 MB/sec per shard.
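To make the producer and consumer sides concrete, here is a minimal sketch using boto3, the AWS SDK for Python. The stream name, partition key, and payload are hypothetical, and AWS credentials/region are assumed to be configured:

```python
import boto3

kinesis = boto3.client("kinesis")

# Producer side: send one record. The partition key decides which shard
# the record is routed to; the data blob can be up to 1 MB.
kinesis.put_record(
    StreamName="demo-stream",                     # hypothetical stream
    PartitionKey="user-123",                      # same key -> same shard
    Data=b'{"event": "click", "page": "/home"}',
)

# Consumer side: read from the oldest available record of the first shard.
shard_id = kinesis.describe_stream(StreamName="demo-stream")[
    "StreamDescription"]["Shards"][0]["ShardId"]

iterator = kinesis.get_shard_iterator(
    StreamName="demo-stream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",             # start at the oldest record
)["ShardIterator"]

response = kinesis.get_records(ShardIterator=iterator, Limit=10)
for record in response["Records"]:
    # Each record carries the partition key, a sequence number (its
    # position within the shard), and the data blob.
    print(record["PartitionKey"], record["SequenceNumber"], record["Data"])
```

In practice the KCL handles shard discovery, checkpointing, and load balancing across workers; the raw GetRecords loop above is only to show what a consumer receives.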
Data Retention and Immutability
- Kinesis Data Streams retain data for a specified period, which can be set from 1 day up to 365 days.
- Data in Kinesis is immutable, meaning it cannot be deleted once it is inserted, allowing for reprocessing and replay of data when needed.
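Immutability plus retention is what makes replay possible: a consumer can re-read past data simply by requesting a shard iterator anchored at an earlier point in time. A hedged sketch with boto3 (stream and shard names are hypothetical):

```python
from datetime import datetime, timezone

import boto3

kinesis = boto3.client("kinesis")

# Replay everything written since a given timestamp, as long as it is
# still within the stream's retention period (1 to 365 days).
iterator = kinesis.get_shard_iterator(
    StreamName="demo-stream",                 # hypothetical stream
    ShardId="shardId-000000000000",           # hypothetical shard ID
    ShardIteratorType="AT_TIMESTAMP",
    Timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)["ShardIterator"]

while iterator:
    response = kinesis.get_records(ShardIterator=iterator)
    for record in response["Records"]:
        print(record["SequenceNumber"], record["Data"])
    if not response["Records"]:
        break                                 # caught up with the stream
    iterator = response.get("NextShardIterator")
```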
Capacity Modes
- Provisioned Mode: We manually set the number of shards. Each shard supports an ingestion rate of 1 MB/sec (or 1,000 records per second) and a consumption rate of 2 MB/sec. We pay per provisioned shard, so careful capacity planning is required.
- On-Demand Mode: Ideal for unpredictable or spiky workloads, as capacity scales automatically based on usage. The default capacity is 4 MB/sec (or 4,000 records per second) of ingestion, with automatic scaling based on the observed peak throughput over the last 30 days. Pricing is per stream per hour plus per GB of data in and out.
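The capacity mode is chosen when the stream is created (and can be switched later via the UpdateStreamMode API). A sketch of both modes with boto3, using hypothetical stream names:

```python
import boto3

kinesis = boto3.client("kinesis")

# Provisioned mode: we pick the shard count up front and pay per shard.
# 6 shards -> 6 MB/sec (6,000 records/sec) in and 12 MB/sec out.
kinesis.create_stream(
    StreamName="orders-provisioned",          # hypothetical
    ShardCount=6,
    StreamModeDetails={"StreamMode": "PROVISIONED"},
)

# On-demand mode: no shard count to manage; capacity scales with traffic.
kinesis.create_stream(
    StreamName="orders-on-demand",            # hypothetical
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
```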
Security in Kinesis Data Streams
- IAM Policies: Control access to Kinesis Data Streams for both producers and consumers.
- Encryption: Data is encrypted in transit (HTTPS) and at rest (KMS). Client-side encryption is also available for additional security but requires a custom implementation for encryption and decryption (see the sketch after this list).
- VPC Endpoints: Allow private access to Kinesis from within a VPC, bypassing the public internet.
- Monitoring: All API calls are logged and can be monitored via AWS CloudTrail.
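Since client-side encryption is left to us (per the Encryption bullet above), one way to sketch it is to encrypt the data blob ourselves before calling PutRecord. This example assumes the third-party cryptography package and a hypothetical stream; key management is entirely our responsibility:

```python
import boto3
from cryptography.fernet import Fernet  # pip install cryptography (assumption)

kinesis = boto3.client("kinesis")

# We own the key and the encrypt/decrypt logic end to end.
key = Fernet.generate_key()    # in practice, load from a secure key store
cipher = Fernet(key)

plaintext = b'{"ssn": "123-45-6789"}'
kinesis.put_record(
    StreamName="demo-stream",           # hypothetical stream
    PartitionKey="user-123",
    Data=cipher.encrypt(plaintext),     # Kinesis only ever sees ciphertext
)

# A consumer holding the same key would call cipher.decrypt(record["Data"])
# after reading the record.
```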
Overview of Kinesis Data Firehose
Kinesis Data Firehose is a fully managed service for ingesting data from multiple producers and delivering it to various destinations. Producers can include:
- Applications, clients, SDKs, and Kinesis agents
- Kinesis Data Streams, Amazon CloudWatch Logs, and Amazon CloudWatch Events
Once data enters Kinesis Data Firehose, it can optionally be transformed using a Lambda function. After optional transformation, the data is written in batches to specified destinations, with no additional coding required for the writing process.
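The transformation Lambda follows a fixed contract: Firehose invokes it with a batch of base64-encoded records, and each record must be returned with its original recordId and a result of Ok, Dropped, or ProcessingFailed. A minimal sketch in Python (the uppercasing step is just a placeholder transformation):

```python
import base64

def lambda_handler(event, context):
    """Transform a batch of Firehose records and hand them back."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"])
        transformed = payload.upper()            # placeholder transformation
        output.append({
            "recordId": record["recordId"],      # must echo the original ID
            "result": "Ok",                      # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}
```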
Destination Types
Kinesis Data Firehose supports multiple types of destinations:
- Amazon S3: Stores data directly.
- Amazon Redshift: Uses Amazon S3 as an intermediary before issuing a COPY command to transfer data from S3 to Redshift.
- Amazon OpenSearch: Allows for analytics and search capabilities.
Third-Party Partner Destinations
Firehose can deliver data to third-party services such as Datadog, Splunk, New Relic, MongoDB, and others.
Custom HTTP endpoints can be used for specific use cases, enabling data delivery to our own applications via APIs.
In addition to the primary destinations, Firehose offers options to:
- Backup all data to an S3 bucket
- Backup only failed data to a separate S3 bucket if there are issues writing to the primary destination
Key Features of Kinesis Data Firehose
- Fully Managed and Serverless: Requires no server management or manual scaling. Firehose automatically scales to accommodate the incoming data.
- Cost Efficiency: Charges only for the data volume processed.
- Near Real-Time Delivery: Data is delivered in batches, making it “near real-time.” Buffer intervals range from 0 to 900 seconds, and buffer sizes start at a minimum of 1 MB. Even with a 0-second buffer, slight delays (a few seconds) classify Firehose as near real-time.
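Buffering is configured per delivery stream. A hedged sketch of creating an S3 delivery stream with explicit buffering hints via boto3 (the bucket, role ARN, and names are hypothetical, and the IAM role must allow Firehose to write to the bucket):

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="clickstream-to-s3",     # hypothetical
    DeliveryStreamType="DirectPut",             # producers write straight to Firehose
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-s3-role",  # hypothetical
        "BucketARN": "arn:aws:s3:::my-clickstream-bucket",             # hypothetical
        # Deliver when either threshold is hit first: 5 MB or 300 seconds.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",
    },
)
```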
Data Formats and Transformations
Kinesis Data Firehose supports multiple data formats, compressions, and conversions. We can also use AWS Lambda for custom transformations.
Backup and Recovery
Firehose provides the option to back up all data or only failed data into S3, ensuring a reliable data recovery option.
Kinesis Data Streams vs. Kinesis Data Firehose Comparison
Here’s a quick comparison to clarify when to use each:
- Kinesis Data Streams: For high-scale data ingestion with custom code for producers and consumers. Supports real-time processing (latency of roughly 200 ms, or about 70 ms with enhanced fan-out). Requires manual scaling and management of shards in provisioned mode (e.g., shard splitting and merging). Allows multiple consumers, retains data for 1 to 365 days, and supports data replay.
- Kinesis Data Firehose: For automated data delivery to AWS services (like S3, Redshift, OpenSearch), third-party applications, or custom HTTP endpoints. Fully managed with automated scaling and is near real-time. No data storage or replay capability, meaning once data is delivered, it cannot be accessed for reprocessing. Ideal when we want ease of use without worrying about infrastructure or scaling.