Cost optimization in a Kafka data pipeline
Vivek Anandaraman
Help Project Managers Estimate and Track AWS Cost during Build using Jira | Mentor | Speaker
The customer has an event-driven architecture built on Kafka data pipelines. The producers and consumers are Spring Boot Java applications.
Consumer A consumes from topic T1, enriches the messages, and produces them to topic T2. Consumer B consumes from T1 and produces to T3, and so on. Since each topic has more than one consumer, you can appreciate that the Kafka data flow diagram looks like a node graph.
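To make the pattern concrete, a consumer stage such as Consumer A could look like the Spring Boot sketch below. This is a minimal illustration assuming String payloads; the class name, group id, and enrich() helper are hypothetical:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class ConsumerA {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public ConsumerA(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Consume from T1, enrich, and forward the result to T2
    @KafkaListener(topics = "T1", groupId = "consumer-a-group")
    public void onMessage(String message) {
        String enriched = enrich(message);
        kafkaTemplate.send("T2", enriched);
    }

    private String enrich(String message) {
        // Placeholder for the actual enrichment logic
        return message + " | enriched";
    }
}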
Considerations
The application is deployed in Kubernetes, but the Kubernetes-native Horizontal Pod Autoscaler is not turned on because it had introduced Kafka partition rebalancing, leading to consumer lag.
How to optimize this workload
On analyzing the consumer lag and message ingestion rates for the entire pipeline, we found that for the vast majority of the time there were no messages to process in the topics, yet the consumers were always up, polling for new messages.
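Consumer lag here is the gap between a partition's latest offset and the consumer group's committed offset. One way such a lag check could be done with the plain Kafka AdminClient is sketched below; the bootstrap server and group id are illustrative assumptions:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class ConsumerLagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Offsets the consumer group has committed, per partition
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("my-group")
                     .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for the same partitions
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                admin.listOffsets(latestSpec).all().get();

            long totalLag = 0;
            for (TopicPartition tp : committed.keySet()) {
                if (committed.get(tp) == null) continue;  // no committed offset yet
                long lag = latest.get(tp).offset() - committed.get(tp).offset();
                totalLag += lag;
                System.out.printf("%s lag=%d%n", tp, lag);
            }
            System.out.println("Total lag: " + totalLag);
        }
    }
}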
The fix: scale the consumers in to zero when there is no consumer lag, and scale them out when lag builds up. This was implemented using KEDA. The projected savings for the customer are more than 30% of the on-prem hardware cost.
Create a ScaledObject (sample below)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-application      # Deployment running the consumer
  pollingInterval: 30            # seconds between lag checks
  minReplicaCount: 0             # allow KEDA to scale the consumer in to zero
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: localhost:9092
      consumerGroup: my-group    # must match the application's consumer group
      topic: test-topic
      # Optional
      lagThreshold: "50"         # target average lag per replica
      offsetResetPolicy: latest
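With minReplicaCount at 0, KEDA scales the target Deployment in to zero replicas while the consumer group's lag stays at zero, and scales it out again (through the HPA that KEDA manages) once the lag crosses lagThreshold. Note that consumerGroup in the trigger metadata must match the group id the Spring Boot consumer actually uses; otherwise KEDA will never see the lag and never scale out.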