Cost optimization in Kafka data pipelines

Reference: https://keda.sh/

The customer has an event-driven architecture built on Kafka data pipelines. The producers and consumers are Spring Boot Java applications.

Consumer A consumes from topic T1, enriches the messages, and produces them to topic T2. Consumer B consumes from T1 and produces to T3, and so on. Since each topic has more than one consumer, the Kafka data flow diagram ends up looking like a node graph. A sketch of one hop in this pipeline follows below.
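For illustration, a minimal sketch of one hop of the pipeline as a Spring Boot consumer, assuming Spring for Apache Kafka; the topic names, group id, and enrichment step are hypothetical:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

// One hop of the pipeline: consume from T1, enrich, produce to T2.
@Component
public class ConsumerA {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public ConsumerA(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "T1", groupId = "consumer-a-group")
    public void onMessage(String message) {
        String enriched = enrich(message);   // enrichment step
        kafkaTemplate.send("T2", enriched);  // forward to the next topic
    }

    // Placeholder for the real enrichment logic
    private String enrich(String message) {
        return message + "|enriched";
    }
}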

Considerations

The application is deployed on Kubernetes, but the native Horizontal Pod Autoscaler (HPA) is not enabled, because scaling events had triggered Kafka partition rebalancing, which led to consumer lag.

How to optimize this workload

On analyzing consumer lag and message ingestion rates across the entire pipeline, we found that for the vast majority of the time there were no messages to process in the topics, yet the consumers were always up, polling for new messages.
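As a sketch of how such lag analysis can be done programmatically, the Kafka AdminClient can compare a group's committed offsets against the latest end offsets; the broker address and consumer group below are assumptions, mirroring the ScaledObject sample further down:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the consumer group
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("my-group")
                     .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for the same partitions
            Map<TopicPartition, OffsetSpec> request = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> endOffsets =
                admin.listOffsets(request).all().get();

            // Lag per partition = end offset - committed offset
            committed.forEach((tp, meta) -> {
                long lag = endOffsets.get(tp).offset() - meta.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            });
        }
    }
}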

The optimization: scale the consumers in to zero when there is no consumer lag, and scale them back out when lag builds up. This was implemented using KEDA. The projected savings for the customer are more than 30% of the on-prem hardware cost.

Create a ScaledObject (sample below):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-application   # Deployment running the consumer
  pollingInterval: 30         # seconds between lag checks
  minReplicaCount: 0          # spec-level field; 0 allows scale-in to zero
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: localhost:9092
      consumerGroup: my-group
      topic: test-topic
      # Optional
      lagThreshold: "50"      # target lag per replica
      offsetResetPolicy: latest
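Note that minReplicaCount belongs at the ScaledObject spec level, not inside the trigger metadata, and must be 0 for the scale-to-zero behavior described above. KEDA then sizes the deployment from the observed lag: roughly, desired replicas ≈ total lag / lagThreshold, capped by maxReplicaCount (and, by default, by the topic's partition count). When lag is zero, the workload is scaled in to minReplicaCount, which is what removes the idle consumers entirely.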
