Apache Kafka with Change Data Capture events
The recent release of the Change Data Capture (CDC) events capability in Salesforce provides much-needed functionality for propagating data synchronization changes in a more deterministic and scalable manner.
The CDC capability leverages the streaming APIs to send data updates occurring on Salesforce standard and custom objects to interested subscribers via events and channels.
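Subscribing to a CDC channel is typically done through a CometD (Bayeux) client over the Streaming API. Below is a minimal Java sketch of such a subscriber; the instance URL, the API version in the /cometd path, the access-token handling, and the class name are placeholders for illustration, and a production setup would normally use Salesforce's EMP Connector or equivalent with proper replay and reconnection handling.

```java
import java.util.HashMap;

import org.cometd.bayeux.Message;
import org.cometd.bayeux.client.ClientSessionChannel;
import org.cometd.client.BayeuxClient;
import org.cometd.client.transport.LongPollingTransport;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Request;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class CdcSubscriber {
    public static void main(String[] args) throws Exception {
        // Placeholders: obtain these via your preferred OAuth flow
        String instanceUrl = "https://yourInstance.my.salesforce.com";
        String accessToken = System.getenv("SF_ACCESS_TOKEN");

        // Jetty HTTP client with TLS support for the long-polling connection
        HttpClient httpClient = new HttpClient(new SslContextFactory.Client());
        httpClient.start();

        // Attach the OAuth token to every request sent by the transport
        LongPollingTransport transport =
                new LongPollingTransport(new HashMap<String, Object>(), httpClient) {
                    @Override
                    protected void customize(Request request) {
                        request.header("Authorization", "Bearer " + accessToken);
                    }
                };

        BayeuxClient client = new BayeuxClient(instanceUrl + "/cometd/45.0", transport);
        client.handshake();
        client.waitFor(10_000, BayeuxClient.State.CONNECTED);

        // "/data/ChangeEvents" carries change events for all enabled objects;
        // a single-object channel looks like "/data/AccountChangeEvent"
        client.getChannel("/data/ChangeEvents").subscribe(
                (ClientSessionChannel channel, Message message) ->
                        System.out.println("CDC event: " + message.getDataAsMap()));
    }
}
```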
Typical use cases include full replication of Salesforce data as well as incremental and partial updates sent to external systems. This is generally needed for external reporting, archiving, and compliance, as well as for transforming and storing data in a canonical form, among other applications. The API limits for CDC are separate from the SOAP/REST API limits and hence offer more flexibility.
Apache Kafka is the de facto industry standard for large-scale event processing applications. It was developed at LinkedIn before becoming an Apache project. It is primarily targeted at enterprise-class applications that need to process enormous amounts of data with high throughput and horizontal scalability. Apache Kafka is generally deployed as a cluster and supports built-in replication and fault tolerance.
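To bridge the two systems, each CDC event received from Salesforce can be forwarded to a Kafka topic. The following is a minimal sketch of such a producer using the standard Kafka client; the topic name salesforce.cdc.events, the class name, and the choice of keying by record Id are assumptions for illustration, not a prescribed design.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CdcToKafkaBridge {
    private final Producer<String, String> producer;

    public CdcToKafkaBridge(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", "all"); // wait for the in-sync replica set to acknowledge each write
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Called from the CDC subscriber for each event; the JSON payload is passed through as-is.
    // Keying by record Id keeps all changes to the same record in one partition, preserving order.
    public void publish(String recordId, String cdcEventJson) {
        producer.send(new ProducerRecord<>("salesforce.cdc.events", recordId, cdcEventJson));
    }

    public void close() {
        producer.close();
    }
}
```

Keying by record Id is one simple way to preserve per-record ordering; other keys (for example, the object name) trade ordering guarantees for different partitioning behavior.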
As depicted above, we can now subscribe to CDC events via Kafka topics and further process them in downstream systems such as Hadoop. Apache Kafka stores every event in its log, by default for up to 7 days. New events are appended to the log, and each event's position is identified by an ID field called the ReplayId. As the name implies, this allows reading data from an arbitrary point in time, which makes it possible to recreate the entire state of the data on Salesforce from scratch if desired.
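On the consuming side, a downstream job can read the forwarded events from the topic and process or archive them. The sketch below assumes the same hypothetical topic name and plain JSON string values; note that Kafka tracks a consumer's position with partition offsets, while the Salesforce ReplayId travels inside the event payload itself.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CdcEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("group.id", "cdc-replication");          // consumer group for the downstream job
        props.put("auto.offset.reset", "earliest");        // start from the oldest retained event
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("salesforce.cdc.events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // record.offset() is Kafka's position in the partition log;
                    // the Salesforce ReplayId is carried inside the event JSON.
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```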
Heroku supports Apache Kafka natively through the Apache Kafka on Heroku add-on, and more information can be found here.