Apache Kafka with Change Data Capture events

The recent release of the Change Data Capture (CDC) capability in Salesforce provides much-needed functionality for propagating data changes to other systems in a deterministic and scalable manner.

The CDC capability leverages the streaming APIs to publish data changes occurring on Salesforce standard and custom objects to interested subscribers via events and channels.
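As an illustrative sketch of how a subscriber works with these events: each CDC event carries a ChangeEventHeader describing which object and records changed. The payload shape below follows that structure, but the topic-naming convention (`sfdc.cdc.<entity>`) is an assumption for illustration, not a Salesforce or Kafka standard.

```python
# Minimal sketch: derive a Kafka topic name from a CDC event's
# ChangeEventHeader so events for each object land on their own topic.
# The "sfdc.cdc.<entity>" naming scheme is a hypothetical convention.

def kafka_topic_for(event: dict) -> str:
    """Route a CDC event to a per-object Kafka topic name."""
    header = event["ChangeEventHeader"]
    entity = header["entityName"]          # e.g. "Account"
    return f"sfdc.cdc.{entity.lower()}"

sample_event = {
    "ChangeEventHeader": {
        "entityName": "Account",
        "changeType": "UPDATE",
        "recordIds": ["001xx000003DGb2AAG"],
    },
    "Name": "Acme Corp",
}

print(kafka_topic_for(sample_event))   # one topic per changed object type
```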

Typical use cases include full replication of Salesforce data as well as incremental and partial updates sent to external systems. This is generally needed for external reporting, archiving, and compliance, as well as for transforming and storing data in a canonical form, among other applications. The API limits for CDC are separate from the SOAP/REST API limits and hence offer more flexibility.
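The incremental-replication idea can be sketched as follows. This is a toy in-memory model: the changeType values (CREATE, UPDATE, DELETE) follow the CDC event semantics, but the flat event shape is a simplification, since real CDC payloads carry additional header fields.

```python
# Toy in-memory replica that applies CDC-style change events in order.
# DELETE removes the record; CREATE/UPDATE merge the changed fields,
# which is how partial updates keep the replica in sync.

def apply_change(replica: dict, event: dict) -> None:
    header = event["ChangeEventHeader"]
    change_type = header["changeType"]
    for record_id in header["recordIds"]:
        if change_type == "DELETE":
            replica.pop(record_id, None)
        else:  # CREATE or UPDATE: merge only the fields present on the event
            fields = {k: v for k, v in event.items() if k != "ChangeEventHeader"}
            replica.setdefault(record_id, {}).update(fields)

replica = {}
events = [
    {"ChangeEventHeader": {"changeType": "CREATE", "recordIds": ["001A"]},
     "Name": "Acme", "Industry": "Energy"},
    {"ChangeEventHeader": {"changeType": "UPDATE", "recordIds": ["001A"]},
     "Industry": "Utilities"},
]
for e in events:
    apply_change(replica, e)

print(replica["001A"])   # merged record after create + update
```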

Apache Kafka is the de facto industry standard for large-scale event processing. It was developed at LinkedIn before becoming an Apache project. It primarily targets enterprise-class applications that need to process enormous amounts of data with high throughput and horizontal scalability. Kafka is generally deployed as a cluster and supports built-in replication and fault tolerance.

Salesforce CDC events can be bridged into Kafka topics and processed further in downstream systems such as Hadoop. Kafka stores every event in a log, by default for up to 7 days. New events are appended to the end of the log, and each event's position is identified by an offset; on the Salesforce side, each change event carries an analogous position field called the replayId. As the name implies, it allows reading events from an arbitrary point within the retention window, which makes it possible to recreate the state of the Salesforce data from that point onward if desired.
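The replay mechanics can be modelled with a simple append-only log. This is an illustrative model only, not the kafka-python or Salesforce streaming client API: it shows why a position identifier (a Kafka offset or a Salesforce replayId) lets a subscriber re-read retained events from any point.

```python
# Illustrative append-only log modelling replay semantics: each event
# gets a monotonically increasing position, and a subscriber can re-read
# from any retained position, like a Kafka offset or Salesforce replayId.

class EventLog:
    def __init__(self):
        self._events = []    # appended to the end, never mutated

    def append(self, payload) -> int:
        self._events.append(payload)
        return len(self._events) - 1    # the event's replay id / offset

    def replay_from(self, replay_id: int):
        """Return every retained event at or after the given position."""
        return self._events[replay_id:]

log = EventLog()
for change in ["CREATE", "UPDATE", "DELETE"]:
    log.append({"changeType": change})

# Re-read from position 1 to rebuild state from that point onward.
print(log.replay_from(1))
```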

Heroku supports Apache Kafka natively via the Apache Kafka on Heroku add-on; see the Heroku Dev Center for more information.



