Harnessing the Power of Adobe Experience Platform Sink Connector for Real-Time Data Ingestion from Kafka

Not too long ago, a customer came to me with a challenge involving the Adobe Experience Platform (AEP) Sink Connector. While they had a strong understanding of their business and marketing needs, they weren't deeply familiar with the Kafka ecosystem. They understood that Kafka could help power their real-time data flows, but they needed expert guidance on how to architect a solution that would seamlessly integrate this data into AEP. More specifically, they were focused on how this data could support their marketing use cases, particularly around real-time retargeting.

The client wasn't looking to dive deep into the technical intricacies; they just needed to know how to leverage these tools to improve their marketing campaigns. They wanted to use real-time data to fine-tune their customer journeys, trusting the technology to effectively support their retargeting efforts.

This is where I came in. My goal was to simplify the complexities of the Kafka ecosystem and the AEP Sink Connector so my client could focus on their marketing objectives without getting bogged down in technical jargon. I broke the solution down into actionable steps, showing them how to use the AEP Sink Connector to efficiently ingest real-time streaming data from Kafka into AEP, enabling them to target customers more accurately and adjust campaigns in real time.

In this blog, I'll share how I helped my client navigate the complexities of the AEP Sink Connector and the Kafka ecosystem. Together, we explored how to use the connector to ingest streaming data from Kafka into AEP, enabling real-time use cases. If you're facing a similar challenge, this guide will help you get started with your own streaming data architecture. Let's dive in!

Prerequisite Concepts

Before diving into the technical details, it is essential to understand some foundational terms and concepts that will be referenced throughout this blog.

Having a clear understanding of these terms will make it easier to follow the implementation steps and appreciate the benefits of integrating Kafka with AEP.

  • What is Docker? For readers unfamiliar with Docker, it is an open-source platform that enables developers to create, deploy, and run applications inside lightweight, portable containers. These containers package everything needed to run an application, including code, runtime, libraries, and dependencies, ensuring consistency across different environments. Docker is particularly valuable when setting up environments like Kafka and the AEP Sink Connector, as it simplifies deployment and scaling.
  • What is Kafka? Kafka is a distributed messaging system used for building real-time data pipelines and streaming applications. It enables the transfer of data between systems or applications in the form of messages organized in topics. Companies use Kafka to handle high-throughput, fault-tolerant data streams, making it ideal for processing and streaming KPIs such as ROI, lead conversions, and customer satisfaction scores. In this context, Kafka acts as the backbone for feeding data into the AEP Sink Connector.
  • What is jq? For readers who may encounter "jq" during data transformation, it is a lightweight and flexible command-line JSON processor. In the context of this blog, jq can assist in validating or transforming data so that it adheres to the Experience Data Model (XDM) schema required by AEP (see the short example after this list).
  • What is the AEP Sink Connector? The AEP Sink Connector is a component of Kafka Connect that facilitates the transfer of data streams from Kafka into Adobe Experience Platform. It enables real-time ingestion of data into AEP’s datasets, aligning with the platform’s Experience Data Model (XDM). By using this connector, businesses can streamline data integration pipelines, transforming raw data streams into actionable insights.
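To make the jq bullet concrete, here is a minimal sketch. It assumes a file named events.json containing one raw JSON record per line; the field names (email, personalEmail) are illustrative stand-ins, not a real XDM schema:

# Keep only records that carry an email address, then reshape each one
# into a simple XDM-style envelope (one compact JSON object per line)
jq -c 'select(.email != null) | {xdmEntity: {personalEmail: {address: .email}}}' events.json

A filter like this can run as a quick sanity check before data ever reaches Kafka, keeping malformed records out of the AEP ingestion path.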

Understanding Kafka Connect and Kafka Instance in the Context of AEP Streaming Connector

Let’s break this down into simpler terms, focusing on the AEP Streaming Connector and its integration with Kafka.

1. What is Kafka Connect?

Kafka Connect is a framework that simplifies the integration of external systems (like databases, other message brokers, or applications) with Apache Kafka. It provides ready-made connectors for common data systems and makes it easier to move data between Kafka and other systems without writing custom code.

Kafka Connect can run in two modes:

  • Standalone Mode: Ideal for simpler setups, where a single instance of Kafka Connect is used.
  • Distributed Mode: This is used for larger, fault-tolerant setups with multiple Kafka Connect instances running in a cluster.
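To make the two modes concrete, here is how each is typically launched using the scripts that ship with an Apache Kafka distribution (the worker properties files are the stock examples, and aep-sink.properties is a hypothetical connector config; treat this as a sketch rather than a prescribed setup):

# Standalone mode: a single worker process; connector configs are
# passed as local properties files on the command line
bin/connect-standalone.sh config/connect-standalone.properties aep-sink.properties

# Distributed mode: each invocation starts a worker that joins the
# cluster; connectors are then managed through the Connect REST API
bin/connect-distributed.sh config/connect-distributed.properties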

The AEP Streaming Connector can be installed as a Kafka Connect Plugin, which is essentially a JAR file that enables data to flow from Kafka into Adobe Experience Platform (AEP).

2. AEP Streaming Connector with Kafka Connect

If you have your own Kafka Connect Cluster:

  • Setup: When you have a Kafka Connect Cluster, you can use the AEP Streaming Connector as a Kafka Connect Plugin.
  • Installation: The AEP Streaming Connector is essentially a JAR file (Uber JAR) that you drop into your Kafka Connect Cluster. This enables your Kafka Connect instance to pull data from Kafka topics and send it to AEP.

Steps for setting this up:

  1. Install the Kafka Connect Plugin: Drop the AEP Streaming Connector JAR into the Kafka Connect Cluster.
  2. Run the Connector: After installation, you can configure and run instances of the AEP Streaming Connector to send data to AEP from Kafka topics.

The Kafka Connect Cluster handles the communication between Kafka and AEP, managing things like data ingestion, scaling, and fault tolerance.
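In practice, "dropping in the JAR" means placing the uber JAR on each worker's plugin.path and then registering a connector instance through the Connect REST API. The sketch below is hedged: the connector class and property names follow the open-source project's documented examples as I understand them, and the topic, endpoint, and IDs are placeholders you would replace with your own values:

# 1. Copy the uber JAR onto every Connect worker's plugin path
#    (the directory configured as plugin.path in the worker properties)
cp streaming-connect-sink.jar /usr/share/java/kafka-connect-plugins/

# 2. Register a connector instance via the Connect REST API (default port 8083)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "aep-sink-connector",
    "config": {
      "connector.class": "com.adobe.platform.streaming.sink.impl.AEPSinkConnector",
      "topics": "connect-test",
      "tasks.max": "1",
      "key.converter": "org.apache.kafka.connect.storage.StringConverter",
      "value.converter": "org.apache.kafka.connect.storage.StringConverter",
      "aep.endpoint": "https://dcs.adobedc.net/collection/<CONNECTION_ID>"
    }
  }'

Once registered, the Connect cluster schedules the connector's tasks across its workers, which is where the scaling and fault tolerance mentioned above come from.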

3. AEP Streaming Connector with Kafka but Without Kafka Connect Instance

If you have your own Kafka deployment but do not have a Kafka Connect instance, the Kafka Connect Plugin cannot be used in the traditional way. In this case, you can still use the AEP Streaming Connector by running it via Docker, which directly communicates with your Kafka brokers to send data to AEP.

In simpler terms:

  • Kafka brokers: These are the actual servers or instances running Apache Kafka that store and manage your data streams (topics).
  • Without Kafka Connect Instance: If you don’t have Kafka Connect set up but still want to send data from Kafka to AEP, you can run the AEP Streaming Connector in a Docker container to connect directly to your Kafka brokers.
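As a rough sketch of that Docker-based path (the image name and environment variable here are assumptions modeled on common Kafka container images; the authoritative values live in the connector repository's docker-compose.yml):

# Run the connector image so its embedded Connect worker talks
# directly to your existing brokers (replace the addresses below)
docker run -d --name streaming-connect \
  -p 8083:8083 \
  -e CONNECT_BOOTSTRAP_SERVERS=broker1:9092,broker2:9092 \
  streaming-connect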

Key Differences:

  • With Kafka Connect: You are using the Kafka Connect cluster to manage the data integration. Kafka Connect handles configuration, scaling, and fault tolerance for connecting Kafka to AEP. It's the recommended approach when you have a managed Kafka Connect instance.
  • Without Kafka Connect: If you don’t have a Kafka Connect instance, you can use Docker to run the AEP Streaming Connector directly against your Kafka brokers. This is a simpler setup but may lack some of the advanced management features that Kafka Connect provides (like scaling and fault tolerance).

Which Setup to Use?

  • Use Kafka Connect if you already have a Kafka Connect Cluster in place, as it offers better scalability, fault tolerance, and easier management for connecting Kafka to AEP.
  • Use Docker (Without Kafka Connect) if you don’t have Kafka Connect and want a simpler, smaller-scale deployment, especially for development or testing environments.

This flexibility allows businesses to choose the right approach based on their existing infrastructure and scalability requirements.

Data Exchange Setup for Adobe Experience Platform (AEP) Sink Connector

Sequence diagram: the data exchange process between Kafka Connect, IMS, and Adobe Experience Platform (AEP)

The sequence diagram provides a step-by-step view of the data exchange process between Kafka Connect, IMS, and Adobe Experience Platform (AEP). This layout is designed to make the workflow clear and accessible. Key steps include:

  1. Authentication: Fetching an IMS token to authenticate with the AEP organization.
  2. Schema Creation: Setting up the XDM Schema to define the data structure.
  3. Dataset Configuration: Creating a dataset for the schema to store incoming data.
  4. Streaming Connection: Establishing a streaming connection using the schema and dataset ID.
  5. Sink Connector Setup: Creating the AEP Sink Connector to enable data transfer.
  6. Data Ingestion: Sending data from Kafka’s source topic to AEP's streaming endpoint.

This visually intuitive format makes it easier to trace the sequence of interactions and understand the technical dependencies involved in real-time data ingestion workflows.
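Steps 2 through 5 are all REST calls against platform.adobe.io that carry the same authentication headers. As one hedged illustration, dataset creation (step 3) looks roughly like the call below; the schema ID and all header values are placeholders following Adobe's documented Catalog API pattern:

# Create a dataset bound to an existing XDM schema (step 3 above)
curl -X POST https://platform.adobe.io/data/foundation/catalog/dataSets \
  -H "Authorization: Bearer $IMS_TOKEN" \
  -H "x-api-key: $API_KEY" \
  -H "x-gw-ims-org-id: $IMS_ORG" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Kafka Streaming Dataset",
    "schemaRef": {
      "id": "https://ns.adobe.com/<TENANT>/schemas/<SCHEMA_ID>",
      "contentType": "application/vnd.adobe.xed-full+json;version=1"
    }
  }'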

To begin the setup on your local machine, start by downloading the streaming connector from this GitHub link.

Please note that the connector originally shipped with Service Account (JWT) authentication, which Adobe began deprecating on June 3, 2024. After that date, new Service Account (JWT) credentials can no longer be created or added to projects, and once existing JWT credentials are fully retired, the JWT flow will no longer generate tokens.

To ensure a smooth transition, I've updated the code to support the new OAuth Server-to-Server authentication; the changes to the respective files can be found in my GitHub repository here. Migrating to this method is straightforward and allows for a zero-downtime migration, keeping your application functional throughout the transition.

To learn how to migrate from Service Account to OAuth Server-to-Server credentials, follow the instructions provided in this link.
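With OAuth Server-to-Server credentials, fetching the IMS token becomes a single client-credentials call. The endpoint below is Adobe's documented IMS token endpoint; the client ID, secret, and scope list come from your Developer Console project, and the scopes shown are only a typical example:

# Exchange OAuth Server-to-Server credentials for an IMS access token
curl -X POST https://ims-na1.adobelogin.com/ims/token/v3 \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  -d "client_id=$CLIENT_ID" \
  -d "client_secret=$CLIENT_SECRET" \
  -d "scope=openid,AdobeID,read_organizations,additional_info.projectedProductContext"

The returned JSON contains an access_token field, which becomes the Bearer token in the Authorization header of the platform calls above.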

Now, build the Docker image and start the containers using the commands below. Once they complete, the output should resemble the screenshots that follow.

./gradlew clean build
docker build -t streaming-connect .
docker-compose up -d        
Output of Gradle build (with warnings)
Output of Gradle build (success)

Before executing the command below, make sure to obtain an API Key and IMS Token to access Adobe Cloud Platform APIs.

docker exec -i experience-platform-streaming-connect-kafka-connect-1 ./setup.sh        
The resulting output should resemble the one shown in the screenshot.

Simplifying the Output:

Let's break down the output above and explain it in simpler terms:

1. Streaming Connection Source:

"My Streaming Source-20241226153128": This is the name of the source where the data originates. In your case, this source could be a Kafka topic or another streaming data source.

2. Creating the Streaming Connection:

Making call to create streaming connection to https://platform.adobe.io/...: This message shows that a request is being made to Adobe's platform to create a connection between your streaming source (e.g., Kafka) and Adobe Experience Platform.

3. Streaming Connection ID:

Streaming Connection: e9b0175a-314d-467a-b595-656c3347b40b: This is a unique identifier generated for the streaming connection. It's used to track and manage the connection in AEP.

4. Streaming Connection URL:

https://dcs.adobedc.net/collection/...: This URL is where the data will be sent in real-time once the connection is established. It represents the destination where Adobe Experience Platform will receive and process the data.

5. Topic Created:

Created topic connect-test-20241226153128: A topic (a kind of "data stream" or channel) has been created with the name connect-test-20241226153128. This topic will receive the streaming data from Kafka and send it to AEP.

6. AEP Sink Connector:

AEP Sink Connector aep-sink-connector-20241226153128: This refers to the AEP Sink Connector that facilitates the connection between the streaming source (Kafka) and AEP. It is responsible for receiving data from the Kafka stream and pushing it into AEP for real-time processing.
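To watch the pipeline work end to end, you can hand-feed a test record into the source topic with the console producer that ships with Kafka. The container name and script name below are assumptions based on the compose setup (adjust them to match docker ps), and the message body is a toy XDM-style payload rather than a schema-validated record:

# Produce a single test event into the source topic created by setup.sh
echo '{"xdmEntity": {"personalEmail": {"address": "test@example.com"}}}' | \
  docker exec -i experience-platform-streaming-connect-kafka-1 \
  kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic connect-test-20241226153128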

Here is a screenshot of the data generated by Kafka topics, which is then consumed within AEP.


Screenshot: Kafka topics
Screenshot: Connect

Verify Data Landing into AEP

To verify that the data is successfully landing in Adobe Experience Platform (AEP), log in to AEP and use the monitoring dashboard to track the streaming data as it gets ingested. Then, review the data within the dataset to ensure it has been ingested correctly.

Screenshot: Monitoring (streaming end-to-end)
Screenshot: Data preview

Conclusion

The AEP Sink Connector is a powerful tool for integrating Kafka data streams into Adobe Experience Platform. By leveraging this connector, businesses can unlock the full potential of their streaming data for real-time insights and improved customer experiences. Whether you’re looking to optimize marketing campaigns, enhance customer engagement, or drive revenue growth, the AEP Sink Connector provides the foundation for success.

Start building your real-time data pipelines with the AEP Sink Connector today and transform your data into actionable insights.
