Salesforce Data Cloud Data Retrieval Patterns & Best Practices
Problem Statement & Business Drivers
When Data Cloud launched as a Customer Data Platform (CDP) five years ago, it primarily focused on batch data processing to consolidate data from various sources, build segments, and activate them across multiple channels. Over time, the platform has evolved, incorporating near real-time and real-time data processing capabilities. This evolution has unlocked significant potential, enabling customer truth profiles and insights—whether analytical or predictive—to be activated across various channels, directly or indirectly through downstream applications, in near real time, ensuring a consistent customer experience.
Our platform now offers a variety of building blocks for data retrieval. This document provides an overview of these capabilities, the specific problems they are designed to solve, and best practices and guidance for optimizing data retrieval strategies.
Data Retrieval Capabilities
Salesforce Data Cloud Query API
The Salesforce Data Cloud Query API is designed to enable users to query and retrieve data stored in Salesforce Data Cloud. This API allows for flexible access to unified, harmonized customer data and other information within the Data Cloud, which is often sourced from multiple platforms.
Here are the key features and capabilities of the Salesforce Data Cloud Query API:
In essence, the Salesforce Data Cloud Query API provides a powerful way to interact with and derive insights from the massive amounts of customer data unified in Salesforce Data Cloud. The same query engine is also the backbone of the zero-copy integration with major data lake platforms such as Snowflake, Redshift, and Databricks.
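To make this concrete, here is a minimal Python sketch of submitting an ANSI SQL query to the Query API. The tenant hostname, token handling, and endpoint path below are assumptions for illustration and should be confirmed against the current API reference for your org.

```python
import requests

# Placeholder values: swap in your Data Cloud tenant endpoint and a valid
# OAuth access token obtained through your org's usual authentication flow.
DATA_CLOUD_INSTANCE = "https://your-tenant.c360a.salesforce.com"  # assumed host
ACCESS_TOKEN = "<OAUTH_ACCESS_TOKEN>"


def run_query(sql: str) -> dict:
    """Submit an ANSI SQL query to the (assumed) Query API endpoint and return the JSON payload."""
    response = requests.post(
        f"{DATA_CLOUD_INSTANCE}/api/v2/query",  # assumed Query API v2 path
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Content-Type": "application/json",
        },
        json={"sql": sql},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Example: pull a few individual profiles from the standard data model.
    result = run_query(
        "SELECT ssot__Id__c, ssot__FirstName__c, ssot__LastName__c "
        "FROM ssot__Individual__dlm LIMIT 10"
    )
    print(result)
```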
Salesforce Data Graph Overview
Data Cloud provides a standard data model that captures the most commonly collected customer data, encompassing both profile and engagement information across various industries. When customers ingest data into Data Cloud, they map these data assets to our standard model for smooth integration. While this normalized structure is effective for ad-hoc querying and batch processing, it poses challenges for real-time and interactive workloads. For these workloads, the table joins required to combine related data can become costly and unpredictable, particularly as data volumes increase and the number of joins grows.
To address this challenge, Data Cloud introduces Data Graph, a solution that enables customers or applications to define and maintain denormalized object (graph) models. A data graph represents data as a network of interconnected entities (nodes) and relationships (edges), an approach that is particularly effective for modeling complex relationships and hierarchies. Users can define relationships between different data entities, allowing them to visualize and analyze connections that may not be obvious in traditional relational databases.
Data Graph transforms normalized table data from Data Model Objects (DMOs) into new, denormalized & materialized views of the data. Since the data is pre-calculated and in an optimized format, fewer calls are needed, resulting in near real-time query responses. External DMOs integrated via zero-copy technology can be included in Data Graphs only if their underlying external Data Lake Objects (DLOs) are accelerated. Customers can query Data Graph through APIs or use it in activation workflows to enrich profiles and events. Data Graphs provide a snapshot of the data, which refreshes every 24 hours or can be manually triggered.
An example of a Data Graph including Unified Profiles, associated individuals, contact email and phone, and insights such as customer lifetime value and 360 insights.
Example: How a Data Graph Is Constructed
This data graph has three DMOs. The primary DMO is Unified Individual. The Unified Link Individual DMO is a related object and a child of the Unified Individual DMO. The Individual DMO, which is a child of the Unified Link Individual DMO, is also selected as a related object. Each DMO has one or more fields selected. Insight DMOs for unified customer lifetime values and 360 insights are also included. Taken together, the selected fields from these DMOs define the data graph’s JSON schema.
After the data graph is created, the schema determines which data is available in the JSON blob. The JSON of a typical data graph looks like this example.
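Because the original JSON sample is not reproduced here, the Python sketch below shows an illustrative shape for such a blob, based only on the DMOs and insights named above. All field and relationship names are assumptions; the actual names come from your data model and data graph definition.

```python
# Illustrative shape of a data graph JSON blob for the example above:
# Unified Individual (primary) -> Unified Link Individual -> Individual,
# plus contact points and calculated insights. Names are hypothetical.
example_data_graph_record = {
    "ssot__Id__c": "unified-individual-001",
    "ssot__FirstName__c": "Ada",
    "ssot__LastName__c": "Lovelace",
    "UnifiedLinkIndividual": [            # related object, child of the primary DMO
        {
            "ssot__SourceRecordId__c": "crm-contact-123",
            "Individual": [               # child of the link object
                {
                    "ssot__Id__c": "crm-contact-123",
                    "ContactPointEmail": [{"ssot__EmailAddress__c": "ada@example.com"}],
                    "ContactPointPhone": [{"ssot__TelephoneNumber__c": "+1-555-0100"}],
                }
            ],
        }
    ],
    "CustomerLifetimeValue": [{"clv_amount__c": 1280.50}],  # calculated insight
    "Customer360Insights": [{"churn_risk__c": "LOW"}],      # calculated insight
}

# Downstream code simply walks the nested structure instead of joining tables.
primary_email = (
    example_data_graph_record["UnifiedLinkIndividual"][0]["Individual"][0]
    ["ContactPointEmail"][0]["ssot__EmailAddress__c"]
)
print(primary_email)
```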
Additionally, Data Graphs are seamlessly integrated with Flows, Related Lists, and Prompt Studio on the Salesforce platform through simple point-and-click tools. This integration gives Salesforce users an intuitive and efficient way to access interconnected data, apply filters, and select columns from various tables within the denormalized graph—without needing to manually handle complex joins. With highly optimized data retrieval, Data Graph is scalable for B2C use cases.
In the following flow example, Data Graphs, including real-time ones, can be accessed through actions in a flow. Admins can easily configure the flow to leverage available Data Graphs, enabling them to tap into customer insights built on the Data Cloud platform. These insights can then be used to enhance CRM personalization and power AI-driven use cases.
In the following prompt template example, Data Cloud's Data Graph is accessed from the prompt template. Customer insights from the Data Graph, including real-time data and zero-copy data from external data lake platforms, are used for Retrieval Augmented Generation (RAG) in prompts sent to OpenAI. This grounding makes the responses more relevant and accurate, based on up-to-date data.
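As a rough illustration of the grounding step, the hypothetical Python sketch below flattens a few data graph fields into a prompt. In practice Prompt Builder resolves the data graph for you when the template is invoked; the field names and prompt wording here are purely illustrative.

```python
def build_grounded_prompt(question: str, data_graph_record: dict) -> str:
    """Assemble a prompt grounded in customer facts pulled from a data graph blob.

    The manual flattening below is only a sketch of what the platform does for you;
    all field names are hypothetical.
    """
    facts = {
        "name": (
            f"{data_graph_record.get('ssot__FirstName__c', '')} "
            f"{data_graph_record.get('ssot__LastName__c', '')}"
        ).strip(),
        "lifetime_value": data_graph_record.get("CustomerLifetimeValue", [{}])[0]
                          .get("clv_amount__c"),
        "churn_risk": data_graph_record.get("Customer360Insights", [{}])[0]
                      .get("churn_risk__c"),
    }
    fact_lines = "\n".join(f"- {key}: {value}" for key, value in facts.items() if value)
    return (
        "You are a service assistant. Use only the customer facts below.\n"
        f"Customer facts:\n{fact_lines}\n\n"
        f"Question: {question}"
    )


record = {
    "ssot__FirstName__c": "Ada",
    "ssot__LastName__c": "Lovelace",
    "CustomerLifetimeValue": [{"clv_amount__c": 1280.50}],
    "Customer360Insights": [{"churn_risk__c": "LOW"}],
}
print(build_grounded_prompt("What retention offer should we present?", record))
```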
Real-Time Data Graphs
As our product capabilities advance, the need for real-time data delivery has expanded, with an increasing focus on completing end-to-end data processing within milliseconds. A key use case involves stitching an anonymous web browser user to a known user through our real-time ID resolution capabilities, generating immediate insights, and exposing a Real-Time Data Graph for rule-based or machine learning decision engines to deliver next-best offers at consumer scale.
For real-time data ingestion, we currently support all web and mobile events triggered via the Data Cloud SDK. Looking ahead, we are introducing server-side API integrations to enable a broader range of real-time use cases.
The following is an example of a Real-Time Data Graph, which includes real-time ingestion of web product browsing engagement events into the real-time layer, followed by immediate ID resolution and real-time insights generation. All of this real-time data is then exposed in the form of a Real-Time Data Graph, enabling instant decision-making and personalization.
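To illustrate how a rule-based decision engine might consume such a graph, here is a toy Python sketch operating on a hypothetical real-time record after ID resolution. The field names and offer logic are assumptions, not the platform's actual schema or decisioning.

```python
from typing import Optional


def next_best_offer(rt_graph_record: dict) -> Optional[str]:
    """Toy rule-based decision using fields from a hypothetical real-time data graph record."""
    insights = rt_graph_record.get("RealTimeInsights", [{}])[0]
    recent_category = insights.get("last_browsed_category__c")
    is_known = bool(rt_graph_record.get("UnifiedIndividualId"))

    if not is_known:
        return "generic-welcome-banner"        # anonymous visitor, not yet stitched
    if recent_category == "running-shoes":
        return "running-shoes-10-percent-off"  # personalize on a real-time browse signal
    return None                                # fall back to the default experience


# Example real-time record after ID resolution has stitched the anonymous visitor.
record = {
    "UnifiedIndividualId": "unified-individual-001",
    "RealTimeInsights": [{"last_browsed_category__c": "running-shoes"}],
}
print(next_best_offer(record))  # -> running-shoes-10-percent-off
```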
Record Caching and User Session Length
When configuring a real-time data graph, there are two important parameters to manage: the number of records held in the cache and the user session length.
Record Caching
Record caching enables storing recently active users' records in the real-time layer, allowing for instant personalization from the very first page of your website. You can configure the cache to control the maximum number of records it holds and the duration for which records are considered active. A larger and longer-lasting cache allows more users to receive personalized experiences from their first interaction, while reducing the cache size and retention period can help lower consumption costs.
The real-time layer's cache can store up to 100 million records, with a maximum retention period of 180 days. If needed, record caching can be disabled by selecting the 'Disable record caching' option, in which case records will only remain in the cache during the user’s active session.
One of our customers, with over 400 million customer profiles, collaborated with us to develop an optimal caching strategy. By analyzing different customer tiers, we ensured that Tier 1 customer records remain in the cache for minimal latency. It's important to note that if a record isn't present in the cache during real-time processing, the system retrieves the customer data from the lakehouse layer, which may result in slightly higher latency compared to accessing cached records.
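As a back-of-envelope aid for choosing these settings, the sketch below multiplies daily active users by the retention window to get an upper bound on cached records and checks it against the configured maximum. This deliberately overstates the count, since returning users overlap across days.

```python
def estimate_cache_records(daily_active_users: int, retention_days: int,
                           max_records: int = 100_000_000) -> dict:
    """Rough upper-bound estimate of records a real-time cache would hold.

    Assumes (simplistically) that each active user contributes one cached record
    per day of the retention window; real overlap between returning users will
    make the true number lower.
    """
    estimated = daily_active_users * retention_days
    return {
        "estimated_records": estimated,
        "within_configured_max": estimated <= max_records,
    }


# Example: 2M daily active users with a 30-day retention window.
print(estimate_cache_records(2_000_000, 30))
# {'estimated_records': 60000000, 'within_configured_max': True}
```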
User Session Length
The session length defines how long a session remains active after a user stops engaging with your website. If you set the timeout period to 30 minutes, a user's session is considered inactive 30 minutes after that user's most recent engagement. Any engagement that happens at 31 minutes or later is treated as the start of a new session.
If the website receives high traffic, the user’s session record doesn't always remain in the real-time layer’s data cache after the session times out.
The industry standard for session expiration is 30 minutes of inactivity. The maximum supported session duration is 48 hours.
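The timeout rule itself is simple to express; here is a minimal Python sketch applying the 30-minute default described above, purely as an illustration of the logic.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the 30-minute industry default discussed above


def is_same_session(last_engagement: datetime, new_engagement: datetime,
                    timeout: timedelta = SESSION_TIMEOUT) -> bool:
    """Return True if a new engagement falls inside the still-active session window."""
    return (new_engagement - last_engagement) <= timeout


last = datetime(2024, 6, 1, 10, 0)
print(is_same_session(last, datetime(2024, 6, 1, 10, 29)))  # True: within 30 minutes
print(is_same_session(last, datetime(2024, 6, 1, 10, 31)))  # False: a new session starts
```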
Our Recommendations & Best Practices
In summary, Data Cloud allows you to integrate multiple data sources from various backend systems through batch, streaming, or real-time data ingestion. This data is harmonized into a unified 360-degree customer profile and used to build meaningful insights, which can then be shared with downstream applications to ensure consistent experiences across customer touchpoints. Depending on your latency and throughput requirements, you can select different integration options.
Finally, these recommendations are grounded in the current capabilities of Salesforce Data Cloud. The platform is advancing quickly, with ongoing evaluation of market demands and use case requirements and frequent introduction of new features aimed at meeting business needs and maximizing platform value. We will share additional updates as new features, capabilities, and architectural patterns become available.