Salesforce Data Cloud Data Retrieval Patterns & Best Practice

Problem Statement & Business Drivers

When Data Cloud launched as a Customer Data Platform (CDP) five years ago, it primarily focused on batch data processing to consolidate data from various sources, build segments, and activate them across multiple channels. Over time, the platform has evolved to incorporate near real-time and real-time data processing capabilities. This evolution has unlocked significant potential: unified customer profiles and insights, whether analytical or predictive, can now be activated across various channels in near real time, directly or indirectly through downstream applications, to ensure a consistent customer experience.

Our platform now offers a variety of building blocks for data retrieval. This document provides an overview of these capabilities, the specific problems they are designed to solve, and best practices and guidance for optimizing data retrieval strategies.

Data Retrieval Capabilities

Salesforce Data Cloud Query API

The Salesforce Data Cloud Query API is designed to enable users to query and retrieve data stored in Salesforce Data Cloud. This API allows for flexible access to unified, harmonized customer data and other information within the Data Cloud, which is often sourced from multiple platforms.

Here are the key features and capabilities of the Salesforce Data Cloud Query API:

  1. Access to Unified Customer Profiles: It allows you to retrieve data from the unified customer profiles that Data Cloud generates by combining data from various sources such as CRM, ERP, and third-party platforms.
  2. Flexible Querying: A SQL-based API lets you query datasets in Data Cloud in a familiar manner, including filtering, joining, and aggregating data.
  3. Interactive and Batch Queries: You can execute both interactive and asynchronous batch queries depending on the size of the data and the required performance. Interactive queries are optimized for immediate data needs, while batch queries can handle larger datasets.
  4. Complex Data Relationships: The Query API can work with complex, multi-level data structures, including nested JSON objects. This feature supports more intricate queries where multiple entities or attributes need to be correlated.
  5. Event-Based Queries: It can query event data, such as interactions from websites or IoT devices, making it ideal for applications like real-time marketing, predictive analysis, and customer engagement.
  6. Zero-Copy Support: Interactive querying extends to zero-copy external data objects, letting you query externally stored data in place.
  7. Scalability: Designed for high performance, the Query API scales with large volumes of data, supporting enterprises that handle massive customer datasets across global teams.

In essence, the Salesforce Data Cloud Query API provides a powerful way to interact with and derive insights from the massive amounts of customer data unified in Salesforce Data Cloud. In addition, the query engine is the backbone of the innovative zero-copy integration with major data lake platforms such as Snowflake, Amazon Redshift, and Databricks.
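As a rough illustration, a SQL query can be submitted to the Query API over HTTPS with an OAuth bearer token. The host, endpoint path, payload shape, and the field names in the SQL below are assumptions for sketch purposes, not the authoritative API contract; consult the official Data Cloud Query API reference for the exact details.

```python
import json
import urllib.request


def build_query_request(tenant_host: str, access_token: str, sql: str):
    """Assemble an HTTPS request for the Data Cloud Query API.

    The endpoint path and payload shape here are illustrative
    assumptions, not the exact API contract.
    """
    url = f"https://{tenant_host}/api/v2/query"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_query(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example: filter and aggregate over a unified-profile DMO.
# Tenant host, token, and the city field are placeholders.
req = build_query_request(
    "mytenant.c360a.salesforce.com",
    "PLACEHOLDER_TOKEN",
    "SELECT city__c, COUNT(*) AS n "
    "FROM UnifiedIndividual__dlm GROUP BY city__c",
)
```

Calling `run_query(req)` would then return the decoded JSON result set; the build step is separated out so the request can be inspected or signed differently without touching the transport logic.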

Salesforce Data Graph Overview

Data Cloud provides a standard data model that captures the most commonly collected customer data, encompassing both profile and engagement information across various industries. When customers ingest data into Data Cloud, they map these data assets to our standard model for smooth integration. While this normalized structure is effective for ad-hoc querying and batch processing, it poses challenges for real-time and interactive workloads. In the following use cases, the table joins required to combine related data can become costly and unpredictable, particularly as data volumes increase and the number of joins grows.

  • B2C Applications: B2C applications need rapid access to pre-processed data in Data Cloud, with low latency and efficient point lookups at scale. We have several customers who use Data Cloud to harmonize customer data from multiple sources in near real-time, creating comprehensive 360-degree profiles that downstream applications leverage near real-time to deliver consistent experiences across all customer touch points.
  • CRM Scenarios: In Service Cloud, agents require immediate access to a contact’s unified profile from Data Cloud, which encompasses engagement data, key metrics such as LTV, and segments—all delivered with low latency and high throughput.
  • Discover & Attribute Use Case: B2C web and mobile applications can track user interactions and seamlessly transition from anonymous to known profiles through real-time profile unification.

To address this challenge, Data Cloud introduces Data Graph, a powerful solution that enables customers or applications to define and maintain denormalized object (graph) models. Data is represented as a network of interconnected entities (nodes) and relationships (edges) and exposed through graph-style interfaces such as GraphQL, making it particularly effective for modeling complex relationships and hierarchies. Users can define relationships between different data entities, allowing them to visualize and analyze connections that may not be obvious in traditional databases.

Data Graph transforms normalized table data from Data Model Objects (DMOs) into new, denormalized & materialized views of the data. Since the data is pre-calculated and in an optimized format, fewer calls are needed, resulting in near real-time query responses. External DMOs integrated via zero-copy technology can be included in Data Graphs only if their underlying external Data Lake Objects (DLOs) are accelerated. Customers can query Data Graph through APIs or use it in activation workflows to enrich profiles and events. Data Graphs provide a snapshot of the data, which refreshes every 24 hours or can be manually triggered.

The following is an example of a Data Graph that includes Unified Profiles, the associated individuals, contact email and phone, and insights such as customer lifetime value and 360 insights.

Example: How a Data Graph Is Constructed

This data graph has three DMOs. The primary DMO is Unified Individual. The Unified Link Individual DMO is a related object and a child of the Unified Individual DMO. The Individual DMO, which is a child of the Unified Link Individual DMO, is also selected as a related object. Each DMO has one or more fields selected. Insight DMOs for unified customer lifetime values and 360 insights are also included. Taken together, the selected fields from these DMOs define the data graph’s JSON schema.


After the data graph is created, the schema determines which data is available in the JSON blob. The JSON of a typical data graph looks like this example.
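Since the rendered example does not reproduce here, the following is a hypothetical sketch of what such a blob might contain and how an application would walk it. The object and field names are assumptions loosely based on the DMOs described above, not the exact schema Data Cloud emits.

```python
import json

# Hypothetical data graph JSON blob for one Unified Individual.
# Object and field names are illustrative, not the exact schema.
blob = """
{
  "UnifiedIndividual__dlm": {
    "ssot__Id__c": "ui-001",
    "ssot__FirstName__c": "Ada",
    "UnifiedLinkIndividual__dlm": [
      {
        "Individual__dlm": [
          {
            "ssot__Id__c": "ind-100",
            "ContactPointEmail": "ada@example.com",
            "ContactPointPhone": "+1-555-0100"
          }
        ]
      }
    ],
    "CustomerLifetimeValue__cio": {"ltv": 1250.0}
  }
}
"""

graph = json.loads(blob)
profile = graph["UnifiedIndividual__dlm"]

# Because the joins are precomputed, related records are reached by
# walking nested children rather than issuing additional queries.
first_link = profile["UnifiedLinkIndividual__dlm"][0]
individual = first_link["Individual__dlm"][0]
```

The key point is structural: the primary DMO sits at the root, each related DMO nests under its parent, and insight values are embedded alongside the profile, so a single lookup returns everything the selected fields cover.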


Additionally, Data Graphs are seamlessly integrated with Flows, Related Lists, and Prompt Studio on the Salesforce platform through simple point-and-click tools. This integration gives Salesforce users an intuitive and efficient way to access interconnected data, apply filters, and select columns from various tables within the denormalized graph—without needing to manually handle complex joins. With highly optimized data retrieval, Data Graph is scalable for B2C use cases.

In the following flow example, Data Graphs, including real-time ones, can be accessed through actions in a flow. Admins can easily configure the flow to leverage available Data Graphs, enabling them to tap into customer insights built on the Data Cloud platform. These insights can then be used to enhance CRM personalization and power AI-driven use cases.


In the following prompt template example, Data Cloud's Data Graph can be accessed through the prompt template. Customer insights from the Data Graph, including real-time data and zero-copy data from external data lake platforms, are used for Retrieval-Augmented Generation (RAG) in prompts sent to OpenAI. This grounds the responses in up-to-date data, making them more relevant and accurate.


Real-Time Data Graphs

As our product capabilities advance, the need for real-time data delivery has expanded, with an increasing focus on completing end-to-end data processing within milliseconds. A key use case involves stitching an anonymous web browser user to a known user through our real-time ID resolution capabilities, generating immediate insights, and exposing a Real-Time Data Graph for rule-based or machine learning decision engines to deliver next best offers at a consumer scale.

For real-time data ingestion, we currently support all web and mobile events triggered via the Data Cloud SDK. Looking ahead, we are introducing server-side API integrations to enable a broader range of real-time use cases.

The following is an example of a Real-Time Data Graph, which includes real-time ingestion of web product browsing engagement events into the real-time layer, followed by immediate ID resolution and real-time insights generation. All of this real-time data is then exposed in the form of a Real-Time Data Graph, enabling instant decision-making and personalization.



Record Caching and User Session Length

When configuring a real-time data graph, there are two important parameters to manage: the number of records held in the cache and the user session length.


Record Caching

Record caching enables storing recently active users' records in the real-time layer, allowing for instant personalization from the very first page of your website. You can configure the cache to control the maximum number of records it holds and the duration for which records are considered active. A larger and longer-lasting cache allows more users to receive personalized experiences from their first interaction, while reducing the cache size and retention period can help lower consumption costs.

The real-time layer's cache can store up to 100 million records, with a maximum retention period of 180 days. If needed, record caching can be disabled by selecting the 'Disable record caching' option, in which case records will only remain in the cache during the user’s active session.

One of our customers, with over 400 million customer profiles, collaborated with us to develop an optimal caching strategy. By analyzing different customer tiers, we ensured that Tier 1 customer records remain in the cache for minimal latency. It's important to note that if a record isn't present in the cache during real-time processing, the system will retrieve the customer data from the lakehouse layer, which may result in slightly higher latency compared to accessing cached records.
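Conceptually, the cache behaves like a bounded store with a retention window: records beyond the maximum count or older than the retention period fall back to the slower lakehouse lookup. A toy sketch of that tradeoff, with deliberately tiny illustrative limits (the real ceilings are 100 million records and 180 days):

```python
from collections import OrderedDict


class RecordCache:
    """Toy model of a bounded record cache with a retention period.

    max_records and retention_seconds mirror the two knobs described
    above; the limits used here are illustrative, not the real ones.
    """

    def __init__(self, max_records: int, retention_seconds: float):
        self.max_records = max_records
        self.retention_seconds = retention_seconds
        self._store = OrderedDict()  # record_id -> last-seen timestamp

    def put(self, record_id: str, now: float) -> None:
        self._store[record_id] = now
        self._store.move_to_end(record_id)
        # Evict the oldest record once the cache is over capacity.
        while len(self._store) > self.max_records:
            self._store.popitem(last=False)

    def get(self, record_id: str, now: float) -> bool:
        """True if the record is cached and within the retention window;
        False models the slower fallback to the lakehouse layer."""
        seen = self._store.get(record_id)
        return seen is not None and now - seen <= self.retention_seconds


cache = RecordCache(max_records=2, retention_seconds=60)
cache.put("tier1-user", now=0)
cache.put("visitor-a", now=10)
cache.put("visitor-b", now=20)  # over capacity: "tier1-user" is evicted
```

The sketch makes the sizing tension concrete: a larger `max_records` keeps high-value records (such as the Tier 1 profiles above) resident, while a shorter `retention_seconds` trims consumption at the cost of more lakehouse fallbacks.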

User Session Length

The session length defines how long after a user's last engagement with your website the session is still considered active. If you set the timeout period to 30 minutes, a user's session becomes inactive 30 minutes after their most recent engagement; any engagement at 31 minutes or later is considered a new session.

If the website receives high traffic, the user’s session record doesn't always remain in the real-time layer’s data cache after the session times out.

The industry standard for session expiration is 30 minutes of inactivity. The maximum supported session duration is 48 hours.
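The session rule above reduces to a simple comparison of the gap since the last engagement against the timeout. A minimal sketch:

```python
from datetime import datetime, timedelta


def starts_new_session(last_engagement: datetime,
                       event_time: datetime,
                       timeout: timedelta = timedelta(minutes=30)) -> bool:
    """True if the gap since the last engagement exceeds the session
    timeout, meaning this event opens a new session."""
    return event_time - last_engagement > timeout


last = datetime(2024, 1, 1, 12, 0)
# 29 minutes after the last engagement: still the same session.
same = starts_new_session(last, last + timedelta(minutes=29))
# 31 minutes after the last engagement: a new session begins.
new = starts_new_session(last, last + timedelta(minutes=31))
```

Note the strict inequality: an event at exactly the 30-minute mark still belongs to the current session, matching the "31 minutes or after" wording above.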

Our recommendations & best practices

In summary, Data Cloud allows you to integrate multiple data sources from various backend systems through batch, streaming, or real-time data ingestion. These data are harmonized into a unified 360-degree customer profile and help build meaningful insights, which can then be shared with downstream applications to ensure consistent customer touchpoint experiences. Depending on your latency and throughput requirements, you can select different integration options.

  • Use the Query API when query response times within seconds are acceptable, throughput is moderate, and the data model relationships are relatively straightforward.
  • For scenarios requiring low latency and high throughput, Data Graph is the ideal solution. It precomputes data for rapid retrieval; data is currently refreshed every 24 hours, with much more frequent refresh schedules planned.
  • For use cases that involve real-time optimization and recommendations during active user sessions, with millisecond latency, the Real-Time Data Graph is the best option. It's important to understand best practices around cache size and session duration to ensure optimal experiences while managing resources efficiently.
  • Most importantly, there is rarely a one-size-fits-all solution. Think of these capabilities as building blocks, giving you the flexibility to combine them as needed to create a tailored, composable application integration pattern.

Finally, the recommendation provided is grounded in the current capabilities of Salesforce Data Cloud. The platform is advancing swiftly, with ongoing evaluations of market demands and use case requirements and frequent introductions of new features aimed at meeting business needs and optimizing platform value. We will introduce additional updates as new features, capabilities, and architectural patterns become available.

