Lifting the Lid of Salesforce Genie's Architecture
Jannis Kearney Bott
Co-CEO @ J4RVIS | CTA | Advisory Board Member | Entrepreneur | Mentor
Last week I wrote an article about my initial opinion and understanding of Salesforce Genie. Shortly after I published the article, I went to a session with the Salesforce product management team who explained in detail how Genie works and how it is embedded in the platform.
In this article, I will focus on summarising what I heard and will also attempt to explain how Genie differs from existing platform integration patterns.
Let's talk about architecture
Don't be fooled by the pretty picture at the top. Whilst I fully agree that it is a beautiful marketing slide, there is one bit in there that is important to note: the "Real-Time Genie Hyperscale Platform".
To better understand this, let's look back at how Salesforce is essentially built at its core. The bottom "Transactional Data" layer of the diagram is the traditional relational database that every Salesforce org shares with the other tenants of its instance. In that database, organisations store everything from Account, Contact and Opportunity records to Custom Object data. The key point to call out here is that the data is physically stored in that database, and hence you can run triggers, flows and process automation on it with ease.
To make sense of all the data stored in the "Transactional Data" layer, Salesforce has its "Unified Metadata Dictionary", which describes the data structures and enforces data integrity (e.g. a number field must hold a number) when saving records. With those two layers embedded at the core of the platform, Salesforce then adds a "Security & Access Control" layer to ensure data is only visible to the relevant users in a specific org, and its AI layer "Einstein", which can leverage AI models to drive insights (note that Einstein sits above the Security & Access layer and hence can only access data in your org).
Above those sit the layers for core platform capabilities (IDAM, APIs, Automation, DX etc.), the "Lightning Design System" (the entire web component architecture and how it interacts with the Salesforce platform) and finally all "Applications" or "Clouds" (e.g. Sales, Service etc.). These are the layers that we as consultants, architects and customers interact with daily.
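To make those two core layers a little more tangible, here is a minimal sketch using the open-source simple_salesforce Python library. The credentials and values are obviously placeholders, and this is just one of many ways to talk to the platform; it queries records from the transactional store and then shows the Unified Metadata Dictionary rejecting a save that violates a field's type.

```python
# A minimal sketch of interacting with the "Transactional Data" layer via the
# REST API, using the open-source simple_salesforce library. Credentials and
# values are placeholders, not a recommendation of a specific integration style.
from simple_salesforce import Salesforce
from simple_salesforce.exceptions import SalesforceMalformedRequest

sf = Salesforce(
    username="user@example.com",          # placeholder credentials
    password="********",
    security_token="****************",
)

# Standard and custom objects live in the same relational store and are
# described by the Unified Metadata Dictionary, so SOQL works uniformly.
accounts = sf.query("SELECT Id, Name, AnnualRevenue FROM Account LIMIT 5")
for record in accounts["records"]:
    print(record["Name"], record["AnnualRevenue"])

# The metadata dictionary enforces data integrity on save: writing a string
# into a numeric field is rejected before it ever reaches the database.
try:
    sf.Account.create({"Name": "Test Account", "AnnualRevenue": "not a number"})
except SalesforceMalformedRequest as error:
    print("Rejected by the platform:", error)
```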
Where Genie fits in
Why am I explaining all of this? Well, once you consider how deep down the "Transactional Data" layer sits within the overall platform architecture, it raises the question of where Genie's "real-time" data sits and what it actually is. To cut it short, Salesforce has embedded a new Data Lakehouse layer into the platform that sits next to the traditional data layer.
Data Lakehouse? I am glad you asked.
I am by no means a data lake expert, but essentially a Data Lakehouse architecture "is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data". Read more about the architecture concepts here.
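To make "lakehouse" less abstract, here is a tiny, self-contained illustration using the open-source Delta Lake format on Spark. To be clear, this is not how Salesforce has built Genie internally; it just demonstrates the general pattern the definition above describes: cheap, file-based storage with warehouse-style ACID writes layered on top.

```python
# Illustration of the lakehouse pattern with open-source Delta Lake on Spark.
# Not Salesforce's implementation - just the general concept.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Cheap, schema-flexible storage like a data lake...
events = spark.createDataFrame(
    [("c-001", "page_view"), ("c-002", "purchase")],
    ["customer_id", "event_type"],
)
events.write.format("delta").mode("append").save("/tmp/lakehouse/engagement")

# ...but with warehouse-style ACID guarantees: writes either commit atomically
# or fail cleanly, and the table can be read back consistently at any point.
snapshot = spark.read.format("delta").load("/tmp/lakehouse/engagement")
snapshot.groupBy("event_type").count().show()
```

Swap Delta for any comparable table format and the idea is the same; the interesting part in Genie's case is that this layer sits inside the platform rather than in a separate system.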
So by adding a new Data Lakehouse layer, we gain the ability to store significantly larger volumes of data whilst retaining some benefits of traditional relational databases, like ACID transactions. But that's not all: because the Lakehouse is embedded in the core of the platform, we can now drive automation off the back of the data it stores. We can also run queries against the data, use data pipelines to prepare new datasets without the need for any ETL, and connect it to BI applications like CRM Analytics.
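As a rough sketch of what "pipelines without ETL" can look like in practice, the snippet below joins two toy datasets with Spark SQL and derives a new, query-ready table in place. The dataset names and shapes are invented; the point is simply that the transformation happens where the data already lives rather than being extracted into another system first.

```python
# Sketch of an in-place pipeline: derive a new dataset with SQL where the
# data already sits, instead of extracting it into another tool first.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-place-pipeline-sketch").getOrCreate()

# Hypothetical datasets standing in for data already in the lakehouse.
contacts = spark.createDataFrame(
    [("c-001", "Ada"), ("c-002", "Grace")], ["customer_id", "name"])
engagement = spark.createDataFrame(
    [("c-001", "page_view"), ("c-001", "purchase"), ("c-002", "page_view")],
    ["customer_id", "event_type"])

contacts.createOrReplaceTempView("contacts")
engagement.createOrReplaceTempView("engagement")

# Prepare a new, query-ready dataset (engagement counts per contact) that a
# BI tool such as CRM Analytics could consume directly.
summary = spark.sql("""
    SELECT c.customer_id, c.name, COUNT(e.event_type) AS interactions
    FROM contacts c LEFT JOIN engagement e ON c.customer_id = e.customer_id
    GROUP BY c.customer_id, c.name
""")
summary.show()
```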
How to bring data into the Lakehouse?
To go even deeper into the architecture, the question that arises is: how do we get data into the Data Lakehouse, and what tools will be available to support the import? There are essentially two ways:
On top of all that, you can also bring your own data warehouse. If you happen to have, for example, an existing Snowflake deployment, you can mount your data lake/warehouse into Genie with zero copy, which means we can act on data that doesn't even reside in Salesforce's Lakehouse. I think that fact in itself is very beneficial for customers with existing data lake deployments.
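I won't pretend to know the exact import tooling yet, but as a purely illustrative sketch of what streaming-style ingestion tends to look like, here is the general shape: small JSON payloads pushed to an ingestion endpoint as events happen. The endpoint URL, object name and payload below are hypothetical placeholders, not the actual Genie API.

```python
# Purely illustrative sketch of streaming-style ingestion. The endpoint URL,
# object name and token below are hypothetical placeholders, NOT the real
# Genie ingestion API.
import requests

INGEST_URL = "https://example.invalid/ingest/web_engagement"  # placeholder
TOKEN = "access-token-goes-here"                               # placeholder

def stream_engagement_event(customer_id: str, event_type: str) -> None:
    """Push a single engagement record as it happens, rather than in batches."""
    payload = {"data": [{"customer_id": customer_id, "event_type": event_type}]}
    response = requests.post(
        INGEST_URL,
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()

if __name__ == "__main__":
    stream_engagement_event("c-001", "page_view")
```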
How to get data out of the Lakehouse?
As you can imagine, given that the idea is to deal with very large data volumes (billions of rows), it is important to process data efficiently. To do so, Genie leverages Spark, which "is a general-purpose distributed processing system used for big data workloads. It has been deployed in every type of big data use case to detect patterns and provide real-time insight". Essentially, as data is processed, the output is action triggers in the form of Platform Events, Webhooks and journey invocations, which help us automate business processes and logic.
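To picture that end to end, here is a hedged sketch of the "data out" path: a Spark job scans engagement data, finds customers matching a pattern and surfaces them as Platform Events that flows or journeys can subscribe to. The event name High_Value_Engagement__e, its fields, the threshold and the connection details are all invented for illustration.

```python
# Sketch of the "data out" path: a Spark job spots a pattern and emits it as
# Platform Events. Event/field names and thresholds are invented examples.
from pyspark.sql import SparkSession, functions as F
from simple_salesforce import Salesforce

spark = SparkSession.builder.appName("genie-out-sketch").getOrCreate()
sf = Salesforce(username="user@example.com", password="********",
                security_token="****************")  # placeholder credentials

# Toy engagement data standing in for billions of rows in the lakehouse.
engagement = spark.createDataFrame(
    [("c-001", 12), ("c-002", 3), ("c-003", 45)],
    ["customer_id", "purchases_last_30d"],
)

# Distributed processing finds the customers worth acting on...
hot_customers = engagement.filter(F.col("purchases_last_30d") >= 10).collect()

# ...and the result is published as Platform Events, which record-triggered
# flows, journeys or webhooks can then pick up.
for row in hot_customers:
    sf.High_Value_Engagement__e.create({
        "Customer_Id__c": row["customer_id"],
        "Purchases__c": row["purchases_last_30d"],
    })
```

A flow or journey subscribed to that event can then take over the business logic on the platform side.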
What is in it for you?
The fact that you can access large data volumes directly within Salesforce, without the need for more connectors, opens up quite a few opportunities. It's certainly not the holy grail for everything, but the ability to query and access large data volumes (LDV) in real time, natively in Salesforce, is pretty powerful. Equally, the ability to trigger flows when data changes, whilst maintaining CRUD and FLS, is great.
Lastly, with this architecture, Salesforce opens the door to migrating Marketing Cloud and Commerce Cloud directly into Salesforce. In the past, it has always been difficult to integrate those platforms due to the volumes of data they generate.
Does this replace everything else?
Absolutely not. There are still many use cases for synchronising data into the transactional database in Salesforce (e.g. via ETL or MuleSoft) or for virtualising data entirely (e.g. via Salesforce Connect).
I hope this helped to provide greater detail. As I learn more I will update this article.