Snowflake and ELVT vs [ELT|ETL] - Case Study Part 2, Real Time Availability for Single Row INSERTs
In my previous article, Snowflake and ELVT vs [ELT|ETL] – Case Study Part 1, The “No Data Model” Data Architecture, I described the challenges and solutions involved in building a data analytics architecture before development of the application had even begun. In addition to the four requirements discussed in that article, there is a fifth requirement:
As a reminder, the data is delivered via the Kafka Connector, which is set to load available data into Snowflake every 10 seconds.
For any given Salesforce object, this effectively becomes a series of single-row INSERTs, each creating a single-row micro-partition, which leads to steadily declining query performance over time.
The challenge is to keep the data available in near real time while still maintaining good performance.
STREAMs to the rescue!
Recall that the architecture is INSERT only and that VIEWs are used for all data access. (See Snowflake and ELVT vs [ELT|ETL] – Case Study Part 1, The “No Data Model” Data Architecture).
The solution is both simple and tunable. (It’s not clear if all Salesforce objects require 10-second availability).
For each Salesforce object, we create both a staging table loaded by the Kafka Connector and an “optimized” table, e.g. ACCOUNT_KAFKA_STG as the staging table and ACCOUNT as the “optimized” table.
CREATE TABLE IF NOT EXISTS
    SFORCE_PHYSICAL_SCHEMA.ACCOUNT
(
    LAST_MODIFIED_DTIME TIMESTAMP_TZ(9),
    CREATED_DTIME TIMESTAMP_TZ(9),
    RECORD_CONTENT VARIANT
)
DATA_RETENTION_TIME_IN_DAYS = 90;
CREATE TABLE IF NOT EXISTS
    SFORCE_PHYSICAL_SCHEMA.ACCOUNT_KAFKA_STG
(
    RECORD_METADATA VARIANT,
    RECORD_CONTENT VARIANT
)
DATA_RETENTION_TIME_IN_DAYS = 30;
We create STREAMs on the staging tables, e.g. ACCOUNT_KAFKA_STREAM. The relevant VIEWs perform a UNION ALL between the STREAM and the optimized table (a sketch of such a view follows the STREAM definition below).
CREATE STREAM IF NOT EXISTS
    SFORCE_PHYSICAL_SCHEMA.ACCOUNT_KAFKA_STREAM
    ON TABLE SFORCE_PHYSICAL_SCHEMA.ACCOUNT_KAFKA_STG
    APPEND_ONLY = TRUE
    SHOW_INITIAL_ROWS = FALSE;
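For illustration, here is a minimal sketch of such a VIEW. The view name, the SFORCE_LOGICAL_SCHEMA schema, and the choice of exposed columns are assumptions for this example, not the project’s actual definitions. Note that simply SELECTing from a STREAM does not advance its offset; the offset only advances when the STREAM is consumed by DML, i.e. by the TASK shown further below.
CREATE OR REPLACE VIEW SFORCE_LOGICAL_SCHEMA.ACCOUNT_VW AS
-- Rows landed by the Kafka Connector but not yet moved by the TASK
SELECT
    convert_timezone( 'America/Los_Angeles', TO_TIMESTAMP_TZ( record_content:Target_Payload__c.LastModifiedDate::STRING,
        'YYYY-MM-DD"T" HH24:MI:SS.FF TZHTZM' )) AS LAST_MODIFIED_DTIME,
    convert_timezone( 'America/Los_Angeles', TO_TIMESTAMP_TZ( record_content:Target_Payload__c.CreatedDate::STRING,
        'YYYY-MM-DD"T" HH24:MI:SS.FF TZHTZM' )) AS CREATED_DTIME,
    record_content AS RECORD_CONTENT
FROM
    SFORCE_PHYSICAL_SCHEMA.ACCOUNT_KAFKA_STREAM
UNION ALL
-- Rows already consolidated into the optimized table
SELECT
    LAST_MODIFIED_DTIME,
    CREATED_DTIME,
    RECORD_CONTENT
FROM
    SFORCE_PHYSICAL_SCHEMA.ACCOUNT;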
A TASK is created for each such pair of Snowflake tables; it periodically performs an INSERT from the STREAM into the optimized table:
CREATE OR REPLACE TASK
    SFORCE_PHYSICAL_SCHEMA.INSERT_INTO_ACCOUNT_TASK
    SCHEDULE = '1440 MINUTE'
    ALLOW_OVERLAPPING_EXECUTION = FALSE
    WAREHOUSE = SFORCE_TASK_USAGE_WH
    USER_TASK_TIMEOUT_MS = 300000
WHEN
    SYSTEM$STREAM_HAS_DATA('SFORCE_PHYSICAL_SCHEMA.ACCOUNT_KAFKA_STREAM')
AS
INSERT INTO
    SFORCE_PHYSICAL_SCHEMA.ACCOUNT
(   SELECT
        convert_timezone( 'America/Los_Angeles', TO_TIMESTAMP_TZ( record_content:Target_Payload__c.LastModifiedDate::STRING,
            'YYYY-MM-DD"T" HH24:MI:SS.FF TZHTZM' )) AS LAST_MODIFIED_DTIME,
        convert_timezone( 'America/Los_Angeles', TO_TIMESTAMP_TZ( record_content:Target_Payload__c.CreatedDate::STRING,
            'YYYY-MM-DD"T" HH24:MI:SS.FF TZHTZM' )) AS CREATED_DTIME,
        record_content AS RECORD_CONTENT -- carry the full payload into the optimized table's RECORD_CONTENT column
    FROM
        SFORCE_PHYSICAL_SCHEMA.ACCOUNT_KAFKA_STREAM
    ORDER BY
        LAST_MODIFIED_DTIME
);
When a TASK is created, it is SUSPENDed; be sure to start/RESUME it:
ALTER TASK SFORCE_PHYSICAL_SCHEMA.INSERT_INTO_ACCOUNT_TASK RESUME;
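Optionally, the task’s state can be verified:
SHOW TASKS LIKE 'INSERT_INTO_ACCOUNT_TASK' IN SCHEMA SFORCE_PHYSICAL_SCHEMA;
The “state” column should now read “started” rather than “suspended”.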
The default schedule for the TASKs is 24 hours, resulting in a full day’s data in a micro-partition. Note that both the Kafka Connector settings and the TASK scheduling are tunable if necessary in the future.
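For example, if a given object needs the optimized table refreshed more often, the TASK’s schedule can be shortened; the 60-minute value below is purely illustrative, and suspending the task first is the safe way to alter it:
ALTER TASK SFORCE_PHYSICAL_SCHEMA.INSERT_INTO_ACCOUNT_TASK SUSPEND;
ALTER TASK SFORCE_PHYSICAL_SCHEMA.INSERT_INTO_ACCOUNT_TASK SET SCHEDULE = '60 MINUTE';
ALTER TASK SFORCE_PHYSICAL_SCHEMA.INSERT_INTO_ACCOUNT_TASK RESUME;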
This provides both real-time availability and more efficient use of Snowflake micro-partitions.
A second TASK periodically removes old rows, and hence old micro-partitions, from the staging tables.
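The exact purge logic was not shown in the project; the following is a minimal sketch, assuming the connector’s default RECORD_METADATA:CreateTime field (the Kafka message timestamp in epoch milliseconds) and an illustrative 7-day window. The window must, of course, be longer than the INSERT task’s schedule so that rows are never deleted before the STREAM has been consumed, and this task must also be RESUMEd.
CREATE OR REPLACE TASK
    SFORCE_PHYSICAL_SCHEMA.PURGE_ACCOUNT_KAFKA_STG_TASK
    SCHEDULE = '1440 MINUTE'
    WAREHOUSE = SFORCE_TASK_USAGE_WH
AS
-- Deleting all rows older than the cutoff lets Snowflake drop the
-- corresponding micro-partitions from the staging table.
DELETE FROM
    SFORCE_PHYSICAL_SCHEMA.ACCOUNT_KAFKA_STG
WHERE
    TO_TIMESTAMP_TZ(RECORD_METADATA:CreateTime::NUMBER, 3) < DATEADD(DAY, -7, CURRENT_TIMESTAMP());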
This pattern provides a consistent, flexible solution for the project’s availability requirement.
Copyright © 2022, Jeffrey Jacobs & Associate, LLC