Snowflake vs SQL Server

Snowflake vs SQL Server

Sometimes we need to remind ourselves that Snowflake is not an OLTP database. I know today is the era of Hybrid tables and Iceberg tables but there are a few things that you should not use Snowflake for. And one of them is OLTP.

OLTP stands for Online Transaction Processing. The word "online" means real time, i.e. when you ask the database to insert a record or select a row, it does it immediately, usually completed within milliseconds. Whereas in Snowflake it could take a few seconds. In OLTP databases like SQL Server, inserting a row into a table 1000 times takes a few seconds, whereas in Snowflake it takes a few minutes.

The word "transaction" means database transaction. A classic example is debiting an account & crediting the contra account. This means updating two tables in one "transaction". Meaning that the two tables must both be updated. You can't update one table without the other. Meaning that if one table failed to be updated, the other table must not be updated either. This behaviour is called Atomic.

Transaction is the "trade mark" of OLTP databases. Apart from Atomic, a transaction must also be consistent, isolated, durable. They are known as ACID: atomic, consistent, isolated, durable. Durable means it must be written successfully to disk. Isolated means each transaction must not influence other transactions. Consistent means the transaction must follow the FK and PK. And other constraint like Not Null.

Snowflake, Databricks, Redshift, SAP Hana and Big Query all support ACID transactions. But all of them are terrible OLTP database. Meaning that they are not good at inserting a row 1000 times. And that is because they all use columnar file format, such as Parquet files. Or Iceberg files. Because they use columnar storage, they are called columnar databases.

Columnar databases are excellent for summarising and aggregating data. Columnar databases have very good read performance on large tables because they read only the necessary columns, and the values for a particular column are stored together. Analytics and reporting applications do a lot of summarising and aggregating data. And they do a lot of data read, with minimal data write. That is why Snowflake, Databricks, Redshift, SAP Hana and Big Query are very good for analytics and reporting.

One more thing. In a columnar database, empty cells are omitted. And the same data type is stored together. Because of that, columnar databases have very good compression. And that makes data read a lot faster. That is why columnar databases are very good for analytics and reporting.

But all that makes columnar databases poor at transactional workload with lots of frequent data updates or insert. And row-by-row data retrieval. This is because the columnar databases store the values for a column together.

And that is where OLTP databases excel. They store each record in a single row. And every attribute is stored in a column. That's why OLTP databases are very efficient when doing lots of frequent, small data updates. Like inserting a row 1000 times.

But in OLTP databases like Oracle, MySQL, Postgres and SQL Server, different data types are stored together in a row. And that makes the compression not as good as columnar databases. And that makes reading large tables not as good as columnar databases. So OLTP databases are very good at inserting a row 1000 times, but not good at aggregating data in large tables.

When your application does lots of frequent, small updates/inserts, like ecommerce websites and EPOS (Electronic Point of Sales, like supermarket checkouts), then you need to use OLTP database. The most popular OLTP database is Oracle, SQL Server and MySQL (link). Whereas if your application does anakytics and reporting, the you need to use OLAP database like Snowflake, Redshift and Big Query. Databricks is not an OLAP database (see below). SAP Hana is both OLTP database and OLAP database.

Below is the performance in Snowflake Enterprise Edition on a table containing 2 columns and 100 rows, using an XS virtual warehouse:

  • Insert a row: 0.3 to 0.8 seconds
  • Insert a row 100 times: 29-30 seconds
  • Update a row: 0.2 to 0.5 seconds.
  • Update a row 100 times: 33 seconds
  • Select a row: 0.1-0.5 second
  • Select a row 100 times: 5 to 10 seconds

Technically speaking Databricks is not an OLAP database. It is an OLAP solution but not an OLAP database. The reason Databricks is not a database, is because it does not store data in a relational database engine. Databricks is built on Apache Spark, which is is a big data solution and a very good analytical engine. Databricks can be queried using SQL, but it is not a SQL database, nor a relational database (RDBMS).

SAP Hana is both OLTP database and OLAP database. It is a columnar database, and it is an in-memory database. Because it is both OLAP and OLTP database, it is called HTAP database (stands for Hybrid Transactional and Analytical Processing).

Snowflake has Hybrid tables, which are very good for OLTP workload (like inserting a row 1000 times). But there are a lot of limitations on Hybrid tables, see here: link. Snowflake also has Iceberg tables, enabling us to read and write directly into Iceberg files: link. Even though Iceberg tables support ACID transaction, I don't think it is suitable for OLTP workload.

I would not recommend Snowflake, Databricks, Bug Query, Redshift and SAP Hana for OLTP database and OLTP workload. Instead I would recommend Oracle, SQL Server, MySQL and Postgres.

Disclaimer: My views in this article (and any other articles that I wrote on LinkedIn and Wordpress) are my own personal opinion. Not the opinion of my employer or previous employer. Not the opinion of my client or previous client. Not the opinion of any company I mentioned in this article. It is my own opinion, which is subjective, meaning that it is not a fact but my own thinking, which is different from other people, or companies.

I hope the above is useful. If I'm wrong about any of the above, I would appreciate it if you can let me know on the comment below or via DM. If you have any questions or comment please let me know below or DM me. Thank you for reading.

Keep learning!

Ramesh Subramaniam

Building a new data platform for investment management with Snowflake and Cloud technologies

3 小时前

Why would you use Snowflake for OLTP?

回复
James Arthur

BI Developer || Power BI || R || Excel || SQL || Snowflake || SAP || SuccessFactors|| Forecasting

8 小时前

Learnt something

回复
Anjani Kumar

Tech CEO & Founder at Multicloud4u Technologies | Former Microsoft & Publicis Sapient | Enterprise & Data Architect | Bestselling Data Engineering Author | Hands-on Coder

12 小时前

Looks like someone tried running his web application on snowflake free tier ?????

So, would you say that 12K S/4 ERP SAP customers are completely insane for running acid transactions on SAP HANA ?

要查看或添加评论,请登录

Vincent Rainardi的更多文章

  • Data engineer becoming solution architect

    Data engineer becoming solution architect

    Are you a data engineer thinking about transitioning to a cloud solution architect? Data engineer are good with…

    2 条评论
  • Asset Mgt vs Fund Mgt vs Investment Mgt vs Wealth Mgt: What's the difference?

    Asset Mgt vs Fund Mgt vs Investment Mgt vs Wealth Mgt: What's the difference?

    If you work in banking or investment or any other sector in financial services, you might be wondering about the above.…

  • Data Warehousing Basics: Cost

    Data Warehousing Basics: Cost

    If you call yourself a data engineer you need to be aware of 2 additional things compared to a developer. The first one…

    2 条评论
  • My Linkedin post & articles

    My Linkedin post & articles

    The list below goes back to Nov 2024. For older than that see here.

    9 条评论
  • Data Warehousing Basics: Single Customer View

    Data Warehousing Basics: Single Customer View

    Imagine that you work for an insurance company who sell health insurance (HI), life insurance (LI), general insurance…

    2 条评论
  • Data Warehousing Basics: NFR

    Data Warehousing Basics: NFR

    What I’m about to tell you today failed a lot of data warehousing projects which is why it’s worth paying attention so…

    1 条评论
  • ML and AI - What's the difference?

    ML and AI - What's the difference?

    Machine Learning covers about 20-30 algorithms such as Logistic Regression, Decision Tree, Gradient Boosting, Random…

    4 条评论
  • Microsoft Fabric or Synapse Analytics?

    Microsoft Fabric or Synapse Analytics?

    When it comes to Data Warehousing, Microsoft is confusing. Why? Because it has Microsoft Fabric and it also has Synapse…

    16 条评论
  • Data Warehousing Basics: Transformations

    Data Warehousing Basics: Transformations

    As Bill Inmon said, T is the most difficult thing to do in the ETL. And that is why ETL vendors swap it around - they…

    2 条评论
  • Data Warehousing Basics: Ingestion

    Data Warehousing Basics: Ingestion

    In data warehousing, ingestion is vital. It brings data from the source systems into the data warehouse.

    4 条评论