Snowflake vs SQL Server
Sometimes we need to remind ourselves that Snowflake is not an OLTP database. I know today is the era of Hybrid tables and Iceberg tables but there are a few things that you should not use Snowflake for. And one of them is OLTP.
OLTP stands for Online Transaction Processing. The word "online" means real time, i.e. when you ask the database to insert a record or select a row, it does it immediately, usually completed within milliseconds. Whereas in Snowflake it could take a few seconds. In OLTP databases like SQL Server, inserting a row into a table 1000 times takes a few seconds, whereas in Snowflake it takes a few minutes.
The word "transaction" means database transaction. A classic example is debiting an account & crediting the contra account. This means updating two tables in one "transaction". Meaning that the two tables must both be updated. You can't update one table without the other. Meaning that if one table failed to be updated, the other table must not be updated either. This behaviour is called Atomic.
Transaction is the "trade mark" of OLTP databases. Apart from Atomic, a transaction must also be consistent, isolated, durable. They are known as ACID: atomic, consistent, isolated, durable. Durable means it must be written successfully to disk. Isolated means each transaction must not influence other transactions. Consistent means the transaction must follow the FK and PK. And other constraint like Not Null.
Snowflake, Databricks, Redshift, SAP Hana and Big Query all support ACID transactions. But all of them are terrible OLTP database. Meaning that they are not good at inserting a row 1000 times. And that is because they all use columnar file format, such as Parquet files. Or Iceberg files. Because they use columnar storage, they are called columnar databases.
Columnar databases are excellent for summarising and aggregating data. Columnar databases have very good read performance on large tables because they read only the necessary columns, and the values for a particular column are stored together. Analytics and reporting applications do a lot of summarising and aggregating data. And they do a lot of data read, with minimal data write. That is why Snowflake, Databricks, Redshift, SAP Hana and Big Query are very good for analytics and reporting.
One more thing. In a columnar database, empty cells are omitted. And the same data type is stored together. Because of that, columnar databases have very good compression. And that makes data read a lot faster. That is why columnar databases are very good for analytics and reporting.
But all that makes columnar databases poor at transactional workload with lots of frequent data updates or insert. And row-by-row data retrieval. This is because the columnar databases store the values for a column together.
And that is where OLTP databases excel. They store each record in a single row. And every attribute is stored in a column. That's why OLTP databases are very efficient when doing lots of frequent, small data updates. Like inserting a row 1000 times.
But in OLTP databases like Oracle, MySQL, Postgres and SQL Server, different data types are stored together in a row. And that makes the compression not as good as columnar databases. And that makes reading large tables not as good as columnar databases. So OLTP databases are very good at inserting a row 1000 times, but not good at aggregating data in large tables.
When your application does lots of frequent, small updates/inserts, like ecommerce websites and EPOS (Electronic Point of Sales, like supermarket checkouts), then you need to use OLTP database. The most popular OLTP database is Oracle, SQL Server and MySQL (link). Whereas if your application does anakytics and reporting, the you need to use OLAP database like Snowflake, Redshift and Big Query. Databricks is not an OLAP database (see below). SAP Hana is both OLTP database and OLAP database.
Below is the performance in Snowflake Enterprise Edition on a table containing 2 columns and 100 rows, using an XS virtual warehouse:
Technically speaking Databricks is not an OLAP database. It is an OLAP solution but not an OLAP database. The reason Databricks is not a database, is because it does not store data in a relational database engine. Databricks is built on Apache Spark, which is is a big data solution and a very good analytical engine. Databricks can be queried using SQL, but it is not a SQL database, nor a relational database (RDBMS).
SAP Hana is both OLTP database and OLAP database. It is a columnar database, and it is an in-memory database. Because it is both OLAP and OLTP database, it is called HTAP database (stands for Hybrid Transactional and Analytical Processing).
Snowflake has Hybrid tables, which are very good for OLTP workload (like inserting a row 1000 times). But there are a lot of limitations on Hybrid tables, see here: link. Snowflake also has Iceberg tables, enabling us to read and write directly into Iceberg files: link. Even though Iceberg tables support ACID transaction, I don't think it is suitable for OLTP workload.
I would not recommend Snowflake, Databricks, Bug Query, Redshift and SAP Hana for OLTP database and OLTP workload. Instead I would recommend Oracle, SQL Server, MySQL and Postgres.
Disclaimer: My views in this article (and any other articles that I wrote on LinkedIn and Wordpress) are my own personal opinion. Not the opinion of my employer or previous employer. Not the opinion of my client or previous client. Not the opinion of any company I mentioned in this article. It is my own opinion, which is subjective, meaning that it is not a fact but my own thinking, which is different from other people, or companies.
I hope the above is useful. If I'm wrong about any of the above, I would appreciate it if you can let me know on the comment below or via DM. If you have any questions or comment please let me know below or DM me. Thank you for reading.
Keep learning!
Building a new data platform for investment management with Snowflake and Cloud technologies
3 小时前Why would you use Snowflake for OLTP?
BI Developer || Power BI || R || Excel || SQL || Snowflake || SAP || SuccessFactors|| Forecasting
8 小时前Learnt something
Tech CEO & Founder at Multicloud4u Technologies | Former Microsoft & Publicis Sapient | Enterprise & Data Architect | Bestselling Data Engineering Author | Hands-on Coder
12 小时前Looks like someone tried running his web application on snowflake free tier ?????
So, would you say that 12K S/4 ERP SAP customers are completely insane for running acid transactions on SAP HANA ?