Backup in Snowflake

Many of us who work with databases like SQL Server and Oracle are used to the concept of database backup. We back up our databases every day, and in the event of a database failure, we restore that backup. Then, to bring the restored database up to date, we restore the transaction log up to a specific point in time.

We are now in the era of Snowflake and Databricks, where there is no concept of backup. What we have in Snowflake is time travel. As you know, we query a database using a SELECT statement. Any database. In Snowflake too. For instance: SELECT contract_date, contract_value FROM contract. But in Snowflake, at the end of your SELECT statement you can add "AT" like this: SELECT contract_date, contract_value FROM contract AT '2024-04-15 07:30:00'. You actually need to use the timestamp data type (the exact syntax is in the PS at the end), but you get the idea. And you can put any date and time up to 90 days back.
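For the curious, here is a minimal sketch of the three forms the AT/BEFORE clause can take in Snowflake: a timestamp, an offset in seconds, or a query ID. The query ID below is just a placeholder, not a real one.

-- Time travel using a specific timestamp
SELECT contract_date, contract_value FROM contract AT (TIMESTAMP => '2024-04-15 07:30:00'::TIMESTAMP);

-- Time travel using an offset in seconds from now (here: one hour ago)
SELECT contract_date, contract_value FROM contract AT (OFFSET => -3600);

-- Time travel to just before a particular statement ran (placeholder query ID)
SELECT contract_date, contract_value FROM contract BEFORE (STATEMENT => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');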

You also have time travel in Databricks, and it works in much the same way. It's just that instead of AT, at the end of the SELECT statement you put "TIMESTAMP AS OF", like this: SELECT contract_date, contract_value FROM contract TIMESTAMP AS OF '2024-04-15 07:30:00'. Time travel is an amazing feature! It eliminates the need for doing backup and restore on a database.
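For completeness, a minimal sketch of the Databricks (Delta Lake) equivalents, assuming contract is a Delta table; the version number is just an example, you can list real versions with DESCRIBE HISTORY.

-- Query the table as it was at a point in time
SELECT contract_date, contract_value FROM contract TIMESTAMP AS OF '2024-04-15 07:30:00';

-- Or query a specific table version (versions are listed by DESCRIBE HISTORY contract)
SELECT contract_date, contract_value FROM contract VERSION AS OF 42;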

Not only that: when you accidentally drop a table and shout "Oh no!", you now have no reason to panic. Because just like we have a DROP statement, in Snowflake we also have an UNDROP statement. It is amazing! For example, if we ran "DROP TABLE contract" accidentally, we can just type "UNDROP TABLE contract" to bring our contract table back from the dead. And you can do it up to 90 days after you dropped the table. And before you ask, yes, in Databricks you can also undrop tables (for managed tables in Unity Catalog). Same syntax, i.e. UNDROP TABLE table_name.
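A minimal sketch of what that looks like in Snowflake; SHOW TABLES HISTORY is handy for confirming what you dropped and when.

DROP TABLE contract;                  -- oops
SHOW TABLES HISTORY LIKE 'CONTRACT';  -- dropped tables appear with a dropped_on timestamp
UNDROP TABLE contract;                -- the table is back, data and all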

In addition to that, in Snowflake you also have CLONE. It is like copying a database without actually doing any copying. For example, to copy DB1 into DB2, we say: CREATE DATABASE DB2 CLONE DB1. Now we have a new database called DB2 which is exactly the same as DB1. But unlike SQL Server's copy database, Snowflake's CLONE statement does not actually make a copy. It only creates metadata that points DB2 at DB1's underlying data as it is at that point in time. And as we update DB1, those updates are not applied to DB2. And vice versa: when we update DB2, you will not find that update in DB1.

Oh yes, you can CLONE tables too. So you can say CREATE TABLE Table2 CLONE Table1. And voila, you now have a new table called Table2, which is exactly like Table1. And you don't occupy any extra disk space when doing that (storage is only consumed as the two tables diverge). Isn't it amazing?
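A minimal sketch of both forms, plus an update to show the clone and the original diverging afterwards. The assumption here is that Table1 has the contract columns from earlier; adjust the column names to your own table.

-- Zero-copy clone of a whole database
CREATE DATABASE DB2 CLONE DB1;

-- Zero-copy clone of a single table
CREATE TABLE Table2 CLONE Table1;

-- Changes made after the clone diverge: this only touches Table2, Table1 is untouched
UPDATE Table2 SET contract_value = 0 WHERE contract_date < '2020-01-01';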

But the beauty of time travel in Snowflake is that at the end of that CLONE statement we can add "AT". So we can say: CREATE DATABASE DB2 CLONE DB1 AT '2024-04-15 07:30:00'. And now we have a new database called DB2 which is exactly the same as DB1 was on the 15th of April. And yes, before you ask, you can do that with tables too. When I found this, my jaw dropped to the floor. That is an amazing feature.
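As with SELECT, the exact syntax needs the parenthesised TIMESTAMP form; a minimal sketch:

-- Clone the whole database as it was on 15th April at 07:30
CREATE DATABASE DB2 CLONE DB1 AT (TIMESTAMP => '2024-04-15 07:30:00'::TIMESTAMP);

-- The same thing for a single table
CREATE TABLE Table2 CLONE Table1 AT (TIMESTAMP => '2024-04-15 07:30:00'::TIMESTAMP);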

Now the boring bit. The caveats. By default, time travel is only set to 1 day. In your Snowflake account there is a parameter called DATA_RETENTION_TIME_IN_DAYS. If you set this to 90, you'll have time travel for 90 days. And yes, you need the Enterprise edition for this, which is about 1.5x more expensive than the Standard edition.
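A minimal sketch of setting and checking that parameter; it can be set at the account level or overridden per database, schema or table.

-- Set the default retention for the whole account to 90 days (Enterprise edition or above)
ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 90;

-- Or override it on a single table
ALTER TABLE contract SET DATA_RETENTION_TIME_IN_DAYS = 90;

-- Check the current value
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN ACCOUNT;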

Related to backup, there is one more thing I'd like to mention: unloading into files. Using COPY INTO you can bulk unload your tables into text files in S3 or ADLS. You can unload all the data into a single file, or into multiple files. You can then load those files back into your tables/database. But who wants to do that when you have the CLONE ... AT and SELECT ... AT features?
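A minimal sketch of unloading and reloading, assuming an external stage called my_stage that points at your S3 bucket or ADLS container (the stage name and path are just placeholders).

-- Unload the contract table into compressed CSV files in the stage
COPY INTO @my_stage/contract/
  FROM contract
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
  HEADER = TRUE
  OVERWRITE = TRUE;   -- add SINGLE = TRUE if you want one file instead of many

-- Load those files back in later
COPY INTO contract
  FROM @my_stage/contract/
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP SKIP_HEADER = 1);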

PS. For those of you who are interested in trying out SELECT ... AT, the syntax with the timestamp data type is like this:

SELECT * FROM TABLE1 AT (TIMESTAMP => '2024-04-28 10:58:00'::TIMESTAMP);

