UNDROP in Snowflake

In Snowflake you have UNDROP command, which recovers deleted tables. By default the data retention period in Snowflake is 1 day, so after accidentally dropping a table, or running CTAS (CREATE OR REPLACE TABLE AS) then you can recover that table by simply running:

ALTER TABLE DWH.TABLE1 RENAME TO DWH.TABLE1_BACKUP; 
UNDROP TABLE DWH.TABLE1;        

Replace DWH with your schema name, e.g. STAGING, etc.

And if you dropped many tables, well hopefully not, but if you did, then script it up in Excel. Put the table names in column A like this, and write a simple formula to rename and undrop in column B:

And just copy down that cell B2. Copy paste column B to Snowflake and click "Run All" on the top right hand corner.

And voila! All your tables are restored!

Afterwards, you can just drop those "_BACKUP" tables. Use the same sheet as above, which has TABLE1 to TABLE100 on column A. Just replace cell B2 with "DROP TABLE DWH." & A2 & ";" and copy it down to B100.

Right that's it then. Oh and if you use dbt and you want to find out which source tables are used for all your warehouse models, simply type this:

dbt ls -s +models/dwh/*.sql -r source

Replace dwh with your folder name.

Say you have 100 source tables, but your warehouse only use 60 of them, then this command will list those 60 tables.

The ls command lists down the model names. It doesn't run or build them, just listing them down.

The + sign indicates that you want to include any models that makes up on your warehouse tables. And that includes all your staging tables, snapshots, and all your source tables.

The -r is a short for --resource-type, which accepts parameters like models, seed, test, source and snapshots. Above I use "source" which means that it will list all the source tables.

Second one: SELECT

The SELECT STATEMENT in Snowflake has a few features: EXCLUDE, RENAME, REPLACE.

If you use SELECT *, you can exclude a column like this:

SELECT * EXCLUDE COLUMN1 FROM TABLE1;

Why would you want to exclude a column?

Because in dbt, you work in layers. And a big models with lots of CTEs will perform faster if it is broken down into a few models.

Here's an example. You start by selecting columns from staging tables, then pass down to the next CTE

WITH CTE1 AS
( SELECT COL1, COL2, COL3 FROM STAGE.TABLE1
  UNION
  SELECT COL1, COL2, COL3 FROM STAGE.TABLE2
),
CTE2 AS
( SELECT *, 
	CASE WHEN T3.COL2 = 2 THEN C1.COL2
		 WHEN T3.COL2 = 3 THEN C1.COL3
		 ELSE 'DEFAULT'
	END AS COL4
  FROM CTE1 AS C1
  INNER JOIN STAGE.TABLE3 AS T3 ON C1.COL1 = T3.COL1
),
CTE3 AS
( SELECT *,
	...
  FROM CTE2 
  LEFT JOIN ...
)
SELECT DISTINCT *,
	...
FROM CTE3
LEFT JOIN ...        

So with each CTE, you are adding more and more columns, and more and more logic. That's the dbt world for you, it's all about transforming data, in layers. And sometimes when a model takes more than 5 seconds to build, it's time to break it into 2-3 pieces so that it will only take a second or two to build.

Now, during this final SELECT DISTINCT *, sometimes you need to exclude a column which was an intermediate calculation because that column is making the rows split into many rows. Columns like customer/product identifiers, etc. That's when you use EXCLUDE in your SELECT clause. That's in Snowflake, not in other database platform though.

In Snowflake you can also use RENAME or REPLACE in your SELECT * statement.

With REPLACE you need to specify the exact column name, like RENAME PRODUCT_NAME AS PRODUCT_DESCRIPTION.

With RENAME you can rename many columns in one go, for example: REPLACE ('PRODUCT_LEVEL_' || GROUP_ID AS 'PRODUCT_GROUP_' || GROUP_ID). This renames PRODUCT_LEVEL_1 column to PRODUCT_GROUP_1, PRODUCT_LEVEL_2 to PRODUCT_GROUP_2, and so on.

That's it folks. Two things about Snowflake: UNDROP and SELECT in Snowflake.

Keep learning!

要查看或添加评论,请登录

Vincent Rainardi的更多文章

  • Data Warehousing Basics: Analytics

    Data Warehousing Basics: Analytics

    If you just started in Data Warehousing, before you do anything, you need to understand what it’s all about. It’s about…

  • Data Warehousing Basics: Dimensional Model

    Data Warehousing Basics: Dimensional Model

    If you build a warehouse or a lakehouse for analytics or reporting, in most cases the best data model is dimensional…

    5 条评论
  • DQ Engineering

    DQ Engineering

    DQ stands for Data Quality. If you don't have a background in data quality, read this first: https://www.

    7 条评论
  • Data Product

    Data Product

    For those of you who don't know what a data product and “data as a product” are, please read this first:…

    13 条评论
  • Snowflake vs SQL Server

    Snowflake vs SQL Server

    Sometimes we need to remind ourselves that Snowflake is not an OLTP database. I know today is the era of Hybrid tables…

    6 条评论
  • Data engineer becoming solution architect

    Data engineer becoming solution architect

    Are you a data engineer thinking about transitioning to a cloud solution architect? Data engineer are good with…

    2 条评论
  • Asset Mgt vs Fund Mgt vs Investment Mgt vs Wealth Mgt: What's the difference?

    Asset Mgt vs Fund Mgt vs Investment Mgt vs Wealth Mgt: What's the difference?

    If you work in banking or investment or any other sector in financial services, you might be wondering about the above.…

  • Data Warehousing Basics: Cost

    Data Warehousing Basics: Cost

    If you call yourself a data engineer you need to be aware of 2 additional things compared to a developer. The first one…

    2 条评论
  • My Linkedin post & articles

    My Linkedin post & articles

    The list below goes back to Nov 2024. For older than that see here.

    10 条评论
  • Data Warehousing Basics: Single Customer View

    Data Warehousing Basics: Single Customer View

    Imagine that you work for an insurance company who sell health insurance (HI), life insurance (LI), general insurance…

    2 条评论