UNDROP in Snowflake
In Snowflake you have the UNDROP command, which recovers dropped tables. By default the data retention (Time Travel) period in Snowflake is 1 day, so after accidentally dropping a table, or overwriting it with CTAS (CREATE OR REPLACE TABLE AS), you can still recover the old version. If you simply dropped the table, UNDROP on its own is enough. If you replaced it, first rename the current table out of the way (UNDROP fails if a table with the same name already exists), then undrop:
ALTER TABLE DWH.TABLE1 RENAME TO DWH.TABLE1_BACKUP;
UNDROP TABLE DWH.TABLE1;
Replace DWH with your schema name, e.g. STAGING, etc.
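If you're not sure whether the dropped table is still within the retention period, you can check first with SHOW TABLES HISTORY (dropped tables show up with a DROPPED_ON timestamp):
SHOW TABLES HISTORY LIKE 'TABLE1' IN SCHEMA DWH;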
And if you dropped many tables, well hopefully not, but if you did, then script it up in Excel. Put the table names in column A (TABLE1, TABLE2, and so on), and write a simple formula in column B that renames and undrops each one.
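Something along these lines in cell B2 will generate the statements (assuming your table names start in A2):
="ALTER TABLE DWH." & A2 & " RENAME TO DWH." & A2 & "_BACKUP; UNDROP TABLE DWH." & A2 & ";"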
Then just copy cell B2 down, paste column B into a Snowflake worksheet, and click "Run All" in the top right-hand corner.
And voila! All your tables are restored!
Afterwards, you can just drop those "_BACKUP" tables. Use the same sheet as above, which has TABLE1 to TABLE100 in column A. Just replace cell B2 with ="DROP TABLE DWH." & A2 & "_BACKUP;" and copy it down to B100.
Right, that's it then. Oh, and if you use dbt and you want to find out which source tables are used by your warehouse models, simply type this:
dbt ls -s +models/dwh/*.sql --resource-type source
Replace dwh with your folder name.
Say you have 100 source tables but your warehouse only uses 60 of them; this command will list those 60 tables.
The ls command just lists the selected resources by name. It doesn't run or build anything.
The + prefix means include everything upstream of the selected models, i.e. all the staging models, snapshots, and source tables they are built from.
The --resource-type flag accepts values like model, seed, test, source, and snapshot. Above I use source, which means only the source tables are listed.
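And if you only want the count rather than the list, pipe it through wc (assuming a Unix-like shell):
dbt ls -s +models/dwh/*.sql --resource-type source | wc -l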
Second one: SELECT
The SELECT statement in Snowflake has a few extra features when you use SELECT *: EXCLUDE, RENAME, and REPLACE.
If you use SELECT *, you can exclude a column like this:
SELECT * EXCLUDE COLUMN1 FROM TABLE1;
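To exclude more than one column, wrap them in parentheses:
SELECT * EXCLUDE (COLUMN1, COLUMN2) FROM TABLE1;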
Why would you want to exclude a column?
Because in dbt you work in layers, and a big model with lots of CTEs will often build faster if it is broken down into a few smaller models.
Here's an example. You start by selecting columns from staging tables, then pass them down to the next CTE:
WITH CTE1 AS
( SELECT COL1, COL2, COL3 FROM STAGE.TABLE1
UNION
SELECT COL1, COL2, COL3 FROM STAGE.TABLE2
),
CTE2 AS
( SELECT *,
CASE WHEN T3.COL2 = 2 THEN C1.COL2
WHEN T3.COL2 = 3 THEN C1.COL3
ELSE 'DEFAULT'
END AS COL4
FROM CTE1 AS C1
INNER JOIN STAGE.TABLE3 AS T3 ON C1.COL1 = T3.COL1
),
CTE3 AS
( SELECT *,
...
FROM CTE2
LEFT JOIN ...
)
SELECT DISTINCT *,
...
FROM CTE3
LEFT JOIN ...
So with each CTE you are adding more and more columns, and more and more logic. That's the dbt world for you: it's all about transforming data, in layers. And when a model takes more than 5 seconds to build, it's time to break it into 2-3 pieces so that each one only takes a second or two to build.
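In practice, breaking it up usually means promoting the first CTE (or two) into its own model and selecting from it with ref downstream. A rough sketch, with made-up model names:
-- models/dwh/int_table1_union.sql (hypothetical model name)
SELECT COL1, COL2, COL3 FROM STAGE.TABLE1
UNION
SELECT COL1, COL2, COL3 FROM STAGE.TABLE2

-- models/dwh/table1_enriched.sql (hypothetical model name)
SELECT C1.*,
       CASE WHEN T3.COL2 = 2 THEN C1.COL2
            WHEN T3.COL2 = 3 THEN C1.COL3
            ELSE 'DEFAULT'
       END AS COL4
FROM {{ ref('int_table1_union') }} AS C1
INNER JOIN STAGE.TABLE3 AS T3 ON C1.COL1 = T3.COL1
dbt then builds the first model once and the downstream model simply reads from it, instead of re-running the union inside one big query.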
Now, in that final SELECT DISTINCT *, sometimes you need to exclude a column that was only an intermediate calculation, because its values stop the DISTINCT from collapsing the rows. Columns like customer/product identifiers, etc. That's when you use EXCLUDE in your SELECT clause. That's Snowflake syntax, though; it isn't available on every database platform.
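For example, if CTE3 carries a CUSTOMER_ID column that was only needed for a join (an illustrative column name, not from the model above), the final select becomes:
SELECT DISTINCT * EXCLUDE (CUSTOMER_ID)
FROM CTE3;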
In Snowflake you can also use RENAME or REPLACE in your SELECT * statement.
With RENAME you specify the exact column names, like SELECT * RENAME (PRODUCT_NAME AS PRODUCT_DESCRIPTION), which returns the PRODUCT_NAME column under the name PRODUCT_DESCRIPTION. You can rename several columns in one go by listing them inside the parentheses, separated by commas.
With REPLACE you keep the column name but swap its values for an expression, for example SELECT * REPLACE ('PRODUCT_GROUP_' || GROUP_ID AS GROUP_ID). That returns GROUP_ID with values like PRODUCT_GROUP_1, PRODUCT_GROUP_2, and so on.
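Spelled out as full statements (DWH.PRODUCTS is just a placeholder table name):
SELECT * RENAME (PRODUCT_NAME AS PRODUCT_DESCRIPTION) FROM DWH.PRODUCTS;
SELECT * REPLACE ('PRODUCT_GROUP_' || GROUP_ID AS GROUP_ID) FROM DWH.PRODUCTS;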
That's it, folks. Two things about Snowflake: UNDROP and the SELECT * modifiers.
Keep learning!