Making life easier with Hive Tables
Have you ever tried finding all the delta tables within all the databases in the hive catalog? It takes some serious python and dataframe coding to accomplish this task. The high level algorithm is below.
1 - grab a list of all databases
show databases;
2 - grab a list of all tables in a given database such as dim.
show tables from dim;
3 - for each table, grab detailed table information.
describe table extended dim.employee;
Obviously, there are some lists (dataframes) to be traversed to obtain this information. I will be putting this complete solution on my blog in the future. Is there a better way to solve this problem?
Databricks has implemented the information schema in the release of the unity catalog. As you can see, it went GA yesterday. Please check you Azure region for availability.
If you are using the unity catalog, this complex task is boiled down to the following pyspark query.
select * from information_schema.tables where DATA_SOURCE_FORMAT = 'delta'
In short, this is one of the many new and exciting features of the unity catalog.
Data Architect | Author
2 å¹´This is great to know, thanks for sharing. You just made my life much easier!