Databricks SQL - The new Cloud Data Ware(Lake)house
Deepak Rajak
Data Engineering /Advanced Analytics Technical Delivery Lead at Exusia, Inc.
Databricks SQL is a product offering from Databricks which they are pitching against the likes of Snowflake, AWS Redshift, Google BigQuery & Azure Synapse Analytics.
Databricks SQL allows customers to operate a multi-cloud?Lakehouse architectre that provides data warehousing performance at data lake economics.
In this article, We will learn step by step process to get started with Databricks SQL Analytics.
Note- The purpose of this article not to compare these different technologies, the purpose here to learn - How to work on Databricks SQL Analytics
Step1: Login to Databricks Workspace. We see 3 workspace, I am going into the first one Data Science & Engineering
Step2: Create a Database & we will use this database to register table in this instead of the default database
Step3: Read the file, Create Dataframe & save the table
Step4: Query the table to check if everything is working fine. We also can see the table - "departdelay" is appearing inside database - "dbtraining"
Now Let's Jump to SQL Analytics workspace.
We will query this table - "departdelay" from SQL Analytics workspace.
Additionally we will also create a Table in SQL Analytics & can query or use from Data Science & Engineering workspace.
Step1: Select "SQL" in the workspace tab below the Machine Learning.
Step2: Click on SQL Endpoints. Basically its a cluster to execute our queries. Automatically a starter end point is getting created for us of size X ( T-shirt sizing like Snowflake )
Step3: You can optionally create the End point of your choice if you need larger compute [ Like the below ] . It also have multi cluster option like Snowflake. You can also enable the ultra fast Photon engine.
领英推荐
I am fine with X size as of now. Wait for few minutes for up & running the cluster.
Step4: Click on the Queries Tab. You will be navigated to the new query window.
Step5: Create your first query. Select Database, Table & Endpoint of your choice.
We can see here the same Database & Table which we have created in the - Data Science & Engineering workspace. Why? because both are using the same Hive Metastore.
Step6: Let's create a Table here.
Step7: You can see an additional table got created in our database. Let's query it
Step8: Now let's jump again to - "Data Science & Engineering" workspace & check the Data Tab.
Yes we have an additional table which we have created in the SQL Analytics workspace. Now we can use this table from the Data Science & Engineering workspace for any purpose.
So we have seen how beautifully the SQL Analytics Workspace is integrated with Data Science & Engineering Workspace.
We can also create Dashboards in SQL Analytics, We can connect the Tables from Tableau and Microsoft Power BI via the SQL Endpoints & secure token. ( I will cover that in my next write up on SQL Analytics )
Thats it. We are done.
This marks the end to this article. I hope, I am able to provide you something new to learn. Thanks for reading, Please provide your feedback in the comment section. Please like & share if you have liked the content.?
Thanks !! Happy Learning !!
Data Analysis | Data Visualization | Bigdata | Power BI
1 年pls. share step of databricks stup in AWS
Learning Data architecture ...& more
2 年Hi Deepak - As I understand photon is mostly for Analytics purpose as its relatively costly offering from DataBricks. I see X-Large is (80DBU). For data engineering perspective normal data engineering workspace is right.
--
2 年what about hive metastore and how did u pull it? is it possible in community edition?
Data Engineer | Snowflake | Azure Databricks| PYSPARK| Azure Data Factory |SQL | ADF | Big Query | DBT | Airflow | Azure | GCP| Airflow |Informatica | OBIEE | Power BI| MicroStrategy
2 年Hi Deepak, I couldn't see the SQL , i can see only Data science& Engineering and Machine learning. Can you please provide the steps to enable SQL analytics
Consultant at Deloitte USI | ETL Developer | Talend | SQL | Amazon Redshift | Amazon S3 | Teradata | DWH
2 年+ Saibharadwaj C.heekoti