Databricks SQL - The new Cloud Data Ware(Lake)house

Databricks SQL is a product offering from Databricks that competes with the likes of Snowflake, AWS Redshift, Google BigQuery, and Azure Synapse Analytics.

Databricks SQL allows customers to operate a multi-cloud Lakehouse architecture that provides data warehousing performance at data lake economics.

In this article, we will walk through the steps to get started with Databricks SQL Analytics.

Note: The purpose of this article is not to compare these different technologies, but to learn how to work with Databricks SQL Analytics.

Step 1: Log in to the Databricks workspace. We see three workspaces; I am going into the first one, Data Science & Engineering.

Step 2: Create a database. We will register our tables in this database instead of the default one.
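The screenshot for this step showed the statement run in a notebook; a minimal SQL sketch, assuming the database name "dbtraining" that appears later in the article:

```sql
-- Create a dedicated database so tables are not registered in `default`
CREATE DATABASE IF NOT EXISTS dbtraining;

-- Switch to it for the statements that follow
USE dbtraining;
```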

Step 3: Read the file, create a DataFrame, and save it as a table.
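A PySpark sketch of this step, to run in a Databricks notebook where `spark` is predefined. The file path, format, and options are assumptions (the article does not show them); the path below points at a sample flight-delays dataset commonly shipped with Databricks:

```python
# Assumed source file and options; the original notebook's path is not shown.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/databricks-datasets/flights/departuredelays.csv"))

# Persist the DataFrame as a managed table in our database
df.write.mode("overwrite").saveAsTable("dbtraining.departdelay")
```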

Step 4: Query the table to check that everything is working fine. We can also see that the table "departdelay" appears inside the database "dbtraining".
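A sketch of the checks this step performs, in SQL:

```sql
-- Confirm the table is registered in our database
SHOW TABLES IN dbtraining;

-- Spot-check the data
SELECT * FROM dbtraining.departdelay LIMIT 10;
```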

Now let's jump to the SQL Analytics workspace.

We will query the table "departdelay" from the SQL Analytics workspace.

Additionally, we will create a table in SQL Analytics and then query or use it from the Data Science & Engineering workspace.

Step 1: Select "SQL" in the workspace switcher, below Machine Learning.

Step 2: Click on SQL Endpoints. An endpoint is essentially a cluster that executes our queries. A starter endpoint of size X is created for us automatically (T-shirt sizing, like Snowflake).

Step 3: You can optionally create an endpoint of your choice if you need larger compute. It also has a multi-cluster option, like Snowflake, and you can enable the ultra-fast Photon engine.

I am fine with size X for now. Wait a few minutes for the cluster to be up and running.

Step 4: Click on the Queries tab. You will be taken to a new query window.

Step 5: Create your first query. Select the database, table, and endpoint of your choice.

We can see here the same database and table that we created in the Data Science & Engineering workspace. Why? Because both workspaces use the same Hive metastore.

Step 6: Let's create a table here.
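The article does not show the actual table definition from the screenshot; a hypothetical example in SQL, deriving a summary table from the "departdelay" data (the table name "delay_summary" and the column names are assumptions):

```sql
-- Hypothetical table; names and columns are illustrative only
CREATE TABLE dbtraining.delay_summary AS
SELECT origin,
       COUNT(*)   AS flights,
       AVG(delay) AS avg_delay
FROM dbtraining.departdelay
GROUP BY origin;
```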

Step 7: You can see that an additional table has been created in our database. Let's query it.
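A sketch of querying the new table (using the hypothetical name "delay_summary", since the article does not show the actual one):

```sql
SELECT *
FROM dbtraining.delay_summary
ORDER BY avg_delay DESC
LIMIT 10;
```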

Step 8: Now let's jump back to the Data Science & Engineering workspace and check the Data tab.

Yes, we have the additional table that we created in the SQL Analytics workspace. Now we can use this table from the Data Science & Engineering workspace for any purpose.

So we have seen how seamlessly the SQL Analytics workspace is integrated with the Data Science & Engineering workspace.

We can also create dashboards in SQL Analytics, and we can connect to the tables from Tableau and Microsoft Power BI via the SQL endpoints and a secure token. (I will cover that in my next write-up on SQL Analytics.)

That's it. We are done.

This marks the end of this article. I hope I was able to give you something new to learn. Thanks for reading. Please provide your feedback in the comments section, and please like and share if you enjoyed the content.

Thanks !! Happy Learning !!


Shweta Gupta

Data Analysis | Data Visualization | Bigdata | Power BI

1 year ago

Please share the steps for Databricks setup in AWS.

Chandan Nandy

Learning Data architecture ...& more

2 years ago

Hi Deepak - As I understand it, Photon is mostly for analytics purposes, as it is a relatively costly offering from Databricks. I see X-Large is 80 DBU. From a data engineering perspective, the normal data engineering workspace is the right choice.


What about the Hive metastore, and how did you pull it? Is it possible in Community Edition?

Narayana Ankireddypalli

Data Engineer | Snowflake | Azure Databricks| PYSPARK| Azure Data Factory |SQL | ADF | Big Query | DBT | Airflow | Azure | GCP| Airflow |Informatica | OBIEE | Power BI| MicroStrategy

2 years ago

Hi Deepak, I couldn't see the SQL option; I can see only Data Science & Engineering and Machine Learning. Can you please provide the steps to enable SQL Analytics?
