登录查看更多内容

Databricks SQL - The new Cloud Data Ware(Lake)house

Deepak Rajak

Data Engineering /Advanced Analytics Technical Delivery Lead at Exusia, Inc.

发布日期: 2021年11月10日

+ 关注

Databricks SQL is a product offering from Databricks which they are pitching against the likes of Snowflake, AWS Redshift, Google BigQuery & Azure Synapse Analytics.

Databricks SQL allows customers to operate a multi-cloud?Lakehouse architectre that provides data warehousing performance at data lake economics.

In this article, We will learn step by step process to get started with Databricks SQL Analytics.

Note- The purpose of this article not to compare these different technologies, the purpose here to learn - How to work on Databricks SQL Analytics

Step1: Login to Databricks Workspace. We see 3 workspace, I am going into the first one Data Science & Engineering

Step2: Create a Database & we will use this database to register table in this instead of the default database

Step3: Read the file, Create Dataframe & save the table

Step4: Query the table to check if everything is working fine. We also can see the table - "departdelay" is appearing inside database - "dbtraining"

Now Let's Jump to SQL Analytics workspace.

We will query this table - "departdelay" from SQL Analytics workspace.

Additionally we will also create a Table in SQL Analytics & can query or use from Data Science & Engineering workspace.

Step1: Select "SQL" in the workspace tab below the Machine Learning.

Step2: Click on SQL Endpoints. Basically its a cluster to execute our queries. Automatically a starter end point is getting created for us of size X ( T-shirt sizing like Snowflake )

Step3: You can optionally create the End point of your choice if you need larger compute [ Like the below ] . It also have multi cluster option like Snowflake. You can also enable the ultra fast Photon engine.

Arpit Bhayani 1 年前

Synapse Analytics Dedicated SQL Pools – Everything you…

Nikola Ilic 1 年前

Databricks SQL Series — Introduction to Databricks SQL…

Krishna Yogi Kolluru 4 个月前

I am fine with X size as of now. Wait for few minutes for up & running the cluster.

Step4: Click on the Queries Tab. You will be navigated to the new query window.

Step5: Create your first query. Select Database, Table & Endpoint of your choice.

We can see here the same Database & Table which we have created in the - Data Science & Engineering workspace. Why? because both are using the same Hive Metastore.

Step6: Let's create a Table here.

Step7: You can see an additional table got created in our database. Let's query it

Step8: Now let's jump again to - "Data Science & Engineering" workspace & check the Data Tab.

Yes we have an additional table which we have created in the SQL Analytics workspace. Now we can use this table from the Data Science & Engineering workspace for any purpose.

So we have seen how beautifully the SQL Analytics Workspace is integrated with Data Science & Engineering Workspace.

We can also create Dashboards in SQL Analytics, We can connect the Tables from Tableau and Microsoft Power BI via the SQL Endpoints & secure token. ( I will cover that in my next write up on SQL Analytics )

Thats it. We are done.

This marks the end to this article. I hope, I am able to provide you something new to learn. Thanks for reading, Please provide your feedback in the comment section. Please like & share if you have liked the content.?

Thanks !! Happy Learning !!

Shweta Gupta

Data Analysis | Data Visualization | Bigdata | Power BI

1 年

pls. share step of databricks stup in AWS

Chandan Nandy

Learning Data architecture ...& more

2 年

Hi Deepak - As I understand photon is mostly for Analytics purpose as its relatively costly offering from DataBricks. I see X-Large is (80DBU). For data engineering perspective normal data engineering workspace is right.

murrthuza ali

2 年

what about hive metastore and how did u pull it? is it possible in community edition?

Narayana Ankireddypalli

2 年

Hi Deepak, I couldn't see the SQL , i can see only Data science& Engineering and Machine learning. Can you please provide the steps to enable SQL analytics

Harish Ramesh Siripuram

2 年

+ Saibharadwaj C.heekoti

1 次回应

查看更多评论

要查看或添加评论，请登录

查看全部

Databricks SQL - The new Cloud Data Ware(Lake)house

Deepak Rajak

Data Engineering /Advanced Analytics Technical Delivery Lead at Exusia, Inc.

Databricks SQL is a product offering from Databricks which they are pitching against the likes of Snowflake, AWS Redshift, Google BigQuery & Azure Synapse Analytics.

Databricks SQL allows customers to operate a multi-cloud?Lakehouse architectre that provides data warehousing performance at data lake economics.

领英推荐

更多精彩文章

社区洞察

其他会员也浏览了

Databricks SQL Series — Optimizing Data Queries with Databricks SQL — Part 2

The Future of SQL: Why It Remains Essential in Data Analytics

The Latest in Distributed SQL - September

Building a Data Warehouse on Google Cloud Platform That Scales With the Business

The Latest In Distributed SQL - July

The Data Story of Powerplay (Part-2)

Understanding the Differences Between Elasticsearch and SQL Databases

Databricks SQL Analytics

Querying hierarchical data in Snowflake

Top 9 Important Tools that every Data Engineer Needs

Databricks SQL is a product offering from Databricks which they are pitching against the likes of Snowflake, AWS Redshift, Google BigQuery & Azure Synapse Analytics.

Databricks SQL allows customers to operate a multi-cloud?Lakehouse architectre that provides data warehousing performance at data lake economics.

领英推荐

Multi Tasks Job in Databricks

2022年1月12日

Deploying Databricks on Azure

2022年1月10日

Create Tables in Databricks & Query it from AWS Athena

2021年11月8日

AWS Glue Data Catalog as the Metastore for Databricks

2021年11月1日

Deploying Databricks on AWS

2021年10月31日

Danny's Diner Case Study using Pyspark on Databricks

2021年10月6日

Azure Cloud Data Engineering

2021年6月8日

Deploying Databricks on Google Cloud Platform

2021年4月13日

CI / CD in Azure Databricks using Azure DevOps

2021年4月9日

Read / Write from AWS S3 , Azure DataLake Storage & Google Cloud Storage without mounting via Databricks

2021年4月3日

社区洞察

其他会员也浏览了

Databricks SQL Series — Optimizing Data Queries with Databricks SQL — Part 2

The Future of SQL: Why It Remains Essential in Data Analytics

The Latest in Distributed SQL - September

Building a Data Warehouse on Google Cloud Platform That Scales With the Business

The Latest In Distributed SQL - July

The Data Story of Powerplay (Part-2)

Understanding the Differences Between Elasticsearch and SQL Databases

Databricks SQL Analytics

Querying hierarchical data in Snowflake

Top 9 Important Tools that every Data Engineer Needs