Preparing to take DP-900 Exam Hands-on experience Creating a Notebook in Azure Synapse Analytics
Architecture diagram created with app.diagrams.net


Resources used: Microsoft Learn, LinkedIn Learning, Whizlabs

I am currently preparing to take my DP-900 exam. I have been learning from the Microsoft Learn content at https://learn.microsoft.com/en-us/credentials/certifications/azure-data-fundamentals/?practice-assessment-type=certification, watching the LinkedIn Learning course Prepare for the Azure Data Fundamentals (DP-900) Certification Exam, and working through hands-on labs from Whizlabs at https://www.whizlabs.com/microsoft-azure-certification-dp-900/ to familiarize myself with the resources used in Azure data analytics and how they are used.

In this lab, I learned how to create a Notebook, connect it to an Apache Spark pool, and run Spark SQL queries against files and tables.

Below is the architecture diagram for creating a Notebook in Azure Synapse Analytics.

Creating a Synapse Analytics workspace

Log in to the Azure Portal and use the search bar to search for "Synapse Analytics workspace", as per the screenshot below.

Select Synapse Analytics workspace; you will be directed to the workspace page.

You can now create a Synapse workspace by clicking either of the options shown in the screenshot below.

You will be redirected to the creation page.

You can create a new resource group or select an existing one if you already have one.

Choose your workspace name (keep the naming guidelines below in mind when choosing resource names):

  • Descriptive Names: Each resource’s name should indicate its function or role. For instance, a virtual machine handling web traffic might be named ‘WebTrafficVM’.
  • Use Delimiters: Azure doesn’t support spaces in names. Instead, use hyphens (-) or underscores (_) to separate terms. For example, EastUS-Prod-WebApp.
  • Avoid Special Characters: With a few exceptions, special characters can complicate command-line management and are disallowed in many resource names.
  • Region & Environment Indication: Include a region (e.g., EastUS or WestEU) or environment role (e.g., Prod, Dev, Test) in the name for immediate context.
  • Lowercase for Storage Accounts: Azure storage account names must be all lowercase and globally unique (e.g., eastusprodwebapp).
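As a quick illustration of the last rule, here is a small Python check (a hypothetical helper, not part of the lab) for Azure's documented storage-account constraints, which are 3-24 characters, lowercase letters and digits only:

```python
import re

def is_valid_storage_account_name(name: str) -> bool:
    """Check Azure storage account naming rules:
    3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None

print(is_valid_storage_account_name("eastusprodwebapp"))    # True
print(is_valid_storage_account_name("EastUS-Prod-WebApp"))  # False: uppercase and hyphens
```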

Choose your preferred region (I have chosen East US)

Select Data Lake Storage Gen2 (from your subscription, as per the screenshot below).

Now you need to select or create a storage account and a file system; a definition of each is below:

  • An Azure Storage Account is like a cloud container for your data in Microsoft Azure. It holds various types of data objects, including blobs, files, queues, and tables.
  • Azure Files is like a cloud file cabinet where you store files. It is a serverless cloud service that provides fully managed file shares accessible via the SMB (Server Message Block) and NFS (Network File System) protocols.

Click Next to go to the Security tab.

For the authentication method, select both local and Microsoft Entra ID authentication.

Create your admin login credentials as preferred (screenshot below) and leave the other settings at their defaults.

Click on Review + create.

Your Synapse workspace will be validated, and you will get a success message as per below:

Now you can click Create and wait while the Synapse deployment is in progress (screenshot below).

You will see "Your deployment is complete" once everything goes well.

This means your Synapse workspace has been deployed successfully.

Now go back to your newly created Synapse workspace and select New Apache Spark pool.


Enter your Apache Spark pool details as per the screenshot below, or adjust them according to your requirements.

Enable automatic pausing (this feature releases resources after a set idle period, reducing the overall cost of your Apache Spark pool).
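The same pool can also be provisioned from the Azure CLI; below is a hedged sketch (the workspace, resource group, and pool names are placeholders, and the flags should be checked against your installed `az` version):

```shell
# Create an Apache Spark pool with automatic pausing after 15 idle minutes.
# All resource names below are placeholders.
az synapse spark pool create \
  --name demosparkpool \
  --workspace-name my-synapse-workspace \
  --resource-group my-resource-group \
  --spark-version 3.3 \
  --node-count 3 \
  --node-size Small \
  --enable-auto-pause true \
  --delay 15
```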

Click Review + create (the new Apache Spark pool gets validated).

Deployment in Progress

Deployment of New Apache Spark Pool Completed

Now we need to go back into Azure Synapse and access Synapse Studio.

Below is a screenshot of Azure Synapse opening.

Once opened, hover over the icon tabs on the left, select Develop, click the plus sign, and select Notebook. Name the notebook, select the pool to attach (the pool created earlier), and click Publish.

Copy the command below and paste it into the cell to create a simple Spark DataFrame object to manipulate, then click Run all. Make sure to delete the empty cell above.

new_rows = [('CA', 22, 45000), ('WA', 35, 65000), ('WA', 50, 85000)]

demo_df = spark.createDataFrame(new_rows, ['state', 'age', 'salary'])

demo_df.show()



Once the code has finished running, information is displayed below the cell showing how long it took to execute. The data now exists in a DataFrame; from there you can use it in many different ways.
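For instance, the rows could be filtered or aggregated. The snippet below shows the equivalent logic in plain Python, purely to illustrate what a Spark `groupBy('state').avg('salary')` on this data would produce (the variable names are illustrative, not part of the lab):

```python
from collections import defaultdict

# The same three rows used in the notebook cell above.
rows = [('CA', 22, 45000), ('WA', 35, 65000), ('WA', 50, 85000)]

# Plain-Python equivalent of an average-salary-by-state aggregation.
totals = defaultdict(lambda: [0, 0])  # state -> [salary_sum, row_count]
for state, _age, salary in rows:
    totals[state][0] += salary
    totals[state][1] += 1

avg_salary = {state: s / n for state, (s, n) in totals.items()}
print(avg_salary)  # {'CA': 45000.0, 'WA': 75000.0}
```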


Now, to create a Spark table, a CSV file, and a Parquet file, all with copies of the data, run the following command in another cell by clicking on + Code. Make sure to replace <<TheNameOfAStorageAccountFileSystem>> and <<TheNameOfAStorageAccount>> with the file system name and storage account name that you created while creating the Synapse Analytics workspace. You can find both names under Storage Accounts in the Azure portal.

demo_df.createOrReplaceTempView('demo_df')

demo_df.write.csv('demo_df', mode='overwrite')

demo_df.write.parquet('abfss://<<TheNameOfAStorageAccountFileSystem>>@<<TheNameOfAStorageAccount>>.dfs.core.windows.net/demodata/demo_df', mode='overwrite')
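The ABFSS URI in the Parquet line follows a fixed pattern: abfss://<file-system>@<account>.dfs.core.windows.net/<path>. A small helper sketch that assembles it (the account and file-system names below are placeholders, mirroring the lab's <<...>> tokens):

```python
def abfss_uri(file_system: str, account: str, path: str) -> str:
    """Build an ABFSS URI for Azure Data Lake Storage Gen2:
    abfss://<file-system>@<account>.dfs.core.windows.net/<path>"""
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

# Placeholder names for illustration only:
print(abfss_uri("myfilesystem", "mystorageacct", "demodata/demo_df"))
# abfss://myfilesystem@mystorageacct.dfs.core.windows.net/demodata/demo_df
```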

Now, navigate to the Data hub.

Go to the Linked tab; under Azure Data Lake Storage Gen2, select your Synapse Analytics workspace, and then select your file system. You will see that it contains the demodata folder, which was created when you ran the command above.


Now, open the folder by double-clicking it. Notice that for both the CSV and Parquet write operations, a directory is created containing many partitioned files.


Now, go back to your notebook, copy the command below, and paste it into a new cell. Run the command to list the tables in the pool.

%%sql

SHOW TABLES

Now, to see the data in demo_df, run the following command in a new cell:

%%sql

SELECT * FROM demo_df
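Outside the notebook, you can sanity-check what this query returns against the sample rows using Python's built-in sqlite3 module. This is purely illustrative; in Synapse, the %%sql cell runs on the Spark pool, not SQLite:

```python
import sqlite3

# Recreate the notebook's sample rows in an in-memory SQLite table
# named demo_df, then run the same SELECT as the %%sql cell.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_df (state TEXT, age INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO demo_df VALUES (?, ?, ?)",
                 [('CA', 22, 45000), ('WA', 35, 65000), ('WA', 50, 85000)])

result = conn.execute("SELECT * FROM demo_df").fetchall()
print(result)  # [('CA', 22, 45000), ('WA', 35, 65000), ('WA', 50, 85000)]
conn.close()
```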


You can also view the data in the form of a chart. In the View switcher, select Chart.


This brings us to the end of the lab.

What we have achieved throughout the lab:

  1. You have successfully logged into Azure Portal.
  2. You have successfully created a Synapse Analytics workspace.
  3. You have successfully created an Apache Spark pool with Azure Portal.
  4. You have successfully created a Notebook.
  5. You have successfully run Spark SQL queries.
  6. You have successfully deleted the resources.

