ADF Copy Data: Copy Data From Azure Blob Storage To A SQL Database Using Azure Data Factory
Atul Kumar
Founder at K21Academy | Cloud Computing, AWS, Azure, Generative AI & ML Expert | Helping You Succeed in Your Cloud & Tech Journey | IT & Services Specialist | Transforming Careers Through Training
Today I am going to share a case study of copying data from Blob storage to a SQL database with Azure Data Factory (an ETL service), which I will be discussing in detail in my Azure Data Engineering free class.
Haven't registered for it yet? There's still time to join my class! In this session, I'll dive deep into Azure Synapse Analytics, the integrated analytics platform, and also share insights into current job opportunities and market trends.
Register Now: https://bit.ly/3uQxVUu
Overview Of Azure Blob Storage
Azure Blob Storage is Microsoft Azure’s object storage solution for the cloud. It is optimized for storing massive amounts of unstructured data.
It is used for streaming video and audio, writing to log files, storing data for backup and restore, disaster recovery, and archiving.
Azure Blob storage offers three types of resources: the storage account, containers in the storage account, and blobs in a container.
Objects in Azure Blob storage are accessible via Azure PowerShell, the Azure Storage REST API, the Azure CLI, or an Azure Storage client library.
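As a quick illustration of the client-library route, here is a minimal Python sketch (assuming the azure-storage-blob package and a placeholder connection string) that creates the adftutorial container used later in this walkthrough and uploads a local emp.txt file into its input folder:

# Minimal sketch using the azure-storage-blob client library (pip install azure-storage-blob).
# The connection string is a placeholder; copy yours from the storage account's Access keys blade.
from azure.storage.blob import BlobServiceClient

connection_string = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(connection_string)

# Create (or reuse) the container and upload a local text file as a block blob.
container = service.get_container_client("adftutorial")
if not container.exists():
    container.create_container()

with open("emp.txt", "rb") as data:
    container.upload_blob(name="input/emp.txt", data=data, overwrite=True)

# List the blobs in the container to confirm the upload.
for blob in container.list_blobs():
    print(blob.name)

The same operations can of course be done in the portal, which is the route the steps below follow.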
Overview Of Azure SQL Database
Azure SQL Database is a fully managed platform as a service (PaaS): the platform handles database software upgrades, patching, backups, and monitoring. Using Azure SQL Database, we can provision a highly available and performant storage layer for our applications.
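To make that concrete, here is a minimal sketch of an application connecting to an Azure SQL Database with the pyodbc driver; the server, database, and credential values are placeholders for your own, and the query is just a connectivity check:

# Minimal connectivity sketch for Azure SQL Database using pyodbc (pip install pyodbc).
# Server, database, user, and password are placeholders for your own values.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<yourserver>.database.windows.net;"
    "DATABASE=<yourdb>;UID=<user>;PWD=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;"
)
cursor = conn.cursor()
cursor.execute("SELECT @@VERSION")   # simple query to confirm the connection works
print(cursor.fetchone()[0])
conn.close()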
Types of deployment options for the SQL Database: a single database, or an elastic pool in which a group of databases shares a set of resources.
Azure SQL Database offers three service tiers in the vCore purchasing model: General Purpose, Business Critical, and Hyperscale.
ADF Copy Data From Blob Storage To SQL Database
STEP 1: Create a blob and a SQL table
1) Create a source blob: launch Notepad on your desktop, copy the following text, and save it in a file named emp.txt on your disk.
FirstName|LastName
John|Doe
Jane|Doe
2) Create a container named adftutorial in your Blob storage.
3) Upload the emp.txt file to the input folder of the adftutorial container.
4) Create a sink SQL table: use the following SQL script to create a table named dbo.emp in your SQL Database.
CREATE TABLE dbo.emp
(
    ID int IDENTITY(1,1) NOT NULL,
    FirstName varchar(50),
    LastName varchar(50)
)
GO

CREATE CLUSTERED INDEX IX_emp_ID ON dbo.emp (ID);
Note: Ensure that Allow access to Azure services is turned ON for your SQL server so that Data Factory can write data to it. To verify and turn on this setting, go to your logical SQL server > Overview > Set server firewall > set the Allow access to Azure services option to ON.
STEP 2: Create a data factory
1) Sign in to the Azure portal. Select Create a resource > Analytics > Data Factory.
2) On the New data factory page, select Create.
3) On the Basics tab, enter the required details (subscription, resource group, region, and a name for the data factory). Then select Git configuration.
4) On the Git configuration tab, select the Configure Git later check box, and then go to Networking. Then select Review + create.
5) After the creation is finished, the Data Factory home page is displayed. Select the Author & Monitor tile.
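If you prefer scripting this step instead of using the portal, a data factory can also be created with the azure-mgmt-datafactory Python SDK. The sketch below assumes you already have a service principal and an existing resource group; every value in angle brackets is a placeholder:

# Sketch of creating a data factory with the azure-mgmt-datafactory SDK
# (pip install azure-identity azure-mgmt-datafactory). All <...> values are placeholders.
from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

credential = ClientSecretCredential(
    tenant_id="<tenant-id>", client_id="<app-id>", client_secret="<secret>"
)
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Create (or update) the factory in an existing resource group.
factory = adf_client.factories.create_or_update(
    "<resource-group>", "<factory-name>", Factory(location="eastus")
)
print(factory.provisioning_state)

The portal route above is all this tutorial needs; the SDK is handy when the factory has to be created repeatedly, for example from a CI/CD pipeline.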
STEP 3: Use the ADF Copy Data tool to create a pipeline
1) Select the + (plus) button, and then select Pipeline.
2) In the General panel under Properties, specify CopyPipeline for Name. Then collapse the panel by clicking the Properties icon in the top-right corner.
3) In the Activities toolbox, expand Move & Transform. Drag the Copy Data activity from the Activities toolbox to the pipeline designer surface. You can also search for activities in the Activities toolbox. Specify CopyFromBlobToSql for Name.
4) Go to the Source tab. Select + New to create a source dataset.
5) In the New Dataset dialog box, select Azure Blob Storage to copy data from Azure Blob Storage, and then select Continue.
6) In the Select Format dialog box, choose the format type of your data, and then select Continue.
7) In the Set Properties dialog box, enter SourceBlobDataset for Name. Select the First row as header checkbox. Under the Linked service text box, select + New.
8) In the New Linked Service (Azure Blob Storage) dialog box, enter AzureStorageLinkedService for Name and select your storage account from the Storage account name list. Test the connection, then select Create to deploy the linked service.
9) After the linked service is created, you are navigated back to the Set properties page. Next to File path, select Browse. Navigate to the adftutorial/input folder, select the emp.txt file, and then select OK.
10) Select OK. You are taken back to the pipeline page. On the Source tab, confirm that SourceBlobDataset is selected. To preview the data on this page, select Preview data.
11) Go to the Sink tab, and select + New to create a sink dataset. In the New Dataset dialog box, enter "SQL" in the search box to filter the connectors, select Azure SQL Database, and then select Continue.
12) In the Set Properties dialog box, enter OutputSqlDataset for Name. From the Linked service dropdown list, select + New.
13) In the New Linked Service (Azure SQL Database) dialog box, fill in the details: a name for the linked service, your server and database names, and the authentication credentials.
14) The test connection may fail. If it does, go to your Azure SQL database in the portal, open the Set server firewall page, select Yes for Allow Azure services and resources to access this server, and then save the settings.
15) On the New Linked Service (Azure SQL Database) page, select Test connection to test the connection. Then select Create to deploy the linked service.
16) It automatically navigates back to the Set Properties dialog box. In Table, select [dbo].[emp], then select OK.
17) To validate the pipeline, select Validate from the toolbar.
18) Once the pipeline validates successfully, select Publish all in the top toolbar. This publishes the entities (datasets and pipelines) you created to Data Factory. Select Publish.
19) Select Trigger on the toolbar, and then select Trigger Now. On the Pipeline Run page, select OK. (A programmatic way to trigger and monitor the run is sketched after this step list.)
20) Go to the Monitor tab on the left. You will see a pipeline run that was triggered manually. You can use the links under the PIPELINE NAME column to view activity details and to rerun the pipeline.
21) To see activity runs associated with the pipeline run, select the CopyPipeline link under the PIPELINE NAME column.
22) Select All pipeline runs at the top to go back to the Pipeline Runs view. To refresh the view, select Refresh.
23) Verify that the run of the copy from Azure Blob storage to the database in Azure SQL Database shows a status of Succeeded.
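For reference, steps 19 to 21 can also be done programmatically: the published pipeline can be triggered and monitored with the same azure-mgmt-datafactory client shown after STEP 2. This is only a sketch; it reuses that adf_client object and the placeholder resource names, and polls until the run is no longer queued or in progress:

# Sketch of triggering the published CopyPipeline and polling its status.
# adf_client is the DataFactoryManagementClient from the earlier sketch; <...> values are placeholders.
import time

run = adf_client.pipelines.create_run(
    "<resource-group>", "<factory-name>", "CopyPipeline", parameters={}
)

# Poll the run until it is no longer queued or in progress.
while True:
    pipeline_run = adf_client.pipeline_runs.get(
        "<resource-group>", "<factory-name>", run.run_id
    )
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print("Pipeline run finished with status:", pipeline_run.status)

Once the run reports Succeeded, a quick SELECT * FROM dbo.emp against the database should show the two rows copied from emp.txt.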
Congratulations! You just used the Copy Data tool to create a pipeline and monitored the pipeline and activity runs successfully.
If you found value in this content, please drop an "Informative" comment below and follow me for more insightful articles like this.
Also, don't miss out on joining my complimentary class on Azure Cloud; I look forward to seeing you there!