Step-by-Step Guide to Creating a Copy Activity Pipeline in Azure Data Factory

Azure Data Factory (ADF) is a managed cloud service for data integration that automates data movement and transformation. It orchestrates existing services to collect raw data and transform it into actionable insights.

Let’s dive deeper into creating linked services, datasets, pipelines, and copy activities in Azure Data Factory. I’ll provide step-by-step instructions and include the relevant code snippets.

Prerequisites

Before we begin, ensure you have the following:

  • An Azure Data Factory instance (the steps below use Azure Data Factory Studio).
  • An Azure SQL Database with the target table into which you want to load data.
  • A CSV file (or another supported format) in Azure Blob Storage containing the data you want to copy.

Creating Linked Services

Linked services are essential for connecting your data stores to Azure Data Factory. They work much like connection strings, defining how ADF connects to and authenticates with external resources. Here’s how to create them:

Creating an Azure Blob Storage Linked Service

  1. In Azure Data Factory Studio, open the Manage tab (toolbox icon) and select Linked services.
  2. Click + New to create a new linked service.
  3. Select Azure Blob Storage as the connector.
  4. Configure the service details, including the storage account connection string.
  5. Test the connection and create the new linked service.

{
  "name": "AzureBlobStorageLinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net"
    }
  }
}
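
The connection string above embeds the storage account key directly. Many teams prefer to authenticate with the factory’s managed identity instead; a rough sketch of that shape is below, assuming the factory’s identity has already been granted an appropriate Storage Blob Data role on the account (the account name is a placeholder).

{
  "name": "AzureBlobStorageLinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "serviceEndpoint": "https://myaccount.blob.core.windows.net/",
      "accountKind": "StorageV2"
    }
  }
}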

Creating an Azure SQL Database Linked Service

  1. Follow similar steps as above but choose Azure SQL Database as the connector.
  2. Provide the necessary connection details, including server name, database name, username, and password.

{
  "name": "AzureSqlDatabaseLinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=myserver.database.windows.net;Database=mydb;User ID=myuser;Password=mypassword;Encrypt=true;Connection Timeout=30;"
    }
  }
}
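
Keeping the password inside the connection string is fine for a quick demo, but a Key Vault reference is the more common choice in practice. A minimal sketch, assuming an Azure Key Vault linked service named AzureKeyVaultLinkedService and a secret named SqlDbPassword already exist:

{
  "name": "AzureSqlDatabaseLinkedService",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=myserver.database.windows.net;Database=mydb;User ID=myuser;Encrypt=true;Connection Timeout=30;",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "AzureKeyVaultLinkedService",
          "type": "LinkedServiceReference"
        },
        "secretName": "SqlDbPassword"
      }
    }
  }
}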

Creating Datasets

Datasets define the structure of your data within linked data stores. Let’s create two datasets: one for Azure Blob Storage and another for Azure SQL Database.

Creating an Azure Blob Storage Dataset

  1. In the Data Factory Studio, select the Author tab.
  2. Click the plus sign and choose Dataset.
  3. Select Azure Blob Storage as the connector.
  4. Configure the dataset properties, including the blob container and folder.

{
  "name": "AzureBlobDataset",
  "type": "Microsoft.DataFactory/factories/datasets",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "AzureBlob",
    "typeProperties": {
      "folderPath": "mycontainer/myfolder",
      "format": {
        "type": "TextFormat",
        "columnDelimiter": ","
      }
    }
  }
}
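
The AzureBlob/TextFormat shape above is the older dataset format and still works; datasets authored for CSV files in the current ADF Studio are typically generated as DelimitedText instead. A roughly equivalent sketch (container, folder, and file names are placeholders):

{
  "name": "AzureBlobDelimitedTextDataset",
  "type": "Microsoft.DataFactory/factories/datasets",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "mycontainer",
        "folderPath": "myfolder",
        "fileName": "mydata.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}

If you use a DelimitedText dataset as the copy source, the copy activity’s source type becomes DelimitedTextSource rather than BlobSource.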

Creating an Azure SQL Table Dataset

  1. Follow similar steps as above, but choose Azure SQL Table as the connector.
  2. Specify the table name and map columns if needed.

{
  "name": "AzureSqlTableDataset",
  "type": "Microsoft.DataFactory/factories/datasets",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabaseLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "AzureSqlTable",
    "typeProperties": {
      "tableName": "MyTable"
    }
  }
}
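
Newer factories usually split the table reference into separate schema and table properties rather than the single tableName shown above (both shapes are generally accepted), in which case the typeProperties block looks like this:

"typeProperties": {
  "schema": "dbo",
  "table": "MyTable"
}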

Creating a Pipeline

  1. Add a new pipeline in the Data Factory Studio.
  2. Drag and drop activities onto the canvas, such as Copy Activity.
  3. Configure the source and sink datasets, mapping columns, and any transformations.

Configuring the Copy Activity

  1. In the Copy Activity, specify the source (AzureBlobDataset) and sink (AzureSqlTableDataset).
  2. Map columns if needed.
  3. Set up fault tolerance options, such as skipping incompatible rows (a sketch of these settings follows the pipeline JSON below).

{
  "name": "MyCopyPipeline",
  "properties": {
    "activities": [
      {
        "name": "MyCopyActivity",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "AzureBlobDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "AzureSqlTableDataset",
            "type": "DatasetReference"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "BlobSource"
          },
          "sink": {
            "type": "SqlSink"
          },
          "translator": {
            "type": "TabularTranslator",
            "mappings": [
              {
                "source": {
                  "name": "Column1"
                },
                "sink": {
                  "name": "Column1"
                }
              }
            ]
          }
        }
      }
    ]
  }
}
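
Step 3 above mentions fault tolerance, which the pipeline JSON doesn’t show. One possible shape for the copy activity’s typeProperties with row skipping enabled is sketched below, assuming skipped rows should be logged to a container named errorlogs on the same Blob Storage linked service:

"typeProperties": {
  "source": {
    "type": "BlobSource"
  },
  "sink": {
    "type": "SqlSink"
  },
  "enableSkipIncompatibleRow": true,
  "redirectIncompatibleRowSettings": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "path": "errorlogs"
  }
}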

Running the Pipeline

After creating your Azure Data Factory pipeline, you’ll want to execute it. Here are a few ways to do that:

  1. Manual execution: use Trigger now in ADF Studio (or Debug for a test run) to run the pipeline on demand.
  2. Scheduled execution: attach a schedule trigger that runs the pipeline on a recurrence (see the sketch below this list).
  3. Event-driven execution: attach a storage event trigger that fires when a blob is created or deleted in the source container.
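
For scheduled execution, a trigger definition is attached to the pipeline. A minimal schedule trigger sketch that runs MyCopyPipeline once a day (the trigger name and start time are placeholders):

{
  "name": "DailyCopyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "MyCopyPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}

Note that a newly created trigger typically has to be started after publishing before it will fire.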

Additional Tips for Effective Pipeline Development:

Save Your Changes and Refresh ADF Before Publishing:

  • Always refresh your Azure Data Factory (ADF) before hitting the “Publish” button to avoid overwriting your colleagues’ changes.
  • This simple step can prevent unnecessary frustration.

Find Dependencies Before Modifying ADF Components:

  • Use an object’s Related view to see which pipelines, datasets, and data flows reference your ADF objects (linked services, datasets, pipelines, data flows).
  • Understand these dependencies before making changes to avoid unexpected breakages.

Clone ADF Components for Troubleshooting:

  • Clone Datasets, Data Flows, or Pipelines when troubleshooting issues.
  • Remove recently changed code and gradually add tasks back to identify the root cause.

Use Annotations for Easy Tracking:

  • Annotate your ADF components with notes, rules, expressions, or other relevant information.
  • Easily track and trace values used during pipeline execution.

Parameterize Everything:

  • Use parameters, variables, and global parameters for reusability.
  • Parameterize Linked Services, Datasets, and Data Flows.
  • Simplify maintenance and make your pipelines adaptable (a parameterized dataset is sketched below).
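
As an illustration of dataset parameterization, here is a rough sketch of a delimited-text dataset whose file name is supplied at run time (the ParameterizedBlobDataset name and FileName parameter are just examples):

{
  "name": "ParameterizedBlobDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "FileName": {
        "type": "String"
      }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "mycontainer",
        "folderPath": "myfolder",
        "fileName": {
          "value": "@dataset().FileName",
          "type": "Expression"
        }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}

A copy activity that uses this dataset then supplies a value for FileName on the dataset reference, often from a pipeline parameter such as @pipeline().parameters.FileName.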

Final Thoughts

Azure Data Factory is a powerful tool for data engineering. As your data estate grows, ADF can help manage increasingly complex data movement scenarios, and features like Azure Functions integration and connectors for non-Microsoft databases extend what it can orchestrate. Remember to explore the official documentation and community resources to continue learning and optimizing your data pipelines.

Happy data engineering!

For more insights and community discussions, check out the Azure Data Factory Blog.
