Transform Your Data with Azure Data Factory

This article will help you understand the basic functionality of Azure Data Factory (ADF) and why it is such a powerful tool for working with big data. It walks through the basic architecture of ADF and the components and services involved.

A Quick Intro to Azure Data Factory & Its Key Features

ADF is a cloud-based data integration service for orchestrating and automating data movement and data transformation, with more than 90 maintenance-free connectors built in at no added cost. You can construct ETL and ELT processes in a visual environment or write your own code.

Cut down operating expenses and capital expenditures and save precious time with these key features:

  • No code or maintenance is required to build a hybrid ETL and ELT pipeline within the Data Factory visual environment.
  • A cost-efficient, fully managed, serverless cloud data integration tool that scales on demand.
  • SSIS integration runtime to easily rehost on-premises SSIS packages in the cloud using familiar SSIS tools.
  • Azure security measures to connect to on-premises, cloud-based, and software-as-a-service (SaaS) applications with peace of mind.




What is ETL?

As noted above, ADF is a tool for building ETL and ELT data solutions. ETL stands for Extract, Transform, and Load: a method of moving data from various sources into a data warehouse. The image below illustrates the three stages of data flow in ADF: E (Extract), T (Transform), and L (Load).

[Image: the three stages of data flow in ADF: Extract, Transform, and Load]



In this example, the ETL process collects raw data from an on-premises SQL database. The data is extracted into a cloud storage platform, where it is transformed as required. ADF provides the functions and activities that turn the raw data into structured data for BI tools and analytics. Finally, ADF loads the transformed data into a data warehouse or other cloud storage.

ETL typically uses cloud storage services such as Blob Storage, Data Lake, or S3 as staging environments. A staging environment works like a workspace: it holds the data with all of its properties so that it can be tested before it is stored in its final destination.
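The three stages can be sketched outside ADF in a few lines of Python. This is a minimal local stand-in, not ADF code: an in-memory CSV string plays the role of the source, a plain function plays the role of the transformation, and an in-memory SQLite database plays the role of the warehouse. The column names and the `customer` table are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for the source system.
RAW_CSV = """id,name,country
1, alice ,us
2,Bob,IN
3,  carol,de
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalize casing, convert types."""
    return [
        (int(r["id"]), r["name"].strip().title(), r["country"].strip().upper())
        for r in rows
    ]


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order and return what ended up in the table."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT id, name, country FROM customer ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_CSV, sqlite3.connect(":memory:")))
```

In ADF the same three roles are filled by connectors, activities, and sinks rather than by hand-written functions, but the order of the stages is the same.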

Connection and Workflow of ADF

The image below is an example of an ADF copy workflow and the components and services used to orchestrate the task. Let’s go through the flow and learn each part involved in an ADF transformation job.

[Image: example ADF copy workflow and the components and services used to orchestrate it]

This example activity uses two datasets, one as the input source and one as the output destination (sink), and each dataset is connected to its data store through a linked service. A linked service points to the actual storage location of the data, on both the source end and the destination end. The Integration Runtime binds the complete workflow together, and to automate the task or run it with a single click, you create a pipeline.

Pipeline

As shown in the image above, a pipeline is a logical grouping of activities. The activities in a pipeline are deployed and run together to perform a specific task. A pipeline can contain many kinds of activities, such as Get Metadata, ForEach, and Copy, to carry out data transformation tasks. In this scenario, a Copy activity is used to copy data from the source data store to the destination store.
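Behind the visual designer, a pipeline is stored as a JSON document. The sketch below builds that shape as a Python dictionary for a pipeline with a single Copy activity. The names `CopyCustomerPipeline`, `CopyCustomerData`, `CustomerCsv`, and `CustomerTable` are hypothetical, and the source and sink types assume a delimited-text source and a SQL Server sink. Treat it as an illustration of the structure rather than a definition to deploy as-is.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return a pipeline definition that groups one Copy activity."""
    copy_activity = {
        "name": "CopyCustomerData",  # hypothetical activity name
        "type": "Copy",
        # Datasets are referenced by name, not embedded in the activity.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "DelimitedTextSource"},  # assumed CSV source
            "sink": {"type": "SqlServerSink"},  # assumed SQL Server sink
        },
    }
    # More activities (Get Metadata, ForEach, ...) would be appended to this list.
    return {"name": name, "properties": {"activities": [copy_activity]}}


if __name__ == "__main__":
    pipeline = build_copy_pipeline(
        "CopyCustomerPipeline", "CustomerCsv", "CustomerTable"
    )
    print(json.dumps(pipeline, indent=2))
```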

Integration Runtime

Commonly known as IR, the Integration Runtime is the heart of Azure Data Factory. It provides the compute resources for data movement and for dispatching activities in ADF. There are three types of Integration Runtime: Azure, self-hosted, and Azure-SSIS.

Blob Storage

Blob Storage is Microsoft's cloud storage for massive amounts of unstructured data, such as text or binary data. In the workflow image, the Customer.csv file containing raw customer details is stored in Blob Storage.

Linked Service

Linked services define the connections to the source and destination data stores. In this example, one linked service connects to our data source, Blob Storage. A data store can be Azure Blob Storage, Azure SQL Database, an on-premises SQL Server, and so on.
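A linked service definition is essentially a named connection string plus a store type. The sketch below builds two such definitions as Python dictionaries in the shape of ADF's JSON: one for Blob Storage and one for an on-premises SQL Server that is reached through a self-hosted Integration Runtime. The names `BlobSourceLS`, `SqlSinkLS`, and `OnPremIR` and the connection strings are placeholders, not real values.

```python
def linked_service(name, service_type, connection_string, integration_runtime=None):
    """Build a linked service definition as a plain dictionary."""
    definition = {
        "name": name,
        "properties": {
            "type": service_type,
            "typeProperties": {"connectionString": connection_string},
        },
    }
    if integration_runtime is not None:
        # Stores inside a private network are reached through a named IR.
        definition["properties"]["connectVia"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return definition


# Placeholder connection strings; real ones should come from a secret store.
blob_ls = linked_service(
    "BlobSourceLS",
    "AzureBlobStorage",
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>",
)
sql_ls = linked_service(
    "SqlSinkLS",
    "SqlServer",
    "Server=<server>;Database=<database>;User ID=<user>;Password=<password>",
    integration_runtime="OnPremIR",
)
```

Keeping the Integration Runtime reference optional mirrors the example workflow: the cloud store needs no extra compute reference here, while the on-premises store does.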

Dataset

A dataset represents the data in the source or destination and connects to it via a linked service. Datasets are created according to the type of data source or destination you want to connect to.

As the workflow image shows, on the source side a Blob Storage dataset connects via a linked service to the actual location of the data in Blob Storage. On the destination side, a SQL Server dataset connects via a linked service and points to the exact table in the database where the data will be stored.
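The two datasets described here can be sketched as Python dictionaries in the shape of ADF's dataset JSON. The file name `Customer.csv` comes from the example; the dataset names, the linked service names, the `input` container, and the `dbo.Customer` table are assumptions made for illustration.

```python
def blob_csv_dataset(name, linked_service_name, container, file_name):
    """Source dataset: a delimited text file at a location in Blob Storage."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


def sql_table_dataset(name, linked_service_name, schema, table):
    """Sink dataset: one table in a SQL Server database."""
    return {
        "name": name,
        "properties": {
            "type": "SqlServerTable",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"schema": schema, "table": table},
        },
    }


source_ds = blob_csv_dataset("CustomerCsv", "BlobSourceLS", "input", "Customer.csv")
sink_ds = sql_table_dataset("CustomerTable", "SqlSinkLS", "dbo", "Customer")
```

Note that neither dataset holds a connection string: each one only names a linked service and then says where inside that store the data lives.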

Copy Activity

The Copy activity in this pipeline copies data between different data stores and, in this case, converts the data into the format required by the destination store. The datasets are supplied to the activity as its input (source) and output (sink).
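One way a Copy activity reshapes data for the destination is an explicit column mapping, expressed in the activity's JSON as a `translator` of type `TabularTranslator`. The sketch below builds such a mapping as a Python dictionary and, purely as a local illustration, applies it to one row to show the effect. The column names are hypothetical, and `apply_mapping` is a helper written for this example, not part of ADF.

```python
def tabular_translator(column_map):
    """Build a translator block from {source_column: sink_column} pairs."""
    return {
        "type": "TabularTranslator",
        "mappings": [
            {"source": {"name": src}, "sink": {"name": dst}}
            for src, dst in column_map.items()
        ],
    }


def apply_mapping(row, translator):
    """Local illustration only: rename a row's columns as the mapping describes."""
    return {
        m["sink"]["name"]: row[m["source"]["name"]]
        for m in translator["mappings"]
    }


# Hypothetical CSV header names mapped to hypothetical table column names.
translator = tabular_translator({"CustomerId": "id", "CustomerName": "name"})

if __name__ == "__main__":
    print(apply_mapping({"CustomerId": "1", "CustomerName": "Alice"}, translator))
```

In a pipeline definition, the translator block would sit next to the source and sink settings of the Copy activity; the service itself performs the renaming during the copy.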

In short, ADF is the umbrella for all of these components, such as linked services, datasets, and the Integration Runtime, which come together in a pipeline's activities to produce transformed data for analytical tools such as Power BI and HDInsight, supporting business planning.
