AZURE DATA ENGINEER

What is Azure Data Factory?

Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.

ADF does not store any data itself. Instead, it lets you create data-driven workflows that orchestrate the movement of data between supported data stores, and then process that data using compute services in other regions or in an on-premises environment. It also allows you to monitor and manage workflows through both programmatic and UI mechanisms.

Azure Data Factory use cases

ADF can be used for:

  • Supporting data migrations
  • Moving data from a client’s server or online sources to an Azure Data Lake
  • Carrying out various data integration processes
  • Integrating data from different ERP systems and loading it into Azure Synapse for reporting

How does Azure Data Factory work?

The Data Factory service allows you to create data pipelines that move and transform data and then run those pipelines on a specified schedule (hourly, daily, weekly, etc.). This means the data consumed and produced by workflows is time-sliced, and we can specify the pipeline mode as scheduled (e.g., once a day) or one-time.
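As a sketch of what such a schedule looks like, the dictionary below mirrors the shape of an ADF v2 schedule-trigger JSON definition. All names here (the trigger name, the referenced pipeline "CopyDailySales") are hypothetical, chosen only to illustrate the daily time-slicing described above.

```python
import json

# Illustrative schedule trigger, mirroring the shape of an ADF v2
# ScheduleTrigger JSON definition (all names are hypothetical).
schedule_trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",   # hourly/daily/weekly time slices
                "interval": 1,        # every 1 day
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        # The pipeline(s) this trigger starts on each time slice.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyDailySales",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(json.dumps(schedule_trigger, indent=2))
```

Changing `frequency` to `"Hour"` or `"Week"` (with a matching `interval`) would give the hourly or weekly schedules mentioned above.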

Azure Data Factory pipelines (data-driven workflows) typically perform three steps.

Step 1: Connect and Collect

Connect to all the required sources of data and processing, such as SaaS services, file shares, FTP, and web services. Then move the data to a centralized location for subsequent processing: the Copy Activity in a data pipeline moves data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.
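A minimal sketch of this step, mirroring the shape of an ADF v2 pipeline JSON with a single Copy activity: the dataset names ("BlobInput", "LakeOutput") and the pipeline name are hypothetical placeholders for datasets you would define separately.

```python
import json

# Illustrative pipeline with one Copy activity that moves data from a
# source dataset to a centralized cloud store. Dataset names are
# hypothetical; in ADF they reference separately defined datasets.
collect_pipeline = {
    "name": "CollectToLake",
    "properties": {
        "activities": [
            {
                "name": "CopySourceToLake",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "BlobInput", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "LakeOutput", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "BlobSource"},  # where data is read from
                    "sink": {"type": "BlobSink"},      # centralized landing store
                },
            }
        ]
    },
}

print(json.dumps(collect_pipeline, indent=2))
```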

Step 2: Transform and Enrich

Once data is present in a centralized data store in the cloud, it is transformed using compute services such as HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning.

Step 3: Publish

Deliver transformed data from the cloud to on-premises stores such as SQL Server, or keep it in your cloud storage for consumption by BI and analytics tools and other applications.

Data migration activities with Azure Data Factory

With Microsoft Azure Data Factory, data migration can occur between two cloud data stores, or between an on-premises data store and a cloud data store.

Copy Activity in Azure Data Factory copies data from a source data store to a sink data store. Azure Data Factory supports a wide range of data stores as sources or sinks, including Azure Blob storage, Azure Cosmos DB (DocumentDB API), Azure Data Lake Store, Oracle, Cassandra, etc. For the full list of data stores supported for data movement activities, refer to the Azure documentation on data movement activities.

Azure Data Factory supports transformation activities such as Hive, MapReduce, and Spark, which can be added to pipelines either individually or chained with other activities. For more information about ADF-supported data transformation activities, refer to the Azure Data Factory documentation: Transform data in Azure Data Factory.
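Chaining works by declaring dependencies between activities. The sketch below mirrors the ADF v2 JSON shape for a pipeline where a Hive transformation runs only after a Copy activity succeeds; the activity names and the Hive script path are hypothetical.

```python
# Illustrative pipeline chaining a Hive transformation after a Copy
# activity via dependsOn. Names and the script path are hypothetical.
transform_pipeline = {
    "name": "CopyThenTransform",
    "properties": {
        "activities": [
            {
                "name": "StageRawData",
                "type": "Copy",
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "BlobSink"},
                },
            },
            {
                "name": "HiveTransform",
                "type": "HDInsightHive",
                # Runs only after StageRawData finishes successfully.
                "dependsOn": [
                    {
                        "activity": "StageRawData",
                        "dependencyConditions": ["Succeeded"],
                    }
                ],
                "typeProperties": {"scriptPath": "scripts/transform.hql"},
            },
        ]
    },
}

# Activities execute in dependency order: StageRawData, then HiveTransform.
order = [a["name"] for a in transform_pipeline["properties"]["activities"]]
print(order)
```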

If you want to move data to or from a data store that Copy Activity doesn’t support, you can use a .NET custom activity in Azure Data Factory with your own logic for copying or moving data. To learn more about creating and using a custom activity, see the Azure documentation article “Use custom activities in an Azure Data Factory pipeline”.
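A custom activity appears in the pipeline JSON as an activity of type "Custom" that runs your own executable on a compute environment such as an Azure Batch pool. The sketch below shows that shape; the linked-service names and the command line are hypothetical, standing in for the .NET program containing your copy logic.

```python
# Illustrative Custom activity entry for running your own copy logic
# when Copy Activity has no built-in connector for a store.
# Linked-service names and the command are hypothetical.
custom_activity = {
    "name": "CopyWithCustomLogic",
    "type": "Custom",
    "linkedServiceName": {
        "referenceName": "AzureBatchPool",          # compute that runs the code
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        # Command executed on the Batch node; the referenced .NET program
        # implements the copy/move logic for the unsupported store.
        "command": "MyCustomCopy.exe --source legacy --sink datalake",
        "resourceLinkedService": {
            "referenceName": "StagingStorage",      # storage holding the binaries
            "type": "LinkedServiceReference",
        },
    },
}

print(custom_activity["name"], custom_activity["type"])
```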
