In this second installment, we dive into the pivotal process of data ingestion: moving data from an on-premises SQL Server to Azure Data Lake Gen2 using Azure Data Factory. This article builds on the introduction and environment setup covered in Part 1, so if you haven't explored that yet, we encourage you to start there before continuing with this stage.
The Journey Begins: Establishing the Connection
Our adventure starts with establishing a connection between ADF and your on-premises SQL Server. This is achieved using a component called an integration runtime (IR).
- Self-hosted integration runtime: Ideal for on-premises data sources. We'll walk you through the installation process, ensuring a secure and efficient connection.
- Auto-resolve integration runtime: Perfect for cloud-based resources like Azure Data Lake. Later on, we'll explain how ADF manages this connection automatically.
Creating a Self-Hosted IR via the UI
- Navigate to the Azure Data Factory user interface homepage and click on the Manage tab located in the leftmost pane.
- From the options presented, select Integration runtimes and then click on +New.
- On the Integration runtime setup page, select Azure, Self-Hosted and click Continue.
- On the next page, select Self-Hosted to specify the creation of a Self-Hosted Integration Runtime and click Continue.
- Now, you'll be prompted to configure the self-hosted IR: Provide a name for your Integration Runtime and finalize the setup by clicking Create.
- To complete the setup via the UI: Click the link under Option 1 to run the express setup on the machine that can reach your SQL Server, or follow the steps under Option 2 for a manual setup. The instructions here follow the express setup (Option 1). Once the downloaded application finishes installing, open it to start the Self-Hosted Integration Runtime setup; the application downloads the necessary files and handles key-based authentication automatically, streamlining the process. A scripted alternative to this registration step is sketched below.
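If you'd rather script the registration than click through the express setup, the Azure SDK for Python exposes the same operations. The snippet below is a minimal sketch, assuming the azure-identity and azure-mgmt-datafactory packages and placeholder subscription, resource group, factory, and runtime names: it registers the self-hosted IR in the factory and prints the authentication key you would paste into the locally installed runtime.

```python
# Minimal sketch: register a self-hosted IR and fetch its auth key with the Azure SDK.
# The subscription, resource group, factory, and IR names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

subscription_id = "<subscription-id>"
rg_name = "<resource-group>"
df_name = "<data-factory-name>"
ir_name = "OnPremSqlServerIR"  # hypothetical IR name reused in later snippets

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Register the self-hosted IR definition inside the data factory.
ir_resource = IntegrationRuntimeResource(
    properties=SelfHostedIntegrationRuntime(
        description="Runtime for the on-premises SQL Server"
    )
)
adf_client.integration_runtimes.create_or_update(rg_name, df_name, ir_name, ir_resource)

# Retrieve an authentication key; paste it into the IR application installed on the
# on-premises machine to link the local node to this runtime.
keys = adf_client.integration_runtimes.list_auth_keys(rg_name, df_name, ir_name)
print("Auth key for manual registration:", keys.auth_key1)
```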
Building the Bridge: The Copy Data Activity
Now comes the exciting part - data movement! We'll showcase the copy data activity, the workhorse that transfers data from your SQL Server tables to the designated location in Azure Data Lake. We'll explore:
- Navigating to the Author tab, we locate the Pipelines section and click the plus icon to create a new pipeline. To avoid confusion down the line, we assign it a descriptive name such as "Copy Pipeline".
- The heart of our pipeline lies in the Copy Data activity. By searching for and dragging the Copy Data activity into the workspace, we set the stage for data transfer. Initially, we opt to copy a single table, such as the "Address" table from our SQL Server database. Renaming the activity to "Copy Address Table," we proceed to configure the source and sink.
- Before proceeding, we need to define a source dataset representing our SQL Server database. Clicking on the Source option, we're prompted to create a new source dataset. Selecting SQL Server as our data store, we provide essential details such as the server name (e.g., localhost), database name, and authentication credentials. For security, we use Azure Key Vault to store and retrieve the password (the linked services involved are sketched in code after this list).
- With the dataset in place, we select the source table (e.g., the Address table).
- On the other end of the spectrum lies our destination, Azure Data Lake Gen2. Here, we create a sink dataset for storing the copied data. Selecting an appropriate data format, such as Parquet or CSV, ensures efficient storage and retrieval. We establish a linked service connection to our Azure Data Lake storage account, specifying the desired container for data storage (both datasets are sketched in code after this list).
- With all configurations in place, we execute our pipeline using the Debug option. This initiates the data copying process, transferring the "Address" table from our SQL Server database to Azure Data Lake. Upon successful completion, we verify that the copied file is present in the designated container within the Data Lake (a scripted version of the pipeline and a triggered run appear below).
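To make the walkthrough concrete, here is a hedged sketch of the same connections defined in code rather than in the UI, using the azure-mgmt-datafactory package. The vault URL, storage account, secret name, database name, and linked service names are placeholders, not values from the article; the SQL Server linked service routes through the self-hosted IR registered earlier and reads its password from Key Vault, mirroring the UI configuration.

```python
# Sketch: linked services for Key Vault, on-premises SQL Server, and ADLS Gen2.
# All names, URLs, and secrets below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobFSLinkedService,
    AzureKeyVaultLinkedService,
    AzureKeyVaultSecretReference,
    IntegrationRuntimeReference,
    LinkedServiceReference,
    LinkedServiceResource,
    SqlServerLinkedService,
)

subscription_id, rg_name, df_name = "<subscription-id>", "<resource-group>", "<data-factory-name>"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# 1) Key Vault linked service: lets ADF resolve secrets at runtime.
kv_ls = LinkedServiceResource(
    properties=AzureKeyVaultLinkedService(base_url="https://<your-vault>.vault.azure.net/")
)
adf_client.linked_services.create_or_update(rg_name, df_name, "ls_keyvault", kv_ls)

# 2) SQL Server linked service: connects via the self-hosted IR and pulls the
#    password from Key Vault instead of storing it in the factory.
sql_ls = LinkedServiceResource(
    properties=SqlServerLinkedService(
        connection_string="Integrated Security=False;Data Source=localhost;Initial Catalog=<database-name>;",
        user_name="<sql-user>",
        password=AzureKeyVaultSecretReference(
            store=LinkedServiceReference(type="LinkedServiceReference", reference_name="ls_keyvault"),
            secret_name="<sql-password-secret>",
        ),
        connect_via=IntegrationRuntimeReference(
            type="IntegrationRuntimeReference", reference_name="OnPremSqlServerIR"
        ),
    )
)
adf_client.linked_services.create_or_update(rg_name, df_name, "ls_sqlserver", sql_ls)

# 3) Data Lake Gen2 linked service: the cloud sink, reached through the
#    auto-resolve Azure integration runtime by default. An account key is used
#    here for brevity; a managed identity is a common alternative.
adls_ls = LinkedServiceResource(
    properties=AzureBlobFSLinkedService(
        url="https://<storage-account>.dfs.core.windows.net",
        account_key="<storage-account-key>",
    )
)
adf_client.linked_services.create_or_update(rg_name, df_name, "ls_adls", adls_ls)
```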
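The source and sink datasets from the walkthrough can be defined the same way. This sketch assumes the linked services from the previous snippet already exist in the factory; the container, folder, file, and schema names are illustrative placeholders rather than values prescribed by ADF.

```python
# Sketch: source (SQL Server table) and sink (Parquet in ADLS Gen2) datasets.
# Assumes the linked services "ls_sqlserver" and "ls_adls" already exist.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobFSLocation,
    DatasetResource,
    LinkedServiceReference,
    ParquetDataset,
    SqlServerTableDataset,
)

subscription_id, rg_name, df_name = "<subscription-id>", "<resource-group>", "<data-factory-name>"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Source dataset: the Address table exposed through the SQL Server linked service.
src_ds = DatasetResource(
    properties=SqlServerTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="ls_sqlserver"
        ),
        table_name="SalesLT.Address",  # assumption: adjust schema/table to your database
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "ds_address_sql", src_ds)

# Sink dataset: a Parquet file in a Data Lake Gen2 container.
sink_ds = DatasetResource(
    properties=ParquetDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="ls_adls"
        ),
        location=AzureBlobFSLocation(
            file_system="raw",        # placeholder container name
            folder_path="address",
            file_name="Address.parquet",
        ),
        compression_codec="snappy",
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "ds_address_parquet", sink_ds)
```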
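Finally, the Copy Data activity, the pipeline that contains it, and a triggered run take only a few lines. This sketch follows the pattern of the official Azure Data Factory Python quickstart and relies on the placeholder datasets above; note that create_run executes the published pipeline, which is the programmatic counterpart of the interactive Debug run described in the walkthrough.

```python
# Sketch: a pipeline with a single Copy Data activity, published and triggered.
# Assumes the datasets "ds_address_sql" and "ds_address_parquet" already exist.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    CopyActivity,
    DatasetReference,
    ParquetSink,
    PipelineResource,
    SqlServerSource,
)

subscription_id, rg_name, df_name = "<subscription-id>", "<resource-group>", "<data-factory-name>"
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# The Copy Data activity: read from the SQL Server dataset, write Parquet to the lake.
copy_activity = CopyActivity(
    name="Copy Address Table",
    inputs=[DatasetReference(type="DatasetReference", reference_name="ds_address_sql")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="ds_address_parquet")],
    source=SqlServerSource(),
    sink=ParquetSink(),
)

# Wrap the activity in a pipeline and publish it to the factory.
pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "Copy Pipeline", pipeline)

# Trigger a run and poll until it completes.
run = adf_client.pipelines.create_run(rg_name, df_name, "Copy Pipeline", parameters={})
status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id).status
while status in ("Queued", "InProgress"):
    time.sleep(15)
    status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id).status
print("Pipeline run finished with status:", status)
```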
Conclusion
By meticulously following these steps, data engineers can streamline the process of data ingestion, enabling seamless transfer of data from on-premises SQL Server databases to Azure Data Lake Gen2. In the next part of our data engineering journey, we'll explore scaling up the pipeline to handle multiple tables effortlessly.
Stay tuned for more insights and practical guides on data engineering techniques. Follow me for updates on future articles and dive deeper into the world of data engineering with Azure Data Factory. Let's unlock the true potential of data together.
#AzureDataFactory #DataIngestion #CloudMigration #DataEngineering