Copy Tables from On-Premise SQL Server to Azure Data Lake | Azure Data Engineering Project Guide [Part 3]

Ever struggled with copying numerous tables from an on-premise SQL Server database to your Azure Data Lake? You're not alone! This manual process can be time-consuming and error-prone. But fear not, data engineers! Here's a powerful approach using Azure Data Factory (ADF) to automate copying all tables at once.

If you haven't already read Part 1 and Part 2 of this series, I highly recommend doing so before proceeding with this article. This installment builds upon the content covered in Part 1 and Part 2, providing a seamless continuation of the series.

Step-by-Step Guide with Actionable Insights

  • Use a SQL script to fetch all table names and schema names from the SQL Server database.

SELECT s.name AS SchemaName,
       t.name AS TableName
FROM sys.tables t
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE s.name = 'SalesLT'

  • Create a new pipeline in Azure Data Factory called "Copy All Tables".

  • Add a Lookup activity to the pipeline to execute the SQL script and fetch the table/schema names from the database.

  • We'll reuse the same table-discovery SQL query directly in the Lookup activity. Open the Lookup activity's Settings tab, where you'll find the option to create a dataset; we need one dataset pointing at our on-premise SQL Server database.

  • Choose SQL Server as the dataset type and name it 'SqlDBTables'. Then select the linked service created in the previous article to connect to the on-premise SQL Server. Keep the remaining options at their defaults and click 'OK'.
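
For reference, this is roughly what the 'SqlDBTables' dataset looks like if you open its code (JSON) view in ADF Studio. The linked service name 'SqlServerLinkedService' is a placeholder for whatever you named the linked service in Part 2, and no table is specified because the Lookup will supply its own query:

{
    "name": "SqlDBTables",
    "properties": {
        "description": "Linked service name below is a placeholder; use the one created in Part 2.",
        "linkedServiceName": {
            "referenceName": "SqlServerLinkedService",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "SqlServerTable",
        "typeProperties": {}
    }
}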

  • Next, look at a useful option in the Lookup activity: 'Use query'. By default the activity reads from a table, but it can instead run a query or a stored procedure. Select 'Query', then copy the SQL script we just ran in SSMS and paste it into the query box. At run time, the Lookup activity will execute this same query against the on-premise SQL Server.

  • Verify the setup with the 'Preview data' option, which shows the rows the query returns. Also remember to uncheck 'First row only' so the Lookup returns every schema/table pair rather than just the first row.
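
Put together, the Lookup activity's JSON (as seen in the pipeline's code view) looks roughly like the sketch below. The activity name 'Lookup All Tables' is a placeholder I'm using for illustration; the key parts are the pasted query and firstRowOnly set to false:

{
    "name": "Lookup All Tables",
    "description": "Activity name is a placeholder; query and firstRowOnly=false match the steps above.",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "SqlServerSource",
            "sqlReaderQuery": "SELECT s.name AS SchemaName, t.name AS TableName FROM sys.tables t INNER JOIN sys.schemas s ON t.schema_id = s.schema_id WHERE s.name = 'SalesLT'"
        },
        "dataset": {
            "referenceName": "SqlDBTables",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}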

  • To ensure everything's working smoothly, run the pipeline with the debug option and check the Lookup activity's input and output in the output window. The input confirms the SQL query used, while the output is a JSON structure containing all schema and table names. This list is the roadmap for copying every table from the source.
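
Assuming the AdventureWorksLT sample database used earlier in this series, the Lookup's debug output has the shape below (trimmed; only the first three of the ten SalesLT tables are shown). The value array is exactly what the ForEach activity will iterate over in the next step:

{
    "count": 10,
    "value": [
        { "SchemaName": "SalesLT", "TableName": "Address" },
        { "SchemaName": "SalesLT", "TableName": "Customer" },
        { "SchemaName": "SalesLT", "TableName": "CustomerAddress" }
    ]
}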

  • Add a ForEach activity. Search for "ForEach" in the ADF Activities pane and drag it onto the canvas. Rename it to "For Each Schema Table" for clarity, then connect the success (green) output of the Lookup activity to it. This ensures the loop only runs if the Lookup successfully retrieves the table list.

  • Within the ForEach loop's settings, open the "Items" field and use "Add Dynamic Content". Select "Activity outputs" followed by the Lookup activity. This creates an expression that retrieves the list of tables (the value array) from the Lookup activity's output. The ForEach loop will iterate through that list, one schema/table pair per iteration, enabling you to copy tables one by one; the resulting configuration is sketched below.
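
In code view, the ForEach activity ends up looking roughly like this. 'Lookup All Tables' is the placeholder name used for the Lookup sketch above; the JSON that ADF generates also contains an "activities" array inside typeProperties holding the nested Copy activity, which is omitted here and shown a little further down:

{
    "name": "For Each Schema Table",
    "description": "Nested Copy activity omitted here for brevity; see the Copy activity sketch below.",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "Lookup All Tables",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('Lookup All Tables').output.value",
            "type": "Expression"
        }
    }
}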

  • Nest the Copy Data Activity within the ForEach Loop. Access the "Activities" section and click the edit icon (pencil) on the ForEach Loop activity. Drag and drop the Copy Data activity into the Loop container. Rename the Copy Data activity to "Copy Each Table" for clarity.

  • Configure the Copy Data activity's source dynamically. Instead of creating a static source dataset, click "New" and choose "SQL Server" as the connector. Name it "SQL Server Copy" and reuse the existing linked service connection to your on-premise SQL Server. Since we'll use dynamic content later, leave the table name blank. Click "OK" to create the source dataset.

  • Make the Copy Data activity's source query dynamic. Switch the "Use query" option to "Query", click "Add Dynamic Content", and paste an expression like @{concat('SELECT * FROM ', item().SchemaName, '.', item().TableName)}. The item() function returns the element of the Lookup output that the ForEach loop is currently processing, so on each iteration the expression resolves to a query for that specific schema and table, letting the activity copy each table individually.
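
The source half of the "Copy Each Table" activity then looks roughly like this in code view. The input dataset is written here as 'SqlServerCopy' (without spaces) to stand in for the "SQL Server Copy" dataset created above, and the sink mapping is left minimal because it's covered in the next steps:

{
    "name": "Copy Each Table",
    "description": "Source dataset reference is a placeholder for the dataset created above; sink parameters are shown later.",
    "type": "Copy",
    "inputs": [
        { "referenceName": "SqlServerCopy", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": {
            "type": "SqlServerSource",
            "sqlReaderQuery": {
                "value": "@{concat('SELECT * FROM ', item().SchemaName, '.', item().TableName)}",
                "type": "Expression"
            }
        },
        "sink": {
            "type": "DelimitedTextSink"
        }
    }
}

On the iteration where the current item is SalesLT/Address, for example, the query resolves to SELECT * FROM SalesLT.Address.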

  • Move on to configuring the Copy Data activity's destination (sink). Click on "Sink" and select "New." Choose "Azure Data Lake Storage" as the connector and hit "Continue." Maintain the default format CSV and click "Continue" again. Rename the sink dataset to "CSVTable" for clarity. Reuse the existing linked service connection to your Data Lake. Specify the file path using "Browse" and navigate to the desired bronze container where you want to store the copied tables. Finally, click "OK" to create the sink dataset.

  • Now, let's define the desired folder structure within the Data Lake. We want the tables organized by schema and table name.

1) Create Dynamic Parameters:

Open the "CSV Table" sink dataset and navigate to the "Parameters" tab. Create two new parameters: "schemaname" and "tablename".

2) Assign Dynamic Values:

In the "Value" section for each parameter, click "Add Dynamic Content." Use the expressions @item().SchemaName and @item().TableName to link the parameter values to the corresponding data from the Lookup activity (outer loop).

3) Build the Folder Path:

Open the "CSVTable" sink dataset again and navigate to the "Connection" tab. In the "File Path" field, use dynamic content to construct the path. Click "Add Dynamic Content" and enter a Python expression like: @{concat(dataset().schemaname, '/',dataset().tablename)} This expression concatenates the container name (obtained from the dataset), schema name (from the parameter), and table name (from the parameter) to create the desired directory structure.

4) Set the File Name:

In the same "File Path" field, use another dynamic content expression to define the filename format. You can use: @{concat(dataset().tablename,'.csv')} This expression combines the table name with ".csv" to create the desired file extension.

  • Publish and trigger the pipeline to execute the copy of all tables to the Data Lake.
  • Head over to the "Monitor" tab to track the pipeline's execution. Clicking the pipeline name reveals a detailed view of each activity's status. You can see that the Lookup activity has finished, and the ForEach Loop is currently running 10 activities in parallel, each copying a separate table. Successfully completed activities in the "Activity Runs" section indicate tables that have been loaded.

  • After the run succeeds, verify the outcome in the Data Lake. Refreshing the bronze container shows a folder named "SalesLT" (the schema name). Inside, you'll find all 10 tables with their designated directory structure and corresponding CSV files. Opening the "Address" folder confirms the presence of "Address.csv", and similar files exist for all copied tables.

Key Takeaways and Benefits:

  • Automation Advantage: This approach eliminates the need for manual scripting and repetitive tasks, saving you valuable time and effort.
  • Error Reduction: Repetitive manual processes are prone to errors. ADF automates the entire process, minimizing the chance of human error.
  • Scalability and Flexibility: This ADF solution can easily handle copying a large number of tables efficiently. Additionally, it can be adapted to fit your specific data landscape.


Next Task For You

Don't forget to like and share this article to spread the knowledge among your data engineer network! :)

