How to import multiple Excel sheets and files with different schema in Alteryx

How to import multiple Excel sheets and files with different schema in Alteryx

Scroll to the end to download a free Batch Import Macro with examples!

Suppose you want to import multiple sheets from an Excel file that have the same structure. A common approach would be to use a Dynamic Input tool in Alteryx. In another scenario, imagine you want to import multiple Excel files with the same structure. A go-to solution is to use a Union tool or wildcard. However, these solutions only work well on one condition – the schema of all the sheets or files have to be identical. The sheet or file will be skipped if it has a different schema from the first sheet or file read by the Input tool. The warning or error message you see might be 'The file "C:\...(file path)" has a different number of fields than the 1st file in the set and will be skipped' or 'The file "C:\...(file path)" has a different schema than the 1st file in the set.'

This prerequisite can be a bit tricky to meet as sometimes monthly file format changes, and columns are added or removed. Or even if the structure of the sheets/files may look the same, they may have a different schema due to field type being recognised differently, or empty columns in the file that contain space.

A quick, simple workaround using a Batch Macro

So how can we ensure to batch import every Excel sheet and file in Alteryx regardless of their different schema?

This can be done with a very simple Batch Macro.

For those of you who don’t have the time to create a Batch Macro from scratch, you can simply download the batch import package I built here. Once you download it, follow the three steps below:

  1. Decide if you want to input multiple csv files, or multiple sheets from a single xlsx file. And go to the "Multiple files" or "Multiple sheets" folder accordingly.
  2. Run the example workflow to understand how it works. Open the example Alteryx workflow in the folder (the one with .yxmd extension). If the Batch Macros are missing (it will show up as a black box with a question mark), re-insert the Batch Macro from the folder and choose the right input fields from the dropdown following the instructions in the comment boxes embedded in the workflows.
  3. Use the macros for your data. For multiple csv files, re-point the Directory tool to the folder where all your csv files are stored. For multiple sheets in Excel, re-point the Input tool to the .xlsx file you would like to import.
  4. Run the workflow and check the result 


For those who are keen to learn how to create batch macros, read on. In the following step-by-step guide, I will use ‘multiple sheets’ scenario as an example to guide you on how to create a batch macro from scratch. To create one for importing multiple files, or for other more advanced batch macros, you can just follow the same principles.

No alt text provided for this image

Step-by-step guide to create a batch macro that imports multiple Excel sheets or csv files

Part 1. Create a Batch Macro to batch import multiple sheets

1.1 ADD TOOLS

To create a Batch Macro, start by creating a new workflow with an Input Data tool. Point it to the file you’d like to import and select one of the sheets. (You will see later that it actually doesn’t matter what Excel file you point it to, as the file name and sheet name will be dynamically updated.) Select “Full Path” as the value for “Output File Name as Field”. This is to ensure the sheet names are included in the output.

Figure 1

Then drag a Control Parameter tool (from the Interface category) onto the canvas, and connect it to the Input tool with the lightning bolt - an Action tool will be automatically generated (see Figure 2 below).

Notice here we want to update two fields in the Input tool – File name and Sheet name, so we need to add a second Control Parameter and connect it to the same Input tool. Name the two Control Parameter tools “File name” and “Sheet name” in the configuration respectively, so that you can distinguish the input fields later. Now connect the output to a Macro Output.

Figure 2

(Now if you check the Workflow Type, you’ll see it’s automatically changed from Standard Workflow to Batch Macro.)

Figure 3

1.2 CONFIGURE INPUT

Click the Action tool connecting Control Parameter for File name (① in Figure 2). Select “Update Value (Default)” in the “Select an action type” dropdown. In the window shown of Figure 4 below, select File – value =”…” and then tick the box at the bottom that says “Replace a specific string:”. Remove the sheet name in that string, which is everything from the pipe till the end of the string after the file name. It means only the file path will be updated by this Parameter Control.

Figure 4

Now click the Action tool connecting Control Parameter for Sheet name (② in Figure 2 above). Select “Update Value (Default)” in the dropdown. In the window shown on Figure 5 below, select File – value =”…” and then tick the box at the bottom that says “Replace a specific string:”. Only keep the sheet name in that string, which is the text in between and $’ after the pipe at the end of the file name (in this example the sheet name is Austria). It means only the Sheet name will be updated by this Parameter Control as the Batch Macro loops through the multiple sheets.

Figure 5

1.3 CONFIGURE OUTPUT

Now go to View -> Interface Designer -> Properties (left hand side panel). As illustrated in Figure 6, in “Output Mode”, choose “Auto Configure by Name…” or “Auto Configure by Position…” to union output from each sheet by field names or field position at the end of the iterations. Once you’ve done this, it will not complain about different schema in different sheets or files. This is the key for combining tables with different schema.

Figure 6

Note: “Auto Configure by Name” requires the headers to be the same across all sheets/files. If the headers are different but the tables have the same structure across sheets/files (eg. Date is always in column A), use “Auto Configure by Position”. In the batch macro, add a select tool to move FileName to the top to make sure the FileName column is always at the same position.

Figure 7

Save the Batch Macro and it’s good to go!

Part 2. Use the Batch Macro in a standard workflow

The standard workflow I created has two parts – import & combine data, and extract the sheet names.

Figure 8

Create a new workflow with an Input file pointing at the file you’d like to import. Choose “<List of Sheet Names>” for “Table or Query”, and “Full Path” for “Output File Name as Field”.

Figure 9

The “Sheet Names” column generated will be used as the input field for Sheet name in the Batch Macro. To get the input for File name, use Text to Columns (| as delimiter, split into 2 columns) to extract the file full path from FileName column. I then renamed the FileName1 column “File Name”, which we will use as input for File name in the Batch Macro.

Next, connect the Batch Macro. From the dropdown lists select the correct input field for File name and Sheet name.

Figure 10

The workflow after that is basically a standard Alteryx workflow to extract and add the sheet names back to the dataset in an extra column (Country in this case).

Now you have a reusable Batch Macro and a repeatable workflow to import multiple Excel tables even if they have different schema.

 

Additional notes:

To adjust the Batch Macro in order to import multiple files, simply remove one of the Parameter Control. And in the Action configuration for the remaining Parameter Control, select “Update Input Data Tool” in the “Select an action type” dropdown. In the standard workflow, use a Directory tool to obtain full paths for all files you want to import, and use that as the Batch Macro input field.

Figure 11
Figure 12

Note: this only works when you have only one sheet in every Excel file. If the Excel files have multiple sheets, you’ll have to create full path that includes file path and sheet name in the format of File Path|’Sheet Name$’ (or the more recent Alteryx versions uses three pipes instead of one).

There you have it, a practical solution to import multiple Excel sheets and files regardless of different schema all at once! 

 

You can download the example workflows and macros here.

If you find this helpful, please hit Comment or Like to let me know so that I can continue to create and share useful content.

For more tips on using Alteryx, check out Alteryx Community, my other posts or get in touch!

Rebecca Insell

Strategy & Planning Manager

1 个月

Bingqian Gao this is really helpful but I got stuck on Part 2 and updating to 'List of sheet names' as I have imported the file I want to use, it won't let me change it. Also on the next step to connect the Batch Macro - how do you do this and where do you go? Forgive me as I'm brand new to Alteryx and only at a basic level and this seems quite advanced! Thank you in advance

回复

This assisted me greatly. Thank you very much!

回复
Ramchandra More

Business Intelligence Developer at Nepa | 2x Alteryx Certified | MS-SQL | Tableau Desktop |Power BI | Ex- TCS | Pune | Mumbai

1 年

Thanks @Bingqian Gao , Awesome explanation.... Keep it up ??

回复
Thao P Tran

Data Analyst | ETL & Data Visualisation Power BI | Alteryx Advanced | Tableau

2 年

Thank you, this is so helpful with my current project ??.

回复

This macro is what my life dependent on now. But can you tell me how were you able to create the "Update input data tool"? Thank you!!!

回复

要查看或添加评论,请登录

Bingqian Gao的更多文章

社区洞察

其他会员也浏览了