Power BI Intermediate Level: 52 - Importing and Appending a Folder of Files
Richard Steinbiss
Head of Business Intelligence and Data Analytics (Power BI, AWS, Azure, Spark, Python, Kubernetes)
Short summary: Power BI allows you to import and append a whole folder of files, either stored locally or online with OneDrive, Teams, or SharePoint. You should create a query for a single file, convert that query to a function, and then use the Folder Connector and apply the function to each file in the folder, then append the tables. This works best if the files have the same structure of column names and data types.
Let's shift our focus back to Power Query and learn how to import a folder of files all at once and append the resulting data. This only works if the files have the same structure of column names, column data types, sheet names, etc.. If the structure is just slightly different, you can try to account for the differences with the function which we are going to use. For example, if the table starts at different row numbers depending on the file but is otherwise identical, you can try to dynamically remove the excess top rows, I will cover that in the advanced section of this article series.. If the files have totally different structures, this technique is not applicable and you should instead import them one by one.
For this article, let’s keep it basic and assume that all files have the same structure. The process which we will use goes as follows:
Let's walk through this step-by-step.
1. Create a Query for One file
Our sample data consists of three csv files. There is also a Readme.txt file which we want to ignore. Create a query to one of the csv files, which should be representative of all files which you want to process. In this case, we only do the automatic steps of loading the table, promoting headers, changing the column data types.
2. Change the Query into a Function
We haven't done coding in Power Query before, but we will do a very simple adjustment here. Don't worry, you don't need to understand everything here, I will explain more about M code in the advanced section of this article series.
For now, open the advanced editor.
The M code looks like this:
Now we want to turn the query into a function and replace File.Contents("") with a function parameter. Let's make the following changes:
The M code now looks like this:
When closing the advanced editor, you see that the query has turned into a function. We don't see individual steps anymore, it's just the function code.
So whenever you need to change or debug the function, temporarily change it back into a query by doing the following:
It's a bit of a hassle but in the long-run it is the cleanest and easiest way.
3. Use the File Connector
Let's now do the import of a folder of files. We will first cover the case of local files. Note that online refresh won't work here unless you set up a Data Gateway on your local computer for Power BI online to access the files.
In the Power BI Home ribbon, go to Get data, and click More. Then select Folder and click Connect.
Select the folder on your Computer and proceed. When you get to the screen below, click Transform Data so we can immediately add more steps.
What you see here are the files of the folder and all subfolders represented in a table. We first want to filter out the files which we don’t want to import. For example, we don’t want to import the Readme.txt file. In general, here is my recommended way to make sure that only the desired files get imported:
领英推荐
4. Apply our Function to each File
That leaves us with a table representation of the files. The column Content contains the file content in binary form, meaning zeros and ones. You could click the double down arrow inside the Content column header for Power BI to attempt to create a function by itself to extract the content. However, in my experience, that usually ends in a big mess and is hard to maintain. Instead we will apply our function from step 3. Go to the Add Column ribbon and select Invoke Custom Function.
Choose our function and select the column Content as for our parameter. This means that for every row, our function will be applied to the content of the corresponding file.
5. Append the Data
Under the Processed file column, you can click inside the whitespace in one of the cells to see a preview of the table data. Everything looks well. Now click the arrows icon to the right of the column header to expand and append the data.
Now we have the data from all three files appended. And this could scale to thousands of files, if required. Also note that we can keep the metadata of the files. In this case, I kept the file name of the source data file as a column. This can be quite useful for knowing where the data came from.
Optional: Using the SharePoint Folder Connector in Step 3
Do you want to import a folder from OneDrive/Teams/SharePoint so you can refresh your Power BI report online? We established in an earlier article that both OneDrive and Teams are built on SharePoint. You can set file synching with OneDrive or Teams and your local computer and have Power BI import the files which get stored online. That way online refresh will work just fine.
Important: You can only connect to the top-level of the SharePoint documents, not directly to a specific file.
First, find out the top-level of your SharePoint link. One way to do that is to open one of the CSV files in Excel, going to File, Info, and clicking Copy path (not Copy local path). Copy the part before “/Documents”.
For example, if Excel’s Copy path gives you: https://XYZ.sharepoint.com/personal/ABC/Documents/Store%202.xlsx?web=1
You should copy everything before /Documents: https://XYZ.sharepoint.com/personal/ABC/
In Power BI, go to Get Data, File, SharePoint folder, insert the URL, and click Connect.
Log in using Organizational account if available, else using Microsoft account. Once connected, click Transform Data.
Then you will see all files on the SharePoint.
Click Transform Data. You will need to filter the table of files to just your files of interest. From there you can proceed as shown earlier in the article. So this was a longer one but it’s really worth it if you need it.
Conclusion
Power BI allows you to import and append data from a folder of files, either locally or using OneDrive, Teams, or SharePoint. This can scale to thousands of files. An important requirement is that the files have the same structures, including column names and data types. In my opinion, the cleanest and easiest way to process the files is to create a query for one representative file, apply your changes, and then change the query into a function. You can then use the function and apply it to each file in the folder. If you want to change or debug your function, you can temporarily change it back into a regular query.
Next up, I will show you a similar technique for importing multiple sheets from an Excel file at once.
Please like, share, and subscribe and feel free to ask questions in the comments below.