Combine Files With Different Column Names in Power Query

I'm going to look at situations that appear to be fairly common: loading data into Power Query where the source data has columns with different names.

For example, in the following file received in August 2021, I have data that relates to various items over a 12 month period.

No alt text provided for this image

Each month I get a new file showing values for that month, the next 6 months and the preceding 5.

So for Sep 21 the file looks like this

No alt text provided for this image

It contains data for the months Apr 21 to Mar 22. It still contains 12 months of data, but it's moved along by 1 month compared to the previous month's file.

Clearly each file has the same number of columns and whilst some of those columns have the same names, the order of those columns changes every month.

If I load these files into PQ and combine them let's see what I get.

In Power BI Desktop, I load data from a folder: Get data -> More -> Folder

No alt text provided for this image

After browsing to the folder containing the CSV files, and selecting it, I choose to?Combine & Transform?the files.

No alt text provided for this image

In the data preview I accept the defaults and click OK.

No alt text provided for this image

The result of combining the files is this

No alt text provided for this image

It's not exactly what I want. For a start the columns contain the Month-Year names and these aren't consistent down the columns because for each new file, the months shift along by one.

What I want is a column for the item and a column for the months (dates) and a column for the numeric values. How to achieve this?

The first thing I'll do is duplicate this query so I can play around with it a little and investigate how to solve this problem.

No alt text provided for this image

With this done, I'm going to back up a bit and delete the last 2 steps

No alt text provided for this image

This leaves me at the point in the query where I have the 3 CSV files as Tables in one of?my?columns.

No alt text provided for this image

Clicking on the first table (Aug 21) shows the data in that file

No alt text provided for this image

Promoting the first row to column headers is the thing to do here

No alt text provided for this image

Which does this

No alt text provided for this image

To get all the months/dates into a column and all the values into another column, I can select the Item column and then?Unpivot Other Columns

No alt text provided for this image

This gives me the data from this file in a format that I can use so I need to apply these transformations to all the files.

Back in?my?main query where I left it with just the files as tables in a column

No alt text provided for this image

Add a Custom Column

No alt text provided for this image

Using the?Table.PromoteHeaders?function I can manipulate each of the tables inside the?Transform File?column using this code

No alt text provided for this image

Clicking in the cell beside the August-21 table shows me that the first row has been promoted to headers, and it is the same for all 3 tables.

No alt text provided for this image

Next I need to unpivot the dates/values. Add another Custom Column and enter this code

No alt text provided for this image

Note that I am referring to the?Custom?column as the location for the tables, and that I am naming the first column created by the unpivot?Date

I get another column containing tables with the dates and values unpivoted into their own columns

No alt text provided for this image

Almost done. Next delete the first 3 columns leaving me with just the?Custom.1?column.

I can expand the tables in this column now. Uncheck?Use original column name as prefix

No alt text provided for this image

Giving me this




No alt text provided for this image

There's one more thing to do. Remember that each month I get data for another month, so I'm going to end up with a lot of duplicate values for months that are contained in several files.

To see this, first change the Date column to the Date type

No alt text provided for this image

Next sort the?Item?column in ascending order, then sort the?Date?column in ascending order.

It's plain to see now that there are duplicate values

No alt text provided for this image

To fix this, click on the?Item?column then click on the?Date?column, then right click on either column and choose?Remove Duplicates.

No alt text provided for this image







No alt text provided for this image

Different Column Names - Same Data

?this scenario I'm receiving data about clients and the values represent the amount I'm charging for some work done, on the dates shown.

In the first CSV file, for January, you can see there are 3 columns, the first one named Client.

No alt text provided for this image

In the February file there are also 3 columns but the first is named Customer.

No alt text provided for this image

If I load these files with Power Query (using the same process as in the first example) I get this.

No alt text provided for this image

I get nulls in the first column because in the Feb CSV file that column is called?Customer, and in Jan it's called?Client. The Feb file has been loaded first so the first column takes its name from that file. As the Jan file does not have a?Customer?column, Power Query enters nulls here, and doesn't know what to do with the data in Jan's?Client?column so omits that entirely.

I have 4 solutions to this problem for you.

Solution 1

Let's back up a bit and get rid of these steps.

No alt text provided for this image

I want to be left with the source CSV files as tables in a column.

No alt text provided for this image

I want to demote the column headers to become the first row in the tables. To achieve this, add a Custom Column with this code using the?Table.DemoteHeaders?function.

The table(s) being acted upon are in the?Transform File (2)?column.

No alt text provided for this image
No alt text provided for this image

With the headers demoted in these tables I can skip the first row in each using the?Table.Skip?function. Add a Custom Column with this code

No alt text provided for this image
No alt text provided for this image

With the first rows removed each table is left with just the data. I can now delete the first 3 columns from the query and expand the?Custom.1?column

No alt text provided for this image
No alt text provided for this image

All that is left is to rename the columns, set column data types and (if you wish) sort by date.

No alt text provided for this image

Solution 2

Duplicate the query from Solution 1

No alt text provided for this image

Delete all the steps after?Added Custom?which demotes the table headers.

No alt text provided for this image
No alt text provided for this image

Delete the first 2 columns from the query and expand the tables in the?Custom?column.

No alt text provided for this image

Filter column 3 to remove the text "Date"

No alt text provided for this image

Now you can rename the columns, set data types and sort as desired.

Solution 3

Duplicate the query from Solution 1, delete all the steps after?Added Custom?and delete the first 2 columns in the query to leave a single column?Custom.

No alt text provided for this image
No alt text provided for this image

Right click on the?Column 3?header and remove the errors

No alt text provided for this image
No alt text provided for this image

Promote headers : Transform tab -> Use First Row as Headers

No alt text provided for this image

I can now filter out the header row from the table by filtering any of the columns to remove the appropriate text value. In this case I'll filter the?Date?column to remove the text "Date"

No alt text provided for this image
No alt text provided for this image

Hopefully this will help you deal with situations where you also face importing columns with names that don't match up.

Let me know if you have any other scenarios with different column names causing you issues.

要查看或添加评论,请登录

Emmanuel Iruke, PHD的更多文章

社区洞察

其他会员也浏览了