- Have you ever thought of designing a Microsoft Excel workbook template and using that particular template for a different dataset?
- This could be a very tedious and time-consuming task if done manually. In today’s blog, I’ll be writing on how to automate this process with power query using a fictitious example.
- Let’s assume we have a dataset that contains the attributes of wine sold per day from a wine shop and we need to get the combined dataset for the week with only some specific columns (i.e. 7 datasets in total).
- The first thing to do is to put the dataset of the seven days in a folder and name it. After doing this, open a blank Excel workbook — Go to the DATA tab — Click on the Get Data drop-down — Select From File — Select From Folder and pick the folder from where it is located on your computer. You can also select other file types apart from selecting a folder.
N.B: When you select the folder from your computer, you won’t see the files in the folder until it is loaded to your Excel workbook.?
- After doing this, you’ll see a preview showing the list of all the Excel files in that folder. Click on the Combine drop-down — Select Combine & Transform Data. Once the query is done evaluating, click OK. The power query editor will be opened to transform your data based on your preference.
N.B: Before you start transforming your data, make sure you go to the APPLIED STEPS at the right hand side of the power query editor screen and cancel the Changed Type step so as to change the data type of all columns to general format. Also, manually change the data type of number related columns that will be needed to Decimal Number data type. If this is not changed, you won’t be able to perform mathematical operations when grouping. You can change this by clicking on the data type next to the column name and changing it.
- For example, lets assume that after combining the excel workbooks, you don’t need all the columns. You can transform only the columns you need. After transforming, go to the Transform tab under the power query editor and click on the Group By tab. This will enable you group the columns in the order you want them to appear.
N.B: Under the Group By tab, there are two types of grouping (Basic and Advanced). The Basic grouping is used if you desire to group only one column while the Advanced grouping is used to group multiple columns.
- In most cases, the Advanced grouping is used. Once you click on the Advanced grouping, two sections will be opened (One for grouping text related columns and the other for grouping number related columns as shown below).
- In the first section of the picture above, click on the drop-down to pick the first column based on your preference. After this, click the Add grouping box to keep adding other columns in the order you want them to appear.?
- The second section is where to aggregate or perform arithmetic operations. Under the New column name, name the new column you want to derive through arithmetic operation. Under the Operation, click the drop-down box to select the operation you want to perform. Under the Column, click the drop-down to specify the column you want the mathematical operation to be performed on. After this, you can click the Add aggregation box to add new columns and operations. Click on OK after all boxes has been filled.
- After all the above process has been done, go to the Home tab under the power query editor and click on the Close & Load tab. Wait for the entire process to load into your workbook. You will have your transformed merged dataset.
- The whole essence of this automation is to be able to reuse the template as written in the introductory paragraph. From the fictitious example I gave above. Let’s assume you need to get the attributes of wine sold for another week range. All you need to do is to remove the existing datasets from the folder you loaded into power query and replace them with the new ones?.
- Once this is done, open the previous transformed generated workbook — Go to the Data tab— Click the Refresh All drop-down — Click on Refresh. The worksheet will be automatically refreshed with the new datasets in that folder.
P.S: This automation can also be done with the power query in Microsoft Power BI.
Deloitte || Click-On Kaduna Data Science Fellow || Python|| PowerBI || SQL
2 年Amazing. This will surely ease alot of stress