A DAX tale told in four versions
Welcome back flatlanders. First, a personal note: I have been travelling a lot in the last couple of weeks, and therefore have been remiss in my blogging. I promise to try to do better. But it was worth it. I went to "the" quilting show in Paducah Kentucky. If you are a quilt fan, follow this link to see the Best in Show quilt (you can skip the video at the top and just scroll down to see the quilt itself). I promise that you will be amazed!
Now back to data. In this post, I want to tackle one of the first things you need to know about DAX and your data model: how to add calculations (measures) that you need. Before we dive in, let's review the uses for DAX (Data Analysis Expressions). DAX is the language used once you are inside the Power BI Desktop (or DAX Studio and Tabular Editor) to build four different types of content:
Always try to create new columns and new tables as far upstream as possible and as far downstream as necessary (shout out to the great Matthew Roche for coining this aphorism). What does it mean exactly? It means do everything you can to avoid creating new columns or tables in the Power BI Desktop using DAX. Why? Because these creations do not compress as well as columns and tables that are imported into the Desktop from the Power Query Editor. But sometimes you don’t have access to the source (e.g. if it is in a database). Your next best option is to do the work in the Power Query Editor. Failing that option, you will have to use DAX.?
With that disclaimer firmly in mind, let's explore how to create a calculated column, a measure and an iterator measure. Measures are calculations that are not present in the data. If you think about it, one of the great strengths of Power BI is the ability to bring data together from a variety of sources. If you do that, it is highly likely that you won't have all the calculations you might want because the data has never been 'all together' before. Here's an easy example for us to start with:
In this case, we have a column for quantity and we have a column for the unit price, but we don't have a column with the total price paid for each row (e.g. [Quantity]* [UnitPrice]). Now we Flatlanders instinctively want to reach for a column to solve this problem. And you can….but try to create the column either in the source or in the Power Query Editor. Assuming you can't or won't do either of these things, here's how you would write the DAX to create this column:
Version A:
Total Sale='All Sales'[Quantity] * 'All Sales'[UnitPrice]
Notice a couple of things in Version A:
This is the expression that will produce a new column called [Total Sale]. Once the column is there (whether you added it further upstream or used DAX to calculate it as we are doing here) you can now use DAX to add it up. In other words, you can write a measure using this column now:
Version B:
Total Sales=SUM('All Sales'[Total Sale])
?Notice a couple of things in Version B:
More importantly, there is a major difference between Version A and Version B. Version A produces a column where the calculation is performed row by row. Version B adds up the entire column. Let's look at a Version C, which is wrong:
领英推荐
Version C:
Wrong Total Sale= SUM('All Sales'[Quantity])*SUM('All Sales'[Unit Price]])
?Notice a couple of things about Version C:
Understanding why Version C doesn't work involves understanding how DAX 'thinks'. (I know, DAX is inanimate and doesn't think per se, but if you have ever wrestled with writing an expression, you will believe DAX is a living, breathing entity with a mind of its own!) DAX 'thinks' in columns primarily. The default behavior when you write a DAX expression is for the entire column to be computed in one 'gulp'. This is one of the reasons DAX is so fast. But what happens if you need to calculate something on a row by row basis? Look at the example above again--if we just added up the [Quantity] column and the [UnitPrice] column and then multiplied them together, we would get the wrong answer.
This is an example of a calculation that must be performed row by row. When you are confronted with this situation, you have two choices:
OR
You can write a single measure that does both at once. If you don't have a column that computes the total for each row, you don't have to create one. Instead, use DAX to write a measure that will 'iterate' (go row by row) and calculate the total for each row and then add them up.
Version D:
Total Sales Iterator=SUMX('All Sales', 'All Sales'[Quantity] * 'All Sales'[Unit Price])
Notice a couple of things in Version D:
Of all four options, Version D is the most effective if you do not have a column already present. The key here is understanding that DAX's natural behavior is to gobble up columns. If you want to compute something row by row, you will have to use either a calculated column (Version A) and then a measure (Version B) or an expression using an iterator (Version D).
How can you practice this DAX? Easy. Build yourself a very small table in Excel with two columns [Unit Price] and [Quantity]. Bring that table into Power BI, and use each of the versions in turn. Remember, for Version A, you need to choose "New Column" first. For Version B, you need to choose "New Measure". For Version C, it doesn't matter if you choose new column or new measure--both will error out. For Version D, you need to choose "New Measure".
Data analytics professional
1 年In addition to reading your certification companion for the PL-300, I've been working through the tutorials on learn.microsoft.com. Both are excellent resources!