A DAX tale told in four versions
Photo Credit: Mike Hindle on Unsplash

A DAX tale told in four versions

Welcome back flatlanders. First, a personal note: I have been travelling a lot in the last couple of weeks, and therefore have been remiss in my blogging. I promise to try to do better. But it was worth it. I went to "the" quilting show in Paducah Kentucky. If you are a quilt fan, follow this link to see the Best in Show quilt (you can skip the video at the top and just scroll down to see the quilt itself). I promise that you will be amazed!

https://www.wpsdlocal6.com/news/sweet-madame-blue-wins-best-of-show-at-2023-aqs-quiltweek/article_70d1d876-e3dc-11ed-a092-d7739b446337.html

Now back to data. In this post, I want to tackle one of the first things you need to know about DAX and your data model: how to add calculations (measures) that you need. Before we dive in, let's review the uses for DAX (Data Analysis Expressions). DAX is the language used once you are inside the Power BI Desktop (or DAX Studio and Tabular Editor) to build four different types of content:

  1. New tables
  2. New columns (calculated columns)
  3. Measures
  4. Row level security roles

Always try to create new columns and new tables as far upstream as possible and as far downstream as necessary (shout out to the great Matthew Roche for coining this aphorism). What does it mean exactly? It means do everything you can to avoid creating new columns or tables in the Power BI Desktop using DAX. Why? Because these creations do not compress as well as columns and tables that are imported into the Desktop from the Power Query Editor. But sometimes you don’t have access to the source (e.g. if it is in a database). Your next best option is to do the work in the Power Query Editor. Failing that option, you will have to use DAX.?

With that disclaimer firmly in mind, let's explore how to create a calculated column, a measure and an iterator measure. Measures are calculations that are not present in the data. If you think about it, one of the great strengths of Power BI is the ability to bring data together from a variety of sources. If you do that, it is highly likely that you won't have all the calculations you might want because the data has never been 'all together' before. Here's an easy example for us to start with:

No alt text provided for this image

In this case, we have a column for quantity and we have a column for the unit price, but we don't have a column with the total price paid for each row (e.g. [Quantity]* [UnitPrice]). Now we Flatlanders instinctively want to reach for a column to solve this problem. And you can….but try to create the column either in the source or in the Power Query Editor. Assuming you can't or won't do either of these things, here's how you would write the DAX to create this column:

Version A:

Total Sale='All Sales'[Quantity] * 'All Sales'[UnitPrice]

Notice a couple of things in Version A:

  1. There is no function?in this formula (and DAX is a functional language).
  2. Column names are enclosed in square brackets.
  3. The table names are enclosed in single quotations (this is only required if there is a space in the table name, but I use single quotes all the time because I can't be bothered to remember when to use them).

This is the expression that will produce a new column called [Total Sale]. Once the column is there (whether you added it further upstream or used DAX to calculate it as we are doing here) you can now use DAX to add it up. In other words, you can write a measure using this column now:

Version B:

Total Sales=SUM('All Sales'[Total Sale])

?Notice a couple of things in Version B:

  1. Now we see a function--in this case the function SUM.
  2. The column name is accompanied by its table name. When you reference a column name in DAX, you always "fully qualify" it by putting its table name in front of it.

More importantly, there is a major difference between Version A and Version B. Version A produces a column where the calculation is performed row by row. Version B adds up the entire column. Let's look at a Version C, which is wrong:

Version C:

Wrong Total Sale= SUM('All Sales'[Quantity])*SUM('All Sales'[Unit Price]])

?Notice a couple of things about Version C:

  1. Here we see the use of a function twice: once to add up the [Quantity] column and once to add up the [UnitPrice] column. This version is wrong because it is written as if we are creating a column and as if we are creating a measure (in other words it is a mashup of versions A and B). And while some mash ups are better together, some are just wrong. This is just wrong. It won't work.
  2. (I have included this incorrect construction because when I first started writing DAX I always tried to write the expression this way and could not wrap my head around why it wasn't working.)

Understanding why Version C doesn't work involves understanding how DAX 'thinks'. (I know, DAX is inanimate and doesn't think per se, but if you have ever wrestled with writing an expression, you will believe DAX is a living, breathing entity with a mind of its own!) DAX 'thinks' in columns primarily. The default behavior when you write a DAX expression is for the entire column to be computed in one 'gulp'. This is one of the reasons DAX is so fast. But what happens if you need to calculate something on a row by row basis? Look at the example above again--if we just added up the [Quantity] column and the [UnitPrice] column and then multiplied them together, we would get the wrong answer.

No alt text provided for this image
What happens when we don't multiply row by row


This is an example of a calculation that must be performed row by row. When you are confronted with this situation, you have two choices:

  1. Compute a calculated column that multiplies [Quantity] by [UnitPrice] and returns a value for each row (Version A)
  2. Write a measure that adds up the calculated column (Version B)

OR

You can write a single measure that does both at once. If you don't have a column that computes the total for each row, you don't have to create one. Instead, use DAX to write a measure that will 'iterate' (go row by row) and calculate the total for each row and then add them up.

Version D:

Total Sales Iterator=SUMX('All Sales', 'All Sales'[Quantity] * 'All Sales'[Unit Price])

Notice a couple of things in Version D:

  1. This expression uses the function SUMX. The presence of an X at the end of the function tells DAX to perform the calculation row by row. Functions with an X at the end of them are called iterators. The first argument of the expression is the table that you want to iterate on. The second argument is the calculation you want performed.
  2. You will not see the individual row results. The iterator function keeps track of them in memory, but does not materialize ("show") them to you. If you need to see the results row by row, you have to create a calculated column. When you use the measure in a visual, it will materialize the results.
  3. If you need to use one of the individual row results in a slicer or filter, you will have to create a calculated column. You can't use a measure in a slicer or a filter.

Of all four options, Version D is the most effective if you do not have a column already present. The key here is understanding that DAX's natural behavior is to gobble up columns. If you want to compute something row by row, you will have to use either a calculated column (Version A) and then a measure (Version B) or an expression using an iterator (Version D).

How can you practice this DAX? Easy. Build yourself a very small table in Excel with two columns [Unit Price] and [Quantity]. Bring that table into Power BI, and use each of the versions in turn. Remember, for Version A, you need to choose "New Column" first. For Version B, you need to choose "New Measure". For Version C, it doesn't matter if you choose new column or new measure--both will error out. For Version D, you need to choose "New Measure".

David Kuralt

Data analytics professional

1 年

In addition to reading your certification companion for the PL-300, I've been working through the tutorials on learn.microsoft.com. Both are excellent resources!

要查看或添加评论,请登录

Jessica Jolly的更多文章

  • Semantic model now visible

    Semantic model now visible

    Gooooood morning (or afternoon or evening) folks. I always say to myself that I need to keep up on all the changes…

    2 条评论
  • Expanded Tables: Aha!

    Expanded Tables: Aha!

    It has been a minute since I have posted. I could tell you that I have been too busy to post, but alas, that wouldn't…

    5 条评论
  • Fabric (Microsoft and otherwise)

    Fabric (Microsoft and otherwise)

    Hello friendly flatlanders! I realize that most of you have day jobs (and lives) so the big announcement from Microsoft…

    3 条评论
  • Relationships: It's complicated

    Relationships: It's complicated

    Hello fellow flatlanders. Today, we will talk about relationships in your data model.

  • Weeding your Data

    Weeding your Data

    I love to garden. Who knew? For years, I didn't have any dirt to 'play in'.

    4 条评论
  • The Granular Cardinal

    The Granular Cardinal

    Does this title make no sense to you? Yay! That was my intention. I know that when, as a flatlander, you see the words…

    3 条评论
  • Data Types--why the hype?

    Data Types--why the hype?

    Calling all flatlanders…(if you missed my definition of flatlander, check out last week's newsletter). Last week, I…

    3 条评论
  • Muddling through Modeling

    Muddling through Modeling

    Are you a flatlander? I am. Not just because I live in Illinois, which is flat.

    6 条评论
  • #Lifelong Learner

    #Lifelong Learner

    Recently, I posted here that I had completed a class on edX on O365 Service Adoption. A kind friend responded and…

    2 条评论
  • DIY Community

    DIY Community

    When I started my business, I did what we are told to do to build networks: I connected to a lot of people on LinkedIn.…

    12 条评论

社区洞察

其他会员也浏览了