Why You Shouldn't Avoid Calculated Columns in Power BI
riotfest.org

Why You Shouldn't Avoid Calculated Columns in Power BI

Introduction

Recently Ed Hansberry posted a fantastic article, Why You Should Avoid Calculated Columns in Power BI — ehansalytics . It's a great article in which Ed clearly articulates the standard Power BI best practice/mantra which can be summarized as:

  • If you have to use calculated columns, create those in Power Query or the source instead of DAX
  • Even better, use DAX measures instead of calculated columns

Now, I won't reiterate the reasons here, it has to do with model size, refresh performance, etc. Read the article, I highly recommend it. However, as is too often the case with most best practices, a lot of nuance is generally glossed over. This is certainly true for the mantra in question, the use of DAX calculated columns. So, if you have read the best practices around this and are concerned about your use of DAX calculated columns, let me assure you that it is not the end of the world. Here's why.

Scope

The best practices around DAX calculated columns are really geared toward data modeling professionals, not business users. And let's face it, professional data modelers probably only represent a small fraction, maybe 10%-20% of the Power BI user base. Professional data modelers are highly technical and are likely versed in a variety of programming languages such as SQL, Power Query (M), as well as DAX, R, Python, you name it.

Let's contrast this with a typical business user of Power BI. Most business users likely gravitate initially towards DAX because it has a familiar feel and many similar functions to Excel. Being somewhat less technical than professional data modelers, this may end up being the only language they ever really use.

It is also quite likely that these business users do not have access to source systems and probably wouldn't even know where to begin even if they had access. And, Power Query (M) is an entirely different sort of programming language than something like DAX or Excel functions.

None of this is to insult the average business user but they are a business user and not a professional data modeler for a reason. They know more about the business and less about the technical stuff.

The point is, requiring business users to learn something other than DAX really amps up the learning curve for Power BI and that's not really what anyone wants. We want more people to adopt and use Power BI, not less.

Scale

Worrying about how much a calculated column is compressed, its cardinality and how much storage it takes up in the model really only comes into play at scale, like in the 200 million row table from Ed's example. And even then, the calculated column still only takes up less than 1% of the total storage for the model.

So, again, for professional data modelers dealing with enterprise scale data, these sorts of things might matter. But for the typical business user dealing with tables with rows that number in the hundreds, thousands, tens of thousands or even a few million, the additional storage consumed and perhaps performance are barely noticeable, if at all. At the end of the day, it simply does not matter.

Maintainability

Another aspect of this discussion is overall maintainability of the solution, a subject that is too often overlooked. In most software solutions, writing a system in three different languages is generally considered a bad idea. So if you implement a software system partly in PHP, some C# and some Java, that's a lot of extra technical debt to incur versus writing the system in a single language. It takes additional expertise to work on the system and make changes, fix bugs, etc. because you have to know all three different languages. So, there's something to be said for just using a single language when implementing software.

Now, consider this with respect to the guidance that DAX calculated columns are bad. Well, if you are already going to have DAX measures and you can also write your calculated columns in DAX, that's pretty good maintainability. Single language, single place to make changes, etc. Contrast that with the case where some of your custom columns are in SQL, some are in Power Query and some are in DAX. That's pretty poor overall maintainability.

Data Refresh Performance

It is generally argued that creating columns in Power Query is better than DAX because data refreshes perform better. Well, not always. If your calculated column in Power Query breaks query folding, then it's quite likely that you just decreased performance rather than increased performance over a DAX calculated column. So, absolutes are dangerous.

Measures vs. Columns

This is another "best practice" that needs some nuance. Look, measures are great, but they have some problems. First, they require a bit more technical expertise than calculated columns because you are dealing with filter context (in other words "filtering").

Second, measures can often suffer from the dreaded "measure totals problem" whereas calculated columns never suffer such issues. So, again, measures tend to increase the technical complexity involved in a solution and this can be burdensome to business users just trying to do some basic analysis of their data.

Conclusion

Again, I want to be clear, Ed's article is a great article and there is nothing "wrong" with the advice and guidance provided. It is absolutely the generally accepted "best practice" when it comes to professional data modeling. The point here is that as with all such "best practices", the application of them is seldom, if ever, universal. So, if you are worried that your DAX calculated columns are going to bring about the apocalypse, don't. If you are part of the 80%-90% of Power BI's target market, business users, you almost certainly don't have to worry about the issue in the slightest.

George Vandyke

Director of Business Intelligence, Stonebridge Companies

7 个月

Good article Greg Deckler! Like exploring the nuances of this topic.

Omena A.

Sales Manager @ Infopro Digital Automotive | Inside Sales, Data Analytics

1 年

?? ?? ?? A great example of "know the rules so that you know when - and how - to break them." Principles are flexible when perspective is applied. Thanks for sharing this. Helpful.

Artur K?nig

POWER BI or DIE Podcast & Streams ???| Microsoft Data Plattform @ BI or DIE ??| Self-Service, Agile BI and Analytics ??| Power BI Fanboy ??| Sketchnotes ??| YouTube ??| TDWI Expert & Author ??

1 年

good point! this is the same as with other features like Auto-Time-Intelligence and Auto-Relationships: there is a reason why there are default settings in Power BI that are not compliant to best-practises of professionals. The main target market of Microsoft are citizen developers that are completely fine with al this. The reason why Excel is so dominant is its flexibility and it is good that Power BI has some of it too

Tim Weinzapfel

Focused on Automation and Analytics | Microsoft Power Platform fanatic | Enterprise DNA Expert

1 年

Greg Deckler (Microsoft MVP) - great article including the various perspectives. Ultimately, my advice to people is - do what works for your situation.

Timothy Osborn

Power BI Specialist

1 年

Thanks for the validation Greg, I'll continue making my helper columns whenever I see fit ??

要查看或添加评论,请登录

社区洞察

其他会员也浏览了