Why You Shouldn't Avoid Calculated Columns in Power BI
Introduction
Recently Ed Hansberry posted a fantastic article, Why You Should Avoid Calculated Columns in Power BI — ehansalytics . It's a great article in which Ed clearly articulates the standard Power BI best practice/mantra which can be summarized as:
Now, I won't reiterate the reasons here, it has to do with model size, refresh performance, etc. Read the article, I highly recommend it. However, as is too often the case with most best practices, a lot of nuance is generally glossed over. This is certainly true for the mantra in question, the use of DAX calculated columns. So, if you have read the best practices around this and are concerned about your use of DAX calculated columns, let me assure you that it is not the end of the world. Here's why.
Scope
The best practices around DAX calculated columns are really geared toward data modeling professionals, not business users. And let's face it, professional data modelers probably only represent a small fraction, maybe 10%-20% of the Power BI user base. Professional data modelers are highly technical and are likely versed in a variety of programming languages such as SQL, Power Query (M), as well as DAX, R, Python, you name it.
Let's contrast this with a typical business user of Power BI. Most business users likely gravitate initially towards DAX because it has a familiar feel and many similar functions to Excel. Being somewhat less technical than professional data modelers, this may end up being the only language they ever really use.
It is also quite likely that these business users do not have access to source systems and probably wouldn't even know where to begin even if they had access. And, Power Query (M) is an entirely different sort of programming language than something like DAX or Excel functions.
None of this is to insult the average business user but they are a business user and not a professional data modeler for a reason. They know more about the business and less about the technical stuff.
The point is, requiring business users to learn something other than DAX really amps up the learning curve for Power BI and that's not really what anyone wants. We want more people to adopt and use Power BI, not less.
Scale
Worrying about how much a calculated column is compressed, its cardinality and how much storage it takes up in the model really only comes into play at scale, like in the 200 million row table from Ed's example. And even then, the calculated column still only takes up less than 1% of the total storage for the model.
领英推荐
So, again, for professional data modelers dealing with enterprise scale data, these sorts of things might matter. But for the typical business user dealing with tables with rows that number in the hundreds, thousands, tens of thousands or even a few million, the additional storage consumed and perhaps performance are barely noticeable, if at all. At the end of the day, it simply does not matter.
Maintainability
Another aspect of this discussion is overall maintainability of the solution, a subject that is too often overlooked. In most software solutions, writing a system in three different languages is generally considered a bad idea. So if you implement a software system partly in PHP, some C# and some Java, that's a lot of extra technical debt to incur versus writing the system in a single language. It takes additional expertise to work on the system and make changes, fix bugs, etc. because you have to know all three different languages. So, there's something to be said for just using a single language when implementing software.
Now, consider this with respect to the guidance that DAX calculated columns are bad. Well, if you are already going to have DAX measures and you can also write your calculated columns in DAX, that's pretty good maintainability. Single language, single place to make changes, etc. Contrast that with the case where some of your custom columns are in SQL, some are in Power Query and some are in DAX. That's pretty poor overall maintainability.
Data Refresh Performance
It is generally argued that creating columns in Power Query is better than DAX because data refreshes perform better. Well, not always. If your calculated column in Power Query breaks query folding, then it's quite likely that you just decreased performance rather than increased performance over a DAX calculated column. So, absolutes are dangerous.
Measures vs. Columns
This is another "best practice" that needs some nuance. Look, measures are great, but they have some problems. First, they require a bit more technical expertise than calculated columns because you are dealing with filter context (in other words "filtering").
Second, measures can often suffer from the dreaded "measure totals problem" whereas calculated columns never suffer such issues. So, again, measures tend to increase the technical complexity involved in a solution and this can be burdensome to business users just trying to do some basic analysis of their data.
Conclusion
Again, I want to be clear, Ed's article is a great article and there is nothing "wrong" with the advice and guidance provided. It is absolutely the generally accepted "best practice" when it comes to professional data modeling. The point here is that as with all such "best practices", the application of them is seldom, if ever, universal. So, if you are worried that your DAX calculated columns are going to bring about the apocalypse, don't. If you are part of the 80%-90% of Power BI's target market, business users, you almost certainly don't have to worry about the issue in the slightest.
Director of Business Intelligence, Stonebridge Companies
7 个月Good article Greg Deckler! Like exploring the nuances of this topic.
Sales Manager @ Infopro Digital Automotive | Inside Sales, Data Analytics
1 年?? ?? ?? A great example of "know the rules so that you know when - and how - to break them." Principles are flexible when perspective is applied. Thanks for sharing this. Helpful.
POWER BI or DIE Podcast & Streams ???| Microsoft Data Plattform @ BI or DIE ??| Self-Service, Agile BI and Analytics ??| Power BI Fanboy ??| Sketchnotes ??| YouTube ??| TDWI Expert & Author ??
1 年good point! this is the same as with other features like Auto-Time-Intelligence and Auto-Relationships: there is a reason why there are default settings in Power BI that are not compliant to best-practises of professionals. The main target market of Microsoft are citizen developers that are completely fine with al this. The reason why Excel is so dominant is its flexibility and it is good that Power BI has some of it too
Focused on Automation and Analytics | Microsoft Power Platform fanatic | Enterprise DNA Expert
1 年Greg Deckler (Microsoft MVP) - great article including the various perspectives. Ultimately, my advice to people is - do what works for your situation.
Power BI Specialist
1 年Thanks for the validation Greg, I'll continue making my helper columns whenever I see fit ??