Making Tradeoffs: The Data Edition
PC: By Radeksz - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=8957473

Everything in life involves tradeoffs, whether that's an economic one like the Williamson tradeoff model illustrated above or a personal one, like deciding to drive somewhere instead of walking. While there are entire books on the topic of tradeoffs, I want to briefly shed light on the tradeoffs that apply specifically to data modeling, which involve factors like:

  • Speed
  • Cost
  • Existing resources like personnel
  • Available tools to buy or use
  • Capabilities of both tools and people
  • Current infrastructure
  • Requests from users (careful about scope creep though!)
  • User accessibility

The items above cover many aspects of data modeling, but the list is by no means all-inclusive. For those of us who work a lot in data analytics tools like Power BI, with updates that come out every month, it often feels like we're just trying to keep up with everything ourselves. But I've also come to realize that the opportunity to learn something new every day is why I enjoy working with technology like this.

Accessible Analysis

One example of a tradeoff in Power BI is between user accessibility and the available features and efficiencies of the data model. Making Power BI useful to the ultimate consumers of models can present many challenges because those who develop these models aren't ultimately their end users.

In the latest video of the Power BI Weekly series, we explore how to minimize the number of clicks it takes to access views within a visual. In this example, we take a hierarchy of time intelligence levels for dates and flatten it so that the end user can access any given level of detail in a single click.

Navigating through the levels along the x-axis of a line chart, for example, can become confusing, even for those (like developers) who are already quite familiar with the levels. In the illustration below, we can see how this flattening process works for the hierarchy of time series labels.

[Illustration: flattening the hierarchy of time series labels]

It can seem a bit daunting to understand how the data changes below the surface, and those changes can present challenges on the development side even if they aren't immediately noticeable to the end user. As with last week's Power BI Weekly topic, the concepts of pivoting and unpivoting a data table are key to making this functionality work!
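As a rough illustration of the unpivoting idea, here is a minimal Python sketch that turns a date hierarchy into one flat list of rows, so every level lives in the same pair of columns. The level names (Year, Quarter, Month), the label formats, and the function names are assumptions made for this example; they are not taken from the video, and the video itself builds the result inside Power BI rather than in a standalone script.

```python
from datetime import date

# Hypothetical hierarchy levels; a real model may use different ones.
LEVELS = ("Year", "Quarter", "Month")


def hierarchy_labels(d: date) -> dict:
    """Return one label per hierarchy level for a single date (wide form)."""
    quarter = (d.month - 1) // 3 + 1
    return {
        "Year": str(d.year),
        "Quarter": f"{d.year} Q{quarter}",
        "Month": d.strftime("%Y-%m"),
    }


def flatten_hierarchy(dates):
    """Unpivot the wide labels into long rows: one row per (date, level)."""
    rows = []
    for d in dates:
        labels = hierarchy_labels(d)
        for level in LEVELS:
            rows.append({"Date": d, "Level": level, "Label": labels[level]})
    return rows


dates = [date(2022, 1, 15), date(2022, 5, 3)]
flat = flatten_hierarchy(dates)
for row in flat:
    print(row["Date"], row["Level"], row["Label"])
```

Because every level now sits in a single Level column, a filter on that column can reach any level of detail directly instead of drilling through the hierarchy. In pandas, the same wide-to-long reshape is what `DataFrame.melt` performs.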

Using Programming Languages in Power BI

Another topic that I explore and discuss in a number of Power BI courses in the LinkedIn Learning library (including the Power BI Weekly series) is integrating programming languages like R and Python directly into Power BI. These integrations can happen in the original data source, as part of the ETL framework, as an additional step to transform an existing query, or in scripts that build custom visuals.

The integration of these programming languages directly in Power BI comes with tradeoffs of its own as well. While they increase the data modeling capabilities and flexibility of Power BI, in particular for statistics and subsequently for algorithms like machine learning, they also tend to run slower than built-in or native functions within Power BI. Whether or not we choose to use the modeling capabilities of Python or R depends on the value they add to the existing data model without unnecessarily slowing down overall model performance.

There are also limitations to the libraries and capabilities that Power BI supports, in particular for the shared and scaled versions of these Power BI models in the cloud when we upload them. As of today, the Power BI service supports many more R libraries than it does Python libraries, which may guide your decision on which programming language to use.

Python Options

Right now, Python is the most popular programming language in the world according to the TIOBE index. This means there's always something new to learn within the language, whether that's exploring algorithms for machine learning models or building ETL frameworks that increase our opportunities for connecting to different types of data sources. You can check out Python examples in the Power BI Weekly series.

R Options

Similarly, we can do much of the same using R, although as I mention above, the Power BI service supports more R libraries than Python libraries. This means that we theoretically have more functions and capabilities within these libraries to connect Power BI to.

If you're looking for modeling approaches in the Power Query Editor in Power BI Desktop, as well as modeling approaches using Power BI visuals, check out two entire courses on machine learning for logistic regression modeling and data reduction techniques (hierarchical clustering, KMeans, PCA) that I created for the LinkedIn Learning library in the past year. These courses don't exclusively focus on R in Power BI, but they begin by building the models using a more manual approach starting in Excel, then moving the process into lines of R code, and then finally running the R scripts directly in Power BI Desktop. Check out what the outcomes of these models look like as visuals in the slideshow below!

Aim For Simplicity

There's often a tendency within the data modeling profession to try to use a complicated model when a much simpler model will do just as well. It's important to note that just because we can run a complicated algorithm on a data model, that doesn't mean we should run one! Often machine learning models are much more expensive to run than simpler models (in terms of cost for the physical computing power as well as the human effort that goes on behind the model), yet there's no material difference in the actual outcome or result.

Let's take text analytics, for example, on the song "Istanbul (Not Constantinople)" by the cult favorite band They Might Be Giants. The re library in Python lets us run regex analysis on character patterns in text blocks like these song lyrics. Used in conjunction with other Python libraries like pandas and numpy, it lets us count the occurrences of each word and summarize them in a DataFrame. We don't need to use a more elaborate (or more expensive, for that matter) natural language processing (NLP) algorithm in this example when a regex approach works just as well.

Some examples of where NLP models can add a lot of value include analyzing positive or negative sentiment for sentences of text like travel reviews. Other helpful text analytics algorithms built directly into Power BI through Azure Cognitive Services functions (note that a Power BI Premium account is required) include language detection and key phrase extraction.
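To make the regex word count concrete, here is a minimal sketch of the idea. It is not the script from the series: the text is a short placeholder rather than the actual song lyrics, the function name `word_counts` is made up for this example, and `collections.Counter` stands in for the pandas and numpy summary so that the snippet runs with the standard library alone.

```python
import re
from collections import Counter

# Placeholder text, not the actual song lyrics.
text = """Old city, new name. New name, same city.
Why did the name change? Nobody's business but the city's."""


def word_counts(block: str) -> Counter:
    """Count how often each word appears in a block of text."""
    # Lowercase first so "New" and "new" count as the same word, then
    # pull out runs of letters and apostrophes as individual words.
    words = re.findall(r"[a-z']+", block.lower())
    return Counter(words)


counts = word_counts(text)
for word, n in counts.most_common(3):
    print(word, n)
```

A few lines like these answer the "which words appear most often" question without training or calling an NLP model, which is the kind of tradeoff worth checking before reaching for something heavier.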

What's Coming Up

The next video in the Power BI Weekly series will explore how to add dynamic titles to visuals that change depending on the selected data points or categories within a page or a report view. It's a great way to better communicate the displayed elements and data points of the model for the end user through clear text labels.

Looking a bit farther into the future, I wrapped up recording the weekly series videos for Q3 2022 a few weeks ago. Power BI continues to excite me with the updates that come out every month and I'm learning something new within the tool all the time myself! Some of the Power BI course topics to come later in the year include the amazing built-in fuzzy matching algorithm, the CONCATENATEX DAX function, and editable shapefiles in mapping visuals. Stay tuned and I'm looking forward to sharing these updates with you in the coming weeks and months!

-HW

Pooja Pise

Senior Business Analyst @ InfoCepts | Business Analytics

2 years ago

What a great article! Thank you for the detailed insights on trade off

Amazing read as always. I’ve found the trade off to be an important and real world topic when designing our dashboard. There are technical knowledge and speed costs associated with too much being added in power bi. Great article!
