Multi-valued attributes in a Tableau BI solution
Hitting the data layer sweet spot: Know when to push to the data layer and when to leverage Tableau's presentation layer

Where and how to process multi-valued attributes - finding a sweet spot

Background:

In the realm of Business Intelligence (BI) tools, data normalization is a fundamental prerequisite. From time-tested enterprise reporting solutions like Cognos and Business Objects to modern data visualization platforms like Tableau and Power BI, the expectation is clear: input data must adhere to first normal form (1NF). This fundamental principle dictates that each attribute within a table should possess a single, atomic value, ensuring structural soundness and analytical clarity.

However, the reality of data often deviates from this ideal. Multi-valued attributes, those containing multiple values within a single field (e.g., "apple, banana, orange"), frequently violate 1NF. While seemingly convenient, these attributes introduce a host of challenges for BI tools:

Analysis Complexity: Unraveling meaningful insights from multi-valued fields demands intricate data wrangling, increasing the cognitive load on analysts.

User Interface Overload: The visual representation of multi-valued fields often results in cluttered and confusing interfaces, hindering user comprehension.

Reporting Restrictions: The inherent structure of multi-valued attributes limits the flexibility and adaptability of reporting capabilities.

ETL Headaches: Extracting, transforming, and loading multi-valued data demands additional processing steps, escalating the complexity of ETL workflows.

Integration Hurdles: Seamlessly merging multi-valued data with other datasets often necessitates intricate workarounds and compromises.

Modeling Mishaps: Constructing robust and accurate data models becomes significantly more challenging in the presence of multi-valued attributes.

Performance Penalties: BI platforms and underlying reporting and database engines may experience performance bottlenecks when grappling with multi-valued fields, hindering analysis speed and responsiveness.

While data modelers rightfully champion 1NF whenever possible, organizational realities occasionally necessitate compromise. Constraints beyond control can dictate the persistence of multi-valued fields, leaving BI teams to navigate their complexities.

This article ventures into this very predicament—confronting the challenges of comma-separated multi-valued fields in BI environments. We'll explore techniques to manage these unruly data structures, leveraging business rules and counting strategies to extract valuable insights despite their inherent complexities.

Sample Data:

Here’s what our sample data looks like.

Sample Data
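Since the sample-data image is not reproduced here, the sketch below constructs a small hypothetical dataset in the same shape: a record identifier plus a comma-separated tags field. All field names and token values are illustrative stand-ins, chosen to be consistent with the business rules that follow.

```python
# Hypothetical sample rows; field names and token values are illustrative
# stand-ins for the article's sample-data image.
sample_rows = [
    {"record_id": 1, "data_tags": "customer_profile, security_audit"},
    {"record_id": 2, "data_tags": "observability_metrics"},
    {"record_id": 3, "data_tags": "user_account_settings, customer_orders"},
    {"record_id": 4, "data_tags": "security_logs"},
]

def tokens(row):
    """Split the comma-separated field into a clean list of tokens."""
    return [t.strip() for t in row["data_tags"].split(",")]

for row in sample_rows:
    print(row["record_id"], tokens(row))
```

Note that record 1 deliberately carries tokens from two different buckets; this is the overlap case that drives the rest of the discussion.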

Navigating the Business rules:

  • Customer Data: A row is categorized as "Customer Data" if any of its tokens contain the string "customer."
  • Usage Data: A row is categorized as "Usage Data" if any of its tokens contain the string "observability."
  • Security Data: A row is categorized as "Security Data" if any of its tokens contain the string "security."
  • User Account Data: A row is categorized as "User Account Data" if any of its tokens contain the string "user_account."
  • Overlapping Buckets: Crucially, if a row satisfies multiple categorization rules, it must be counted towards all applicable categories. We have such data values in our sample.
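The rules above can be sketched in a few lines of Python. The category names and marker strings come directly from the business rules; the field shape and helper function are assumptions for illustration.

```python
# Category -> marker string, as stated in the business rules.
RULES = {
    "Customer Data": "customer",
    "Usage Data": "observability",
    "Security Data": "security",
    "User Account Data": "user_account",
}

def categorize(data_tags: str) -> set:
    """Return every category whose marker appears in any token.
    A row can (and sometimes does) land in multiple buckets."""
    toks = [t.strip().lower() for t in data_tags.split(",")]
    return {cat for cat, marker in RULES.items()
            if any(marker in tok for tok in toks)}

print(sorted(categorize("customer_profile, security_audit")))
```

Returning a set rather than a single label is the crux: it encodes the overlapping-buckets rule directly, so a row matching two rules contributes to both counts.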

Analysis of the business rules:

This final rule adds an interesting twist to the categorization process. While mutually exclusive buckets are straightforward, overlapping buckets, where a single row contributes to multiple categories, introduce a challenge: preserving the integrity of the counting rules within the reporting application becomes paramount.

In the subsequent sections, we'll explore strategies to orchestrate this delicate dance of overlapping buckets and accurate counts, ensuring the sanctity of both business rules and analysis outcomes.

Deep Dive: Exploring Solutions for Overlapping Buckets

The conundrum of multi-valued fields categorized by potentially overlapping business rules presents a formidable challenge, demanding careful consideration of both data modeling and reporting approaches. Let’s embark on a deep dive into each, unmasking their strengths and weaknesses:

Solution Approach 1: Enforcing 1NF in Data Modeling

This technique champions the principles of data normalization, transforming the multi-valued field into a one-to-many relationship with separate rows for each token. Each row then bears a single categorization tag based on the identified string. While achieving 1NF purifies the data structure, it comes at a potential cost:

Data Explosion: Depending on the number of tokens per field, this approach can lead to a significant increase in number of rows and data volume, resulting in:

  • Performance Concerns: Increased processing demands on both the database and reporting engine.
  • Challenge for Simple Counts: Replacing “count” with “distinct count” becomes crucial to avoid overcounting for rows categorized in multiple buckets.
  • Complex Calculations: Averages and other aggregations require intricate formulas to ensure accuracy, moving beyond the tool’s basic functionalities.
  • Development and Testing Burden: Increased complexity translates to longer development and testing cycles.
  • User Education Needs: Educating users on interpreting dashboards and reports necessitates additional documentation and training.
  • Development and Maintenance Costs: The overall project cost can escalate due to the increased complexity and associated resource needs.
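A minimal sketch of this approach, using plain Python in place of an ETL tool: each token is exploded into its own (record, category) row, and bucket counts must then use distinct record IDs to avoid overcounting. The data and field names are illustrative, not from the article's dataset.

```python
from collections import defaultdict

RULES = {
    "Customer Data": "customer",
    "Usage Data": "observability",
    "Security Data": "security",
    "User Account Data": "user_account",
}

rows = [
    (1, "customer_profile, security_audit"),
    (2, "observability_metrics"),
    (3, "user_account_settings, customer_orders"),
]

# 1NF "explosion": one output row per matching (token, category) pair.
exploded = [(rid, cat)
            for rid, tags in rows
            for tok in tags.split(",")
            for cat, marker in RULES.items()
            if marker in tok.strip().lower()]

# A plain COUNT over the exploded rows would overcount any record whose
# tokens hit the same bucket twice; COUNT DISTINCT on record_id stays correct.
distinct = defaultdict(set)
for rid, cat in exploded:
    distinct[cat].add(rid)
counts = {cat: len(rids) for cat, rids in distinct.items()}
print(counts)
```

The same pattern in SQL would be an unnest/split step followed by `COUNT(DISTINCT record_id) GROUP BY category`, which is precisely the "simple counts become distinct counts" burden noted above.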

Solution Approach 2: Tackling it Inside Reports and Dashboards

If data modeling presents insurmountable challenges, the battle can be shifted to the reporting layer. However, success hinges on the specific strengths and limitations of the chosen tool:

  • Tool Dependence: The solution’s feasibility becomes intimately tied to the tool’s capabilities. Carefully assess available features and functionalities to design a viable approach.
  • Maintenance Headwinds: Any changes in calculations necessitate rework across all reports, potentially leading to a maintenance nightmare.
  • Governance Concerns: Burying business rules within reports poses governance challenges, potentially hampering transparency and traceability.
  • Performance Bottlenecks: Evaluating calculated fields at runtime for large datasets can overload the reporting engine, causing performance issues.
  • Data Ingestion Bottleneck: This approach requires the entire dataset to be ingested into the reporting engine first, eliminating potential database-side optimizations and hindering the efficiency of certain operations.

The choice between these approaches requires a nuanced evaluation of project constraints, data volume, tool capabilities, and long-term maintainability. Striking the optimal balance between data purity, reporting flexibility, and performance efficiency will ultimately determine the victor in this battle of overlapping buckets.

How to tackle it inside Tableau:

We are assuming that our reporting platform is Tableau. In this section, let’s explore three strategic scenarios, each adept at handling specific use cases.

Scenario 1: Using a Parameter to create a filter

  • Approach: Instead of direct data field filtering, we introduce Tableau parameters populated with category values, offering users a filter-like experience.
  • Prerequisites: Limited categories to minimize manual parameter maintenance. Relatively static category definitions to reduce rework.
  • Solution Details: Here are important screenshots to build the solution in Tableau.

  • Limitations: Runtime evaluation for the entire dataset, potentially impacting performance for large volumes. Users can select only one value from the category filter. Multiple values can’t be selected, restricting user flexibility. Rework is required for all reports if categorization business rules change.
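Conceptually, the parameter drives a boolean calculated field that keeps a row only when one of its tokens contains the marker for the selected category. A minimal Python sketch of that logic follows; the marker strings come from the business rules, while the function and field names are hypothetical.

```python
RULES = {
    "Customer Data": "customer",
    "Usage Data": "observability",
    "Security Data": "security",
    "User Account Data": "user_account",
}

def row_matches(selected_category: str, data_tags: str) -> bool:
    """Mirrors the filter logic a single-select parameter would drive:
    keep the row only if any token contains the selected category's marker."""
    marker = RULES[selected_category]
    return any(marker in t.strip().lower() for t in data_tags.split(","))

print(row_matches("Security Data", "customer_profile, security_audit"))
```

In Tableau itself, the equivalent would be a boolean calculated field built from a substring check against the parameter's selected value, placed on the filter shelf and set to True.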

Scenario 2: Simulating Multi-Select with Permutations

  • Approach: This is an extension of the previous approach. Create a parameter list encompassing all category value combinations, providing users with a semblance of multi-select filtering despite Tableau’s single-select parameter constraint.
  • Prerequisites: A maximum of 4–5 categories to prevent a combinatorial explosion and an overwhelming UI. Here’s a possible refinement of this idea: instead of generating all possible combinations, present only the most relevant ones.
  • Solution Details: Here are important screenshots to build the solution in Tableau.

  • Limitations: Generally unsuitable for more than 5 categories.
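The parameter value list for this scenario can be generated mechanically. With n categories there are 2^n − 1 non-empty combinations, which is exactly why the approach stops scaling around 4–5 categories. A sketch, using the category names from the business rules (the " + " separator is an assumed labeling convention):

```python
from itertools import combinations

CATEGORIES = ["Customer Data", "Usage Data", "Security Data", "User Account Data"]

# Every non-empty combination becomes one entry in the single-select
# parameter, e.g. "Customer Data + Security Data". With n categories
# this yields 2**n - 1 entries: 15 for 4 categories, 31 for 5.
param_values = [" + ".join(combo)
                for r in range(1, len(CATEGORIES) + 1)
                for combo in combinations(CATEGORIES, r)]
print(len(param_values))
```

At report time, a row passes the filter when its category set intersects (or, depending on the business rule, covers) the combination the user picked.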

Scenario 3: Using Two or More Identical Parameters for Limited Multi-Select Flexibility

  • Approach: Employ multiple identical parameters (up to 3 or 4 at most) to enable filtering by multiple category values. Each parameter displays the same list of category values, and users can select appropriate values for each, catering to scenarios where users need to explore specific combinations.
  • Prerequisites: User interest in a limited number of category combinations (typically 2–4).
  • Solution Details: Here are important screenshots to build the solution in Tableau.



  • Limitations: Increased maintenance for changes in grouping or categorization business rules.
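The idea above can be sketched as two identical single-select parameters combined with OR. The parameter names and the "(None)" sentinel below are hypothetical conventions for this sketch, not Tableau built-ins; the marker strings come from the business rules.

```python
RULES = {
    "Customer Data": "customer",
    "Usage Data": "observability",
    "Security Data": "security",
    "User Account Data": "user_account",
}

def matches(category: str, data_tags: str) -> bool:
    if category == "(None)":          # sentinel: this parameter slot is unused
        return False
    marker = RULES[category]
    return any(marker in t.strip().lower() for t in data_tags.split(","))

def row_passes(param_1: str, param_2: str, data_tags: str) -> bool:
    # OR the two parameter selections, mirroring a calculated field that
    # combines the per-parameter match fields in Tableau.
    return matches(param_1, data_tags) or matches(param_2, data_tags)

print(row_passes("Customer Data", "Security Data", "security_logs"))
```

Adding a third or fourth identical parameter extends the OR chain, but each addition also multiplies the rework whenever the categorization rules change, which is the maintenance limitation noted above.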

Final Solution: A Balance of Science and Art

Solution design, as we’ve discovered, transcends rigid rules. It’s a delicate dance of science and art, harmonizing with the unique needs of each situation. The scenarios presented offer valuable patterns to guide your choices, empowering you to craft the most effective solution for your specific data challenges and reporting requirements.

Further Reading:

I wrote a two-part series on a similar topic (whether to process certain scenarios in Tableau or the database layer) a few years ago. Here’s the link to those articles.


要查看或添加评论,请登录

Nirav Saraiya的更多文章

社区洞察

其他会员也浏览了