DATA CARDINALITY
Mustafa Qizilbash
‘Open for New Opportunities (Globally), Author, Data & AI Practitioner & CDMP Certified, Innovator of Four 4s Formula, DAC Architecture, PVP Approach
This concept is commonplace among database modelers but may be unfamiliar to business users. Grasping this concept can address 90% of your data-related challenges. If you lack a robust Data Model, you'll likely encounter issues like Chasm and Fan Traps.
Data Cardinality pertains to maintaining the level of detail within a table and can be categorized into two types:
High Cardinality: This occurs when a table has minimal or no duplicate entries. Tables with primary keys (Pkeys) primarily fall into the High Cardinality category.
Low Cardinality: This occurs when a table contains a significant number of duplicate entries. Tables with foreign keys (Fkeys) are often associated with Low Cardinality.
For instance, consider a Product table that includes attributes such as Product ID, Product Name, and Product Manufactured Date. While this table can encompass hundreds of products, each product entry is unique, as it's present only once in the table. Therefore, we can classify this table as having High Cardinality.
Now, let's use the same example for a Sales table. When products are sold, multiple records may reference the same product, resulting in thousands of rows in the Sales table. In this case, the Product ID is repeated multiple times within the Sales table, indicating Low Cardinality concerning Products.
We previously delved into Data Modeling in a distinct discussion, where we focused on normalizing data to eliminate data redundancy. The fundamental principle of normalization is closely tied to Cardinality. When we decompose denormalized tables into multiple tables to reduce duplicate records, the primary table takes on High Cardinality, typically associated with a Primary Key (Pkey).
It's important to note that not all tables with Foreign Keys (Fkey) are inherently Low Cardinality; some may also have Primary Keys and serve as the primary table for another foreign key-related table. Thus, we cannot universally claim that all Pkey tables exhibit the highest Cardinality.
领英推荐
There are three cardinality degrees:
·???????? One to One (1:1): Each row in table A corresponds to a single row in table B.
·???????? One to Many (1:M): Each row in table A relates to multiple rows in table B.
·???????? Many to Many (M:N): Multiple rows in table A connect to multiple rows in table B.
This concept assumes great significance when joining tables without relying on direct Primary Key (Pkey) and Foreign Key (Fkey) relationships. Incorrectly choosing columns for joins can result in Cartesian data, as explained in a separate discussion.
Understanding Cardinality also proves crucial for impact analysis. Whenever there's a need to alter the table structure due to a business requirement, the first question to ask is whether this change will affect the Cardinality of any table. If the answer is affirmative, Data Lineage comes into play, allowing you to visualize the end-to-end impact, which necessitates the presence of Metadata Management.
‘All the primary key tables are High Cardinality tables and Fkey (Foreign Key) tables are Low Cardinality tables.’
Cheers.
And 0 to many?
Head of Data Office Trading & Shipping | CDMP?
1 年Thanks Mustafa for sharing about this key concept for data modeling and application/database design