291betone philippines,55xx casino login Register Philippines.Recharge Every day and Get Bonus up-to 50%!

Data Modeling Fundamentals

What is Data Modeling?

What is Data Modeling? Representation of what the data is in the real-world. Provide us insights into characteristics and rules that apply to the data.

Lifecycle of Data Modeling

OLTP – Online Transactional Processing

Optimized for inserts and updates
Optimal for application use
Schema structure might fundamentally change for different application needs
“Focus on customers entering data”

OLAP – Online Analytical Processing

Optimized for heavy reads
Optimal for business structure, understandable by business people
Schema structure should be consistent and flexible for different business needs

The Building Blocks of Data Modeling

Data Subjects/Entities

Commonly called “entities”
Others call it “objects” or “classes”
Analogous to database tables

Attributes

Analogous to a database column
Typically associated with Data Subjects
Attribute types often “shared” across multiple entities“
Bottom-up” data modeling once semi-popular
Go down to all the attributes then model them back to entities

If there are common attributes if you want you can make the attributes more descriptive.

Attributes have descriptions & rules

Data types and size
Whether NULL values allowed
Permissible values

Attribute Tips

Decompose but carefully
Ex: “where a student lives
”“STUDENT_ADDR_AND_CITY” = Street address + City + State + Zip Code all in one attribute/field (NOT BEST PRACTICE)
“STUDENT_ADDR” + “STUDENT_CITY” + “STUDENT_STATE” + “STUDENT_ZIP” each as a separate attribute/field (BEST PRACTICE)

Relationships among data subjects

Defining the relationship amongst the entities

Business rules for data

Cardinality
Mandatory or optional relationships
Permissible attribute values(including NULLs)
“Data change dynamics”

Hierarchies in Entities/Data Subjects

Hierarchy

Special kind of entity
A special type of relationship
Think “specializtion”

Hierarchy use case

Two or more entities
That have “a lot” in common
But also “at least a little bit” different

Strong vs Weak Entities/Data Subjects

Strong entity “exists on its own terms”

Exists independent of any other entity
Does not require any other entity instances to help identify its own instance

Weak entity “needs some help”

To identify specific instance of that entity
Can’t exist without an instance of another entity
or both

Multiple Relationships Between Entities

Multiple Relationships between 2 entities

-->Recursive Relationship

--> Ternary Relationship

Data Modeling Gerund

Special Type of relationship that acts like an entity.
Relationship that have specific attributes specific to the relationship itself

Cardinality

Cardinality: "the number of something"

Maximum Cardinality

Number of instances of both sides of a relationship
Typical values: “1” or “M”(many)
Can also be a specific numeric value

1:1 Relationship

Specific number of Cardinality

Minimum Cardinality

Sometimes referred to as “participation constraint”

Total Participation ( Min. cardinality = 1)
Partial Participation ( Min. cardinality = 0)

Mandatory vs optional relationship

Mandatory relationship (Min. cardinality = 1)
Optional relationship (Min. cardinality = 0)

3rd possible value for min. cardinality

0: optional/partial participation
1: mandatory/total participation
(n): some explicit number of minimum instances“A full time lecturer must teach at least 6 classes”“A full professor must advise at least 2 other faculty members”

Crow's Foot Notations for Cardinality

Normalization

Normalization “The Key, the Whole Key, Nothing but the Key…”

1st Normal Forms

Every row (tuple) must be unique
NO repeating groups
Multi-valued attributes are in violation of 1N

2nd Normal Forms

Must be in 1NF“
No partial key dependencies”
Applies if composite primary key
If single-column (field) primary key then already in 2NF

3rd Normal Forms

Must be in 2NF“
No non-key dependency”

Forward Engineering

Typical conceptual -> logical transformaitions

Address violations of normalization

Transform M:M relationships

Add “intersection entity (or table)” to your model
Also referred to as “associative entity (or table)”
Or “bridge entity (or table)”
Purpose: decompose M:M relationship into multiple “semantically equivalent” relationships
Semantically equivalent…but “artificial”

Add foreign keys

Typical logical -> physical transformation

Denormalization

Violating normalization rules deliberate for performance gains
Aggregates
Materialized Views
Join across various tables that physically creates the result of a query
Optimized storage placement(e.g. partitioning)
Database Indices

Dimensional Modeling

"Dimensional Modeling is a design technique for databases intended to support end-user queries in a data warehouse"

Key Terms

Surrogate Keys:

Artificially created keys (usually integers) used only by the data warehouse to uniquely identify a row of dimension table
Required to implement history of slowly changing dimensions
Avoids conflicts among backend source systems
Insulates the data warehouse from source systems

Dimension Table:

By what we measure things
The who, what, when, where etc. of things
What users would want to sort, group, and filter on

Fact Table:

Also called a Measure
Measurable metric with is described by the dimensions
An observation or event

Grain:

determines what each fact row contains and in what detail
defined by dimensions in the fact table, and their details

Steps of Dimensional Modeling

Choose the business process

Describe the business process which the model builds on.

Declare the Grain

The grain of the model is the exact description of what the dimensional model should be focusing on. To clarify, you should pick the central process and describe it with one sentence

Identify the Dimensions

The dimensions must be defined within the grain. Dimensions are the foundation of the fact table, and is where the data for the fact table is collected. Typically dimensions are nouns like date, store, inventory, etc.?

Identify the Facts

Identify numeric facts that will populate each fact table row

Star Schema

Marriage of Fact Schema to dimension schema
Dimensions relate directly to fact table only
Dimensions are deformalized. Does not have a related region lookup tables as an OLTP design
Usually dimension keys are NOT keys from the source systems, rather they are generated by the data warehouse load process(Surrogate Keys)
Dimension attributes you define determine granularity called the grain of the facts

Snowflake Schema

Dimensions relate to another dimension you have a snowflake
Snowflake causes a number of performance and usability issues and are rarely justified
The principle behind snowflaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.

Slowly Changing Dimensions

Type 0:?

Never update. Keep the original value
Useful for original based tracking

Type 1 :

Overwrite the record row
Useful when history is not a factor
Modeling and querying only by current state
Reporting will reflect the current value only

Type 2 :

Maintain History
Track complete history of dimension
Adding 3 columns to maintain type 2 tables
'effective_date' - when the new row becomes the truth
'expiration_date' - when the row expired due to new update
'is_current' - is this row the current system truth

Best Practices

Using surrogate key is recommended regardless but essential with Type 2 SCD
Expiration date should be in the distant future(9999-01-01)
Use only for true slowly changing dimensions
Fast changing attributes lead to inflated dimension tables (Fix can be found using Type 4 SCD)
Big Dimensions lead to poor performance and slow filtering

Type 3:

Keep Limited History?
Add new column to table to keep old value
old_value
new_value
Typically used for one-time, cross data change

Type 4 :

Maintain a separate history table
Addresses Type 2 scaling issues
Fast changing dimensions that are not facts
they are dimensions for modeling purpose(filter by)
they are dimensions for not changing fast enough
The mini dimension is tracked through time via the fact table
Creates dependence on the fact table to exist and never fundamentally change

Type 5:

Separate changing values into mini dimensions
Builds on type 4 SCD by embedding a mini-dimension table that is of type 1 SCD
Allows for currently assigned mini-dimension tables to be accessed along with the base dimension table without being linked to a fact table

Type 6 :

combination of type 1, 2 and 3

Data Modeling/Dimension Modeling

nagaraju juluru

Hadoop | Hive | Sqoop | PySpark| Spark Streaming | Kafka |AWS(S3,EMR,Ec2,Athena,Glue,Dynamo DB and Redshift) | Databricks | HBASE | Cassandra |Snowflake| Airflow|

Data Modeling Fundamentals

What is Data Modeling?

Lifecycle of Data Modeling

OLTP – Online Transactional Processing

OLAP – Online Analytical Processing

The Building Blocks of Data Modeling

Hierarchies in Entities/Data Subjects

Strong vs Weak Entities/Data Subjects

Multiple Relationships Between Entities

Cardinality

Maximum Cardinality

Minimum Cardinality

领英推荐

Normalization

Forward Engineering

Typical conceptual -> logical transformaitions

Typical logical -> physical transformation

Dimensional Modeling

Key Terms

Steps of Dimensional Modeling

Star Schema

Snowflake Schema

Slowly Changing Dimensions

更多精彩文章

社区洞察

其他会员也浏览了

Data Analytics for Business: Data Modeling Variations

Updated: Difference Between Business Intelligence and Data Science

Data Analysis Tools For Perfect Data Management

What is data modeling – An Introduction for Business Analysts

Data Modeling: Building a Strong Foundation for Data Architecture Part 1

Optimizing datasets for BI tools using SQL and Dimensional Modeling

Understanding the Distinctions Between Business Analysts, Data Analysts, and Data Scientists

Star Schema in Power BI: The Key to Simplicity and Speed

Mastering Data Modeling in Power BI: A Comprehensive Guide

Data Modelling Best Practices in Power BI

Data Modeling Fundamentals

What is Data Modeling?

Lifecycle of Data Modeling

OLTP – Online Transactional Processing

OLAP – Online Analytical Processing

The Building Blocks of Data Modeling

Hierarchies in Entities/Data Subjects

Strong vs Weak Entities/Data Subjects

Multiple Relationships Between Entities

Cardinality

Maximum Cardinality

Minimum Cardinality

领英推荐

Normalization

Forward Engineering

Typical conceptual -> logical transformaitions

Typical logical -> physical transformation

Dimensional Modeling

Key Terms

Steps of Dimensional Modeling

Star Schema

Snowflake Schema

Slowly Changing Dimensions

AWS: IAM (Identity and Access Management)

2024年7月6日

AWS: DynamoDB

2024年7月2日

AWS: Cloud Formation

2024年7月1日

AWS: Basics

2024年6月30日

Types of Machine Learning Models

2024年6月15日

SPARK - Partitioning

2024年4月21日

Event Driven AWS Services

2024年4月8日

Function Structure

2024年4月8日

AWS Transfer for SFTP

2024年4月4日

Empowering the Mobile Citizen Developer

2016年10月12日

社区洞察

其他会员也浏览了

Data Analytics for Business: Data Modeling Variations

Updated: Difference Between Business Intelligence and Data Science

Data Analysis Tools For Perfect Data Management

What is data modeling – An Introduction for Business Analysts

Data Modeling: Building a Strong Foundation for Data Architecture Part 1

Optimizing datasets for BI tools using SQL and Dimensional Modeling

Understanding the Distinctions Between Business Analysts, Data Analysts, and Data Scientists

Star Schema in Power BI: The Key to Simplicity and Speed

Mastering Data Modeling in Power BI: A Comprehensive Guide

Data Modelling Best Practices in Power BI