Data Modeling/Dimensional Modeling
nagaraju juluru
Hadoop | Hive | Sqoop | PySpark | Spark Streaming | Kafka | AWS (S3, EMR, EC2, Athena, Glue, DynamoDB, and Redshift) | Databricks | HBase | Cassandra | Snowflake | Airflow
Data Modeling Fundamentals
What is Data Modeling?
Lifecycle of Data Modeling
OLTP – Online Transactional Processing
OLAP – Online Analytical Processing
The Building Blocks of Data Modeling
Data Subjects/Entities
Attributes
Attributes have descriptions & rules
Attribute Tips
Relationships among data subjects
Business rules for data
Hierarchies in Entities/Data Subjects
Hierarchy
Hierarchy use case
Strong vs Weak Entities/Data Subjects
Strong entity “exists on its own terms”
Weak entity “needs some help”
Multiple Relationships Between Entities
--> Recursive Relationship
--> Ternary Relationship
Data Modeling Gerund
Cardinality
Cardinality: "the number of something"
Maximum Cardinality
1:1 Relationship
Specific number of Cardinality
Minimum Cardinality
Sometimes referred to as “participation constraint”
Mandatory vs optional relationship
3rd possible value for min. cardinality
Crow's Foot Notations for Cardinality
Normalization
Normalization “The Key, the Whole Key, Nothing but the Key…”
1st Normal Form
2nd Normal Form
3rd Normal Form
Forward Engineering
Typical conceptual -> logical transformations
Transform M:M relationships
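A common way to transform a many-to-many relationship is to introduce an associative (bridge) table whose primary key combines the keys of both sides. The sketch below is only an illustration: the `student`, `course`, and `enrollment` tables and their columns are hypothetical names chosen for this example, and SQLite is used simply because it ships with Python.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Bridge table: one row per (student, course) pair resolves the M:M relationship
-- into two 1:M relationships.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, "SQL"), (20, "Spark")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# Join through the bridge table to list who is enrolled in what.
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
""").fetchall()
print(rows)
```

The composite primary key on the bridge table prevents the same pair from being recorded twice; attributes of the relationship itself (for example an enrollment date) would also live on that table.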
Typical logical -> physical transformations
Denormalization
Dimensional Modeling
"Dimensional Modeling is a design technique for databases intended to support end-user queries in a data warehouse"
Key Terms
Surrogate Keys:
Dimension Table:
Fact Table:
Grain:
Steps of Dimensional Modeling
Choose the business process
Declare the Grain
Identify the Dimensions
Identify the Facts
Star Schema
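As a rough illustration of a star schema, the sketch below builds one fact table surrounded by two dimension tables, with integer surrogate keys joining them. The `fact_sales`, `dim_product`, and `dim_date` tables, their columns, and the sample rows are all assumptions made for this example; the grain assumed here is one row per product per day.

```python
import sqlite3

# In-memory database; every name and value below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, keyed by a surrogate key.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_code TEXT NOT NULL,         -- natural/business key
    category     TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,      -- surrogate key, e.g. 20240101
    full_date TEXT NOT NULL,
    month     TEXT NOT NULL
);

-- Fact table: numeric measures at the declared grain (product per day).
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "P-100", "Books"), (2, "P-200", "Games")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, "2024-01-01", "2024-01"),
                  (20240102, "2024-01-02", "2024-01")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20240101, 1, 2, 20.0),
                  (20240101, 2, 1, 50.0),
                  (20240102, 1, 1, 10.0)])

# Typical end-user query: aggregate facts, grouped by a dimension attribute.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)
```

Each dimension is a single denormalized table joined directly to the fact table, which is what distinguishes this layout from a snowflake schema, where dimensions are further normalized into sub-tables.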
Snowflake Schema
Slowly Changing Dimensions
Type 0:
Type 1:
Type 2:
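Type 2 is usually described as preserving history by adding a new dimension row for each change rather than overwriting the old one. The sketch below shows one possible way to do that in plain Python; the `apply_scd2` helper, the `customer_key`/`customer_id`/`city` columns, and the `effective_from`/`effective_to`/`is_current` tracking columns are assumptions for illustration, not a prescribed layout.

```python
from datetime import date

def apply_scd2(dim_rows, customer_id, new_city, change_date):
    """Hypothetical helper: expire the current row if the city changed,
    then append a new current row with a fresh surrogate key."""
    current = next(
        (r for r in dim_rows
         if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None and current["city"] == new_city:
        return dim_rows  # attribute unchanged, nothing to do

    if current is not None:
        # Close out the old version instead of overwriting it.
        current["effective_to"] = change_date
        current["is_current"] = False

    next_key = max((r["customer_key"] for r in dim_rows), default=0) + 1
    dim_rows.append({
        "customer_key": next_key,      # surrogate key, new for every version
        "customer_id": customer_id,    # natural key, shared across versions
        "city": new_city,
        "effective_from": change_date,
        "effective_to": None,          # open-ended while current
        "is_current": True,
    })
    return dim_rows

dim_customer = []
apply_scd2(dim_customer, "C1", "Pune", date(2023, 1, 1))
apply_scd2(dim_customer, "C1", "Delhi", date(2024, 6, 1))
for row in dim_customer:
    print(row)
```

After the second call the table holds two versions of customer `C1`: the expired Pune row and the current Delhi row. Fact rows would reference whichever surrogate key was current when the fact occurred.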
Best Practices
Type 3:
Type 4:
Type 5:
Type 6: