Data warehousing Concepts Using Oracle Technology
Tamer SHALABY
Technical Director specializing in Oracle DBA and Analytics at Data & Analytics
Data warehousing Concepts Using Oracle Technology
Knowledge Transfer Session
Prepared By
Tamer Shalaby - Data & Analytics
?
?
Objective:
To discuss/explain technical details about the data warehousing concepts.
Example on data flow from a daily OLTP Banking database to data-warehouse database that will be used for analytical dashboards
Overview of the Oracle data warehousing implementation
■ What is a Data Warehouse?
■ Data Warehouse Architectures
■ Extracting Information from a Data Warehouse
■ What is a Data Warehouse?
·??????? A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.
·??????? It usually contains historical data derived from transaction data, but can include data from other sources.
·??????? Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources.
?
This helps in:
■ Maintaining historical records
■ Analyzing the data, to gain a better understanding of the business and to improve the business.
Nonvolatile
■ Means that, once logged into the data warehouse, data should not change.
This is logical because the purpose of a data warehouse is to enable you to analyze what has occurred.
Data Warehouse Architecture with a Staging Area :
·??????? You need to clean and process your operational data before putting it into the warehouse, most data warehouses use a staging area.
·??????? A staging area simplifies building summaries and general warehouse management.
?
·??????? The following Figure illustrates this typical architecture.
·??????? This is the suggested model to use? .
Design in Data-warehouses
?
·??????? Define the business requirements and agreed upon the scope of your application.
·??????? ?Create a conceptual design.
·??????? Now you need to translate your requirements into a system deliverable.
·??????? To do so, you create the logical and physical design for the data warehouse. You then define 2 steps:
?
Logical
Physical
Creating a Logical Design :
A logical design is conceptual and abstract.
You do not deal with the physical implementation details yet.
You deal only with defining the types of information that you need.
?
·??????? You can use to model your organization's logical information requirements is entity-relationship modeling.
·??????? Entity-relationship modeling involves identifying the things of importance (entities), the properties of these things (attributes), and how they are related to one another (relationships).
Data Warehousing Schemas
A schema is a collection of database objects, including tables, views, indexes, and synonyms.
You can arrange schema objects in the schema models designed for data warehousing in a variety of ways.
Most data warehouses use a dimensional model.
·?????? The most natural way to model a data warehouse is as a star schema,
???????? where only one join establishes the relationship between the fact table and any one of ?the dimension tables.
?
·???????? A star schema optimizes performance by keeping queries simple and providing fast response time.
·?????? Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse schemas.
?
·?????? Fact Tables
Are the large tables in your data warehouse schema that store business measurements.
Fact tables typically contain facts and foreign keys to the dimension tables.
Fact tables represent data, usually numeric and additive, that can be analyzed and examined.
·?????? Requirements of Fact Tables
You must define a fact table for each star schema. From a modeling standpoint, the primary key of the fact table is usually a composite key that is made up of all of its foreign keys.
Examples on Fact Tables include
sales,
cost,
profit.
Dimension tables,
·?????? also known as lookup or reference tables, contain the relatively static data in the data warehouse.
·?????? Dimension tables store the information you normally use to contain queries.
·?????? Dimension tables are usually textual and descriptive and you can use them as the row headers of the result set.
·?????? Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis.
These natural rollups or aggregations within a dimension table are called hierarchies.
Examples on dimension tables :
customers ,
products.
Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing data.
?
A hierarchy can be used to define data aggregation.
For example:
In a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level.
A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it.
Data values at lower levels aggregate into the data values at higher levels.
Levels :
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels.
Level Relationships :
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information.
They define the parent-child relationship between the levels in a hierarchy.
Data Warehousing Objects: Relationships
Relationships guarantee business integrity.
An example is : ?
if a business sells something, there is obviously a customer and a product.
Designing a relationship between the sales information in the fact table and the dimension tables products and customers enforces the business rules in databases.
Example of Data Warehousing Objects and Their Relationships
The Figure above illustrates a common example of a sales fact table and dimension tables
customers, products, promotions, times, and channels.
Moving from Logical to Physical Design
·?????? Logical design :
???????? is what you draw with a pen and paper before building your data warehouse.
·?????? Physical design :
???????? is the creation of the database with SQL statements.
During the logical design phase, you defined a model for your data warehouse consisting of entities, attributes, and relationships.
The entities are linked together using relationships.
Attributes are used to describe the entities.
The unique identifier (UID) distinguishes between one instance of an entity and another.
During the physical design process, you translate the expected schemas into actual database structures. You have to map:
■ Entities to tables
■ Relationships to foreign key constraints
■ Attributes to columns
■ Primary unique identifiers to primary key constraints
■ Unique identifiers to unique key constraints
Physical Design Structures
Once you have converted your logical design to a physical one, you will need to create some or all of the following structures:
■ Tablespaces
■ Tables and Partitioned Tables
■ Views
■ Integrity Constraints
■ Dimensions
?
Additionally, the following structures may be created for performance improvement:
?■ Indexes
■ Materialized Views
Tablespaces
A tablespace consists of one or more datafiles, which are physical structures within the operating system you are using.
?
Tables and Partitioned Tables
Tables are the basic unit of data storage. They are the container for the expected amount of raw data in your data warehouse.
Using partitioned tables instead of nonpartitioned ones addresses the key problem of supporting very large data volumes by allowing you to divide them into smaller and more manageable pieces
Views
A view is a tailored presentation of the data contained in one or more tables or other views. A view takes the output of a query and treats it as a table.
Integrity Constraints
Integrity constraints are used to enforce business rules associated with your database
and to prevent having invalid information in the tables.
In data warehousing environments, constraints are only used for query
rewrite. NOT NULL constraints are particularly common in data warehouses.
?
?Indexes
Indexes are optional structures associated with tables or clusters.
Bitmap indexes are very common in data warehousing environments.
Bitmap indexes are optimized index structures for set-oriented operations. Additionally, they are necessary for some optimized data access methods such as star transformations.
?
Materialized Views
Materialized views are query results that have been stored in advance so long-running calculations are not necessary when you actually execute your SQL statements.
From a physical design point of view, materialized views resemble tables
Dimensions
A dimension is a schema object that defines hierarchical relationships between columns or column sets.
Bitmap Indexes
Bitmap indexes are widely used in data warehousing environments.
The environments typically have large amounts of data and ad hoc queries, but a low level of concurrent DML transactions.
?
For such applications, bitmap indexing provides:
?
■ Reduced response time for large classes of ad hoc queries.
■ Reduced storage requirements compared to other indexing techniques.
■ Dramatic performance gains even on hardware with a relatively small number of CPUs or
a small amount of memory.
■ Efficient maintenance during parallel DML and loads.
·?????? Bitmap indexes are typically only a fraction of the size of the indexed data in the table.
?·?????? An index provides pointers to the rows in a table that contain a given key value.
领英推荐
?·?????? A regular index stores a list of rowids for each key corresponding to the rows with that key value.
?
·?????? In a bitmap index, a bitmap for each key value replaces a list of rowids.
?Bitmap indexes are most effective for queries that contain multiple conditions in the WHERE clause. Rows that satisfy some, but not all, conditions are filtered out before the table itself is accessed. This improves response time, often dramatically.
?
As the bitmaps from bitmap indexes can be combined quickly, it is usually best to use single-column bitmap indexes.
?When creating bitmap indexes, you should use NOLOGGING and COMPUTE STATISTICS. In addition, you should keep in mind that bitmap indexes are usually easier to destroy and re-create than to maintain.
Benefits for Data Warehousing Applications
Bitmap indexes are primarily intended for data warehousing applications where users query the data rather than update it.
?
They are not suitable for OLTP applications with large numbers of concurrent transactions modifying the data.
Parallel query and parallel DML work with bitmap indexes
?
How to Determine Candidates for Using a Bitmap Index
Bitmap indexes should help when either the fact table is queried alone, and there are predicates on the indexed column, or when the fact table is joined with two or more dimension tables, and there are indexes on foreign key columns in the fact table, and predicates on dimension table columns.
?
A fact table column is a candidate for a bitmap index when the following conditions are met:
?■ There are 100 or more rows for each distinct value in the indexed column.
When this limit is met, the bitmap index will be much smaller than a regular index, and you will be able to create the index much faster than a regular index.
?
An example would be one million distinct values in a multi-billion row table.
And either of the following are true:
■ The indexed column will be restricted in queries (referenced in the WHERE clause).
or
■ The indexed column is a foreign key for a dimension table. In this case, such an
index will make star transformation more likely.
Dimensions
■ What are Dimensions?
■ Creating Dimensions
■ Viewing Dimensions
What are Dimensions?
A dimension is a structure that categorizes data in order to enable users to answer
business questions.
Commonly used dimensions are :
customers,
products,
time.
?
The fact information is stored in what is called a fact table, whereas the dimensional
information is stored in dimension tables.
?
In Oracle Database, the dimensional information itself is stored in a dimension table.
?
In addition, the database object dimension helps to organize and group dimensional
Information into hierarchies.
This represents natural 1:n relationships between columns or column groups (the levels of a hierarchy) that cannot be represented with constraint conditions.
?
Going up a level in the hierarchy is called rolling up the data and going down a level in the hierarchy is called drilling down the data.
Within the customer dimension,
?
customers roll up to city.
Then cities roll up to state.
Then states roll up to country.
Then countries roll up to subregion.
Finally, subregions roll up to region,
?
Data analysis typically starts at higher levels in the dimensional hierarchy and gradually drills down if the situation warrants such analysis.
?
Creating Dimensions
Before you can create a dimension object, the dimension tables must exist in the
database possibly containing the dimension data.
?
Example:
If you create a customer dimension, one or more tables must exist that contain the city, state, and
country information.
?
In a star schema data warehouse, these dimension tables already exist.
This customer dimension contains a single hierarchy with a geographical rollup, with arrows drawn from the child level to the parent level .
If levels of a hierarchy come from different tables, such a hierarchy is either a fully or partially normalized hierarchy. This section shows how to define a normalized hierarchy.
?
Suppose the tracking of a customer's location is done by city, state, and country.
?
This data is stored in the tables customers and countries. ?
The customer dimension customers_dim is partially normalized because the data entities cust_id and country_id are taken from different tables.
?
The clause JOIN KEY within the dimension definition specifies how to join together the levels in the hierarchy.
The dimension statement is partially shown in the following.
?
CREATE DIMENSION customers_dim
LEVEL customer IS (customers.cust_id)
LEVEL city IS (customers.cust_city)
LEVEL state IS (customers.cust_state_province)
LEVEL country IS (countries.country_id)
LEVEL subregion IS (countries.country_subregion)
LEVEL region IS (countries.country_region)
HIERARCHY geog_rollup (
customer CHILD OF
city CHILD OF
state CHILD OF
country CHILD OF
subregion CHILD OF
region
JOIN KEY (customers.country_id) REFERENCES country);
?
How to view the created dimension :
You then see the following output results:
EXECUTE DBMS_DIMENSION.DESCRIBE_DIMENSION('SH.CHANNELS_DIM');
?
-Result you get :
DIMENSION SH.CHANNELS_DIM
LEVEL CHANNEL IS SH.CHANNELS.CHANNEL_ID
LEVEL CHANNEL_CLASS IS SH.CHANNELS.CHANNEL_CLASS
HIERARCHY CHANNEL_ROLLUP (
CHANNEL CHILD OF
CHANNEL_CLASS)
ATTRIBUTE CHANNEL LEVEL CHANNEL DETERMINES
SH.CHANNELS.CHANNEL_DESC
ATTRIBUTE CHANNEL_CLASS LEVEL CHANNEL_CLASS DETERMINES
SH.CHANNELS.CHANNEL_CLASS
?
?
Overview of Extraction, Transformation, and Loading
■ Overview of ETL in Data Warehouses
■ ETL Tools for Data Warehouses Overview of ETL in Data Warehouses
You need to load your data warehouse regularly so that it can serve its purpose of facilitating business analysis.
?
To do this, data from one or more operational systems needs to be extracted and copied into the data warehouse.
?
The challenge in data warehouse environments is to integrate, rearrange and consolidate large volumes of data over many systems, thereby providing a new unified information base for
business intelligence.
?
The process of extracting data from source systems and bringing it into the data
warehouse is commonly called ETL, which stands for extraction, transformation, and loading.
?Note that ETL refers to a broad process, and not three well-defined steps.
The acronym ETL is perhaps too simplistic, because it omits the transportation phase and implies that each of the other phases of the process is distinct. Nevertheless, the entire process is known as ETL.
?
ETL Basics in Data Warehousing
?What happens during the ETL process?
?The following tasks are the main actions in the process.
?
Extraction of Data
During extraction, the desired data is identified and extracted from many different
sources, including database systems and applications.
The size of the extracted data varies from hundreds of kilobytes up to gigabytes,
depending on the source system and the business situation.
The same is true for the time delta between two (logically) identical extractions: the time span may vary between days/hours and minutes to near real-time.
?
Transportation of Data
After data is extracted, it has to be physically transported to the target system or to an intermediate system for further processing.
?Depending on the chosen way of transportation, some transformations can be done during this process, too.
ETL Tools for Data Warehouses
We are going to use Oracle Data Integrator in (example), as an ETL tool that will provide:
1-?? extracting data from the XMLs of T24 core banking (example)
2-?? Transformation of data , columns , if required as per business requirements.
3-?? Loading the data in the target data warehouse database .
Oracle Data Integrator :
streamlines the high performance movement and transformation of data between heterogeneous systems in ?:
-batch,
-real time,
-synchronous,
-asynchronous modes;
?
It enhances user productivity with design approach and built-in connectivity to all major databases, data warehouse.
In our case , the source technology to be used is : XML files
The example environment above uses the following elements:
·???????? The Repository: The Repository contains all of the metadata required for the training examples. It is hosted, for these evaluation purposes, in a supplied database.
·???????? Orders Application: An application for tracking customer orders, hosted in a supplied database (the "srcdemo" sample database).
·???????? Parameters (File): Flat files (ASCII) issued from the production system containing a list of sales representatives and the segmentation of ages into age ranges.
·???????? Sales Administration: The administration or tracking of sales, hosted in another supplied database (the "trgdemo" sample database).
?
·???????? This data warehouse is populated with our transformations.
Ref. docs.oracle.com
Oracle Datawarehsousing Guide
Tech Entrepreneur & Visionary | CEO, Eoxys IT Solution | Co-Founder, OX hire -Hiring And Jobs
6 个月Tamer, thanks for sharing!