Stored and Materialized Views in Databricks
Naveen Pn ??
?? Corporate Trainer | SME - Data Engineering on (AWS, Azure, GCP), Apache Spark/PySpark, Databricks, Microservices, Kubernetes, AIML, GenAI, MERN Stack, Java Backend, DevOps
What Are Stored Views and Materialized Views?
Stored Views
A stored view is a virtual table that does not store data. Instead, it saves an SQL query, which dynamically fetches data from underlying tables or other views every time it is queried.
Materialized Views
A materialized view stores the results of a query physically, allowing faster access to precomputed data for repeated queries.
Example: Loading Sample Data
Step 1: Load the Underlying Table (sales)
Use the sales table as the base for stored and materialized views.
SQL to Create and Load the Table:
CREATE TABLE sales (
sale_id INT,
region STRING,
category STRING,
sale_date DATE,
profit INT
);
INSERT INTO sales VALUES
(1, 'North', 'Electronics', '2024-01-01', 100),
(2, 'South', 'Furniture', '2024-01-02', 200),
(3, 'North', 'Electronics', '2024-01-03', 150),
(4, 'East', 'Furniture', '2024-01-04', 300),
(5, 'South', 'Electronics', '2024-01-05', 250),
(6, 'West', 'Furniture', '2024-01-06', 400);
Creating and Managing Stored Views
Step 1: Create a Stored View (sales_summary)
This view dynamically calculates the total profit for each region.
CREATE VIEW sales_summary AS
SELECT
region,
SUM(profit) AS total_profit
FROM
sales
GROUP BY
region;
Step 2: Query the View
Fetch the aggregated data dynamically:
领英推荐
SELECT * FROM sales_summary WHERE total_profit > 300;
Step 3: Update the View
Modify the view to include the category column.
CREATE OR REPLACE VIEW sales_summary AS
SELECT
region,
category,
SUM(profit) AS total_profit
FROM
sales
GROUP BY
region, category;
Step 4: Drop the View
Remove the view if it is no longer needed:
DROP VIEW sales_summary;
Creating and Managing Materialized Views
Step 1: Create a Materialized View
Use a Delta table to store the materialized view (sales_summary_materialized):
CREATE TABLE sales_summary_materialized
AS SELECT
region,
SUM(profit) AS total_profit
FROM
sales
GROUP BY
region;
Step 2: Refresh the Materialized View
Manually refresh the view to include the latest data:
INSERT INTO sales_summary_materialized
SELECT
region,
SUM(profit) AS total_profit
FROM
sales
WHERE
sale_date > (SELECT MAX(last_refresh_date) FROM sales_summary_materialized)
GROUP BY
region;
Step 3: Query the Materialized View
Access the precomputed data:
SELECT * FROM sales_summary_materialized WHERE total_profit > 300;
Key Benefits of Views
Associate Manager at EY GDS | Azure Data Engineer | ETL solution architect
3 个月Very helpful