DP-900: Microsoft Azure Data Fundamentals - Relational Data on Azure
Varun Patel
Data Engineer || Big Data || PySpark || 2xMicrosoft Azure || AWS || Databricks || Kafka || ETL || Data Science || Mentor @Codebasics
Hello Everyone, here is the second post in my series on DP-900 certification. In this article, we will examine relational databases in detail, along with their fundamental ideas and importance in data management. We'll also go into great depth on the range of Azure services created especially to make relational database setup, maintenance, and scalability easier. You will have a comprehensive grasp of how Azure enables companies to use relational databases within their cloud environment by the conclusion of this article.
Certification Content Breakdown
What is Relational Database and Why do we need it?
Data is arranged into tables in relational databases so that they can be conveniently maintained and retrieved. Every table is made up of rows and columns, where the rows stand for records and the columns for those records' properties. Structured Query Language (SQL) is used by relational databases for data maintenance and querying. Relational databases are necessary because of their capacity to allow numerous users' concurrent access, guarantee data integrity, and perform complicated queries with efficiency. Because of their obvious organization, they offer an organized method for storing and retrieving vast quantities of data and make updating and searching simple. Because of this, they are perfect for a wide range of uses, such as conducting financial transactions in corporate systems and maintaining user information in online applications.
We represent sets of real-world entities as tables in a relational database. Anything for which we wish to keep records might be considered an entity; significant items and events are common examples. You may make tables for customers, goods, orders, and line items inside an order, for instance, in the example of a retail system. Every row in a table corresponds to a single instance of an entity. Every row in the customer table in the retail scenario contains information about a single customer; every row in the product table describes a single product; every row in the order table shows a customer's order; and every row in the line item table shows a product that was part of an order.
What are ACID Properties?
The foundational ideas of ACID characteristics in DBMS guarantee consistency and transactional dependability in databases. The letters ACID stand for Isolation, Atomicity, Consistency, and Durability.
Normalization in Relational Databases
Normalization is a fundamental database design strategy aimed at efficiently organizing data. It involves breaking down large tables into smaller ones to reduce redundancy and ensure all information is stored in a single location. This minimizes the risk of inconsistencies, thereby maintaining data integrity. Additionally, normalization enhances the flexibility and scalability of database management, making it easier to maintain and expand the database structure as needed.
Let's Understand the practical concept using a simplified schema for a hypothetical online retail store.
Customer Table
CustomerID (PK) | FirstName | LastName | Email | Phone
---------------------------------------------------------------------------------
1 | John | Doe | [email protected] | 123-456-7890
2 | Jane | Smith | [email protected] | 987-654-3210
Order Table
OrderID (PK) | CustomerID (FK) | OrderDate | TotalAmount
-----------------------------------------------------------
101 | 1 | 2024-07-21 | 150.00
102 | 2 | 2024-07-21 | 200.00
2. Separate each discrete attribute into its own column:
Each attribute such as CustomerID, FirstName, LastName, Email, Phone in the Customer table, and OrderID, CustomerID, OrderDate, and TotalAmount in the Order table is represented by its own column. This ensures that data is stored atomically and avoids mixing unrelated data.
3. Uniquely identify each entity instance (row) using a primary key:
4. Use foreign key columns to link related entities:
The CustomerID column in the Order table serves as a foreign key (FK) referencing the CustomerID column in the Customer table. This establishes a relationship between orders and customers, indicating which customer placed each order.
Structured Query Language
Relational databases are managed and altered using a standardized computer language called SQL, or Structured Query Language. It offers an extensive collection of commands for managing transactions (TCL), controlling access (DCL), creating database structure (DDL), altering data (DML), and querying information (DQL). With SQL, users may construct tables and relationships, establish databases, insert, edit, and remove data, retrieve information using robust querying features, and manage transactions to guarantee data integrity. It is a vital tool for developers, database administrators, and analysts to effectively interact with and manage relational databases across different applications and industries because it is widely supported by various database management systems (DBMS) like MySQL, PostgreSQL, Oracle, and SQL Server.
1. Data Definition Language (DDL)
- CREATE TABLE: Defines a new table.
- ALTER TABLE: Modifies an existing table structure.
- DROP TABLE: Deletes a table.
- CREATE INDEX: Creates an index on a table column for faster data retrieval.
- DROP INDEX: Removes an index from a table.
2. Data Manipulation Language (DML)
- INSERT: Adds new records into a table.
- UPDATE: Modifies existing records in a table.
- DELETE: Removes records from a table.
- MERGE: Combines insert, update, and delete operations into a single statement (in some databases).
- SELECT: Retrieves data from one or more tables.
3. Data Query Language (DQL)
- SELECT: Retrieves data from one or more tables based on specified criteria.
- FROM: Specifies the tables from which to retrieve data.
- WHERE: Filters data based on specified conditions.
- GROUP BY: Groups rows that have the same values into summary rows.
- HAVING: Filters groups based on specified conditions.
4. Data Control Language (DCL)
- GRANT: Provides specific privileges to users or roles.
- REVOKE: Removes specific privileges from users or roles.
- DENY: Explicitly denies permissions to users or roles (in some databases).
5. Transaction Control Language (TCL)
- COMMIT: Saves transactions to the database.
- ROLLBACK: Reverts transactions to a previous state.
领英推荐
- SAVEPOINT: Sets a point within a transaction to which you can later roll back.
What are Views, Indexes, and Stored Procedures?
1. Views
A view is a hypothetical table created from SELECT query results. Views can be thought of as windows on particular rows in one or more underlying tables. To provide a single object that makes it simple to identify delivery addresses for orders, you may, for instance, establish a view on the Order and Customer databases that pulls order and customer data:
CREATE VIEW Deliveries
AS
SELECT o.OrderNo, o.OrderDate,
c.FirstName, c.LastName, c.Address, c.City
FROM Order AS o JOIN Customer AS c
ON o.Customer = c.ID;
2. Indexes
In order to find data in a table, you can use an index. Consider an index above a table to be similar to the index at a book's back. A sorted list of references along with the pages where each reference appears can be found in a book index. The index is where you go to look up references to specific items in the book. To navigate straight to the appropriate pages in the book, utilise the page numbers in the index. You might have to go through the entire book without an index in order to locate the references you need.
In a database, an index is created by selecting a column from the table; the index then holds a copy of the data, arranged according to the table's row references. The database management system can utilise this index to collect data faster than it would if it had to go through the entire table row by row when the user performs a query that includes this column in the WHERE clause.
To build an index on the Name column of the Product database, for instance, you could use the code below:
CREATE INDEX idx_ProductName
ON Product(Name);
The index creates a tree-based structure that the database system's query optimizer can use to quickly find rows in the Product table based on a specified Name.
3. Stored Procedure
A stored procedure defines SQL statements that can be run on command. Stored procedures are used to encapsulate programmatic logic in a database for actions that applications need to perform when working with data.
You can define a stored procedure with parameters to create a flexible solution for common actions that might need to be applied to data based on a specific key or criteria. For example, the following stored procedure could be defined to change the name of a product based on the specified product ID.
CREATE PROCEDURE RenameProduct
@ProductID INT,
@NewName VARCHAR(20)
AS
UPDATE Product
SET Name = @NewName
WHERE ID = @ProductID;
Relational Database Services in Azure
Leading relational database management systems including SQL Server, PostgreSQL, and MySQL are supported in the cloud by a range of database services provided by Azure. Because these services are completely handled, users can concentrate on other important duties because there is no longer a need for manual database administration. Azure's built-in high availability guarantees enterprise-grade performance, allowing you rapid growth and worldwide distribution without the worry of outages. Azure environments provide developers with enhanced productivity and dependability due to advanced capabilities like automated security monitoring, threat detection, and performance tweaking, as well as assured availability.
1. Azure SQL
Azure SQL is a collective term for a family of Microsoft SQL Server-based database services in Azure. Specific Azure SQL services include:
You may use complete versions of SQL Server in the cloud using SQL Server on Virtual Machines, which eliminates the need to manage any hardware on-premises. An illustration of the IaaS methodology is this.
A database operating on actual on-premises hardware is precisely replicated by an SQL Server running on an Azure virtual machine. Transferring databases from one on-premises server to another and migrating from an on-premises system to an Azure virtual machine are equivalent processes.
This method works well for apps that need access to operating system functionalities but may not be supported at the PaaS level, as well as migrations. For current applications that need to be quickly and with little modification migrated to the cloud, SQL virtual machines are ready for lift and shift. Additionally, SQL Server on Azure VMs can be used to expand existing on-premises applications to the cloud in hybrid deployments.
It is a platform-as-a-service (PaaS) choice that abstracts the underlying hardware and operating system and offers near-100% compatibility with on-premises SQL Server instances. The service lessens the administrative load of maintaining a database server instance by automating backups, software update management, and other maintenance chores.
In essence, Azure SQL Managed instance operates a fully managed SQL Server instance in the cloud. On a single instance, you can install more than one database. Just like with an on-premises server, you have total control over this instance. Although SQL Managed Instance automates typical operations like software patching, backups, and database monitoring, you still maintain complete control over the security and resource allocation of your databases.
Azure SQL Database is a PaaS offering from Microsoft. You create a managed database server in the cloud, and then deploy your databases on this server. Azure SQL Database is available as a Single Database or an Elastic Pool.
- Single Database
With this option, you can easily install and manage a single SQL Server database. You establish and manage a cloud-based database server, and utilize this server to get data from your database. You only need to construct your tables, configure the database, and add data to them because Microsoft will take care of server management. If you require additional memory, processing power, or storage space, you can grow the database. Resources are pre-allocated by default, and you are billed hourly for the resources you have requested. Additionally, a serverless configuration is specified. Microsoft builds its own server under this setup, which may share databases with other Azure subscribers. Microsoft protects your database's privacy.
- Elastic Pool
This option is comparable to a single database, with the exception that many databases can share the same resources by default, including memory, computing power, and data storage space. This is achieved by multiple tenancy. We refer to the resources as a pool. The pool is yours to make and can only be accessed by your databases. This model can help you cut costs and is helpful if you have databases with changing resource requirements. When processing payroll at the end of each month, for instance, your payroll database may need a lot of CPU power; but, during other periods, the database may become much less active. You may have an additional database that you utilize for reporting.
2. Azure Database for MySQL
Azure Database for MySQL is a managed MySQL database service offered by Azure. Pay-as-you-go pricing allows it to scale to meet your demands while guaranteeing excellent availability at no additional expense. Data protection features include point-in-time restore capabilities and automatic backups.
The service supports SSL connections and uses firewall rules to maintain connection security. Lock modes, connection limitations, and timeouts are just a few of the settings that may be customized using different server parameters. It offers a widely dispersed database system that can manage massive deployments without necessitating the maintenance of network, hardware, or software changes.
Important security and administrative responsibilities are handled by Azure Database for MySQL; however, certain features associated with these domains are not adjustable by the user, guaranteeing an efficient administration process by Azure.
3. Azure Database for MariaDB
Azure Database for MariaDB is a managed service providing the MariaDB database management system within Azure. Azure minimizes administrative costs by fully managing and overseeing the database once it is created.
The following are some advantages of using Azure Database for MariaDB:
- Built-in high availability: Guarantees constant database availability without incurring additional costs.
- Predictable performance: Pay-as-you-go pricing is used to ensure dependable and consistent performance.
- Scalability: Easily expands to meet needs for a higher workload in a matter of seconds.
- Automatic backups and point-in-time restore: Keeps backups for a maximum of 35 days, which makes data recovery easier.
- Enterprise-grade security and compliance: Compliance with all legal and security regulations.
4. Azure Database for PostgreSQL
You can run a PaaS implementation of PostgreSQL in the Azure Cloud by selecting Azure Database for PostgreSQL. Similar to the MySQL service, this one offers similar features in terms of administrative benefits, security, scalability, and availability.
Several aspects of on-premises Azure Database for PostgreSQL do not offer PostgreSQL databases. The majority of these features deal with the additions that users can make to a database to carry out certain functions, including creating stored procedures in different programming languages (apart from the standard pgsql) and integrating with the operating system directly. The list of available extensions is constantly being updated, and a core set of the most popular extensions is supported.