登录查看更多内容

Data Replication Tools in Azure SQL Databases

Ananya Ghosh Chowdhury

Data and AI Architect at Microsoft | Public Speaker | Startup Advisor | Career Mentor | Harvard Business Review Advisory Council Member | Marquis Who's Who Listee | Founder @AIBoardroom

发布日期: 2022年12月29日

Data Replication includes data tracking, loading, streaming, and synchronizing functionalities.

Change Data Capture (CDC), SQL Data Sync, and change tracking are 3 distinct data replication tools used for tracking, loading, replicating, and streaming data changes in Azure SQL databases. Below is an explanation on the applicability of each one of these:

1.????Change Data Capture tracks and loads data changes on Azure SQL Databases:

CDC records insert, update, and delete activities that apply to a table. On SQL Server and Azure SQL Managed Instance, the SQL Server Agent runs the CDC scan and cleanup. Whereas, on Azure SQL Databases a scheduler runs capture and cleanup automatically within the database, without any external dependency for reliability or performance. Users still have an option to run capture and cleanup manually on demand.

Below steps highlight how CDC works on Azure SQL Databases:

a)??????To track Data Modification Language (DML) changes (insert, update and delete) enable CDC on the source database and tables.

b)?????Once CDC is enabled on source tables, associated change tables get created for each source table enabled.

c)??????The DML changes are reflected in the database transaction log. The CDC scan process picks up the committed changes from the log and adds them to the change table associated to the CDC-enabled source table. The commit Log Sequence Number (LSN) identifies changes that were committed in the same transaction and orders those transactions.

d)?????The cleanup process cleans the change tables on a retention-based policy set by the user. The default retention period is 3 days.

e)?????Query functions are provided to allow systematic access to the change data stored in the change tables.

The following illustration shows the principal data flow for CDC.

2.????SQL Data Sync to synchronize data on Azure SQL Databases:

SQL Data Sync is a service built on Azure SQL Database that synchronizes data bi-directionally across multiple databases, both on-premises (SQL Server) and in the cloud (Azure SQL Databases). It tracks changes using insert, update, and delete triggers and can be accessed through the Azure portal, PowerShell, or REST APIs. The changes are recorded on a side table in the user's database.

Data Sync is based around the concept of a?sync group. A sync group is a group of databases that one can synchronize using a hub and spoke topology. Within a sync group:

a)??????The hub database must be an Azure SQL Database.

b)?????The member databases can be databases in Azure SQL Database or in instances of SQL Server.

c)??????The Sync Metadata Database contains the metadata and log for Data Sync; it must be an Azure SQL Database located in the same region as the hub database.

SQL Data Sync is useful in scenarios such as Hybrid Data Synchronization, Distributed Applications or Globally Distributed applications where data must be updated across several databases in Azure SQL Database or SQL Server

Atul Kumar 8 个月前

Copy Tables from On-Premise SQL Server to Azure Data…

Akshay T. 8 个月前

CRUD in SQL: Unleashing the Power of Seamless Data…

Amr Saafan 10 个月前

3.????Change Tracking to track data changes on Azure SQL Databases:

Change tracking can be used as a foundation for both one-way and two-way synchronization applications.

a)??????One-way synchronization applications such as a client or mid-tier caching application, can be built to use change tracking. Here, a caching application requires data to be stored in the Database Engine and to be cached in other data stores. The application must be able to keep the cache up-to-date with any changes that have been made to the database tables and no changes are passed back to the Database Engine.

b)?????Two-way synchronization application built to use change tracking synchronizes data in an instance of the Database Engine with one or more data stores. The data in those stores can be updated and the changes must be synchronized back to the Database Engine.?

After change tracking is configured for a table, any DML statement affecting rows in the table will record change tracking information. To query the rows that have changed and to obtain information about the changes, use change-tracking functions. The value of the primary key column is the only information from the tracked table that is recorded with the change information. To obtain the latest data for changed rows, an application can use the primary key column values to join the source table with the tracked table.

How to choose the right Data Replication tool?

With a multitude of replication tools available and various use cases and business needs, it's often difficult to decide what tools to use in what specific scenarios. Below table highlights some of the common scenarios and the associated tools that could be used:

Some other considerations to have while selecting a data-replication tool are:

a)????Performance Impact

Assess the database tier size limitations before enabling replication tools since enabling change data capture (CDC) at the level of database and tables will lead to the creation of several additional system tables on the source database. Enabling replication tools also impacts log throughput. CDC's scan/cleanup are part of user workload. Performance impact can be substantial, because entire rows are added to the change tables and for updates operations pre-image is also included. This translates to a log generation that doubles (or more) when CDC is used (relative to when CDC is not enabled).

b)????Type of operation (Tracking / Loading / Streaming)

·??????CDC tracks source tables for changes and records these changes in change tables on the same source database. However, to replicate those changes to other targets other streaming tools must be integrated.

·??????SQL Data Sync automatically syncs source and target databases, either uni-directionally (SQL Server -> Azure SLQ DB) or bidirectionally (SQL Server -> Azure SQL DB -> SQL Server).

·??????Change tracking only tracks whether and which rows have been changed and the change type (insert, update, or delete).

?c)????Latency

Critical operations (like financial transactions) might have a lower appetite for latency. However, data changes for analytical workloads might have some room for latency.

Data Replication Tools in Azure SQL Databases

Ananya Ghosh Chowdhury

Data and AI Architect at Microsoft | Public Speaker | Startup Advisor | Career Mentor | Harvard Business Review Advisory Council Member | Marquis Who's Who Listee | Founder @AIBoardroom

领英推荐

更多精彩文章

社区洞察

其他会员也浏览了

The Future of SQL: Why It Remains Essential in Data Analytics

SQL, NoSQL & NewSQL

Legacy of SQL: A Pillar in Data Management and Analysis

SQL Server Big Data Clusters on Azure

Hasura GraphQL Remote JOINs on Distributed SQL Running on AKS & GKE

Database Intermediate Series: Change Data Capture(I)

The Importance and Best Practices of SQL Language

100 SQL functions and statements with brief explanations and examples

Triggers in SQL database to extend its declarative data logic (but not business logic)

How to Enable SQL Insights (preview) to monitor your SQL deployments

领英推荐

Leveraging LLMLingua for Efficient Inference in Large Language Models

2024年10月12日

Exploring Phi-2: The Evolution of Small Language Models and their Impact on NLP Efficiency and Innovation

2024年2月5日

Transforming Retail and Consumer Goods with Generative AI: Real-World Use Cases

2024年2月1日

Demystifying the Transformer Architecture: A New Era in Natural Language Processing

2023年11月9日

Large Language Models - How are the OpenAI GPT models trained?

2023年4月19日

What is Generative AI?

2023年4月12日

From Coding to Decoding: A Consultant’s way to take informed business decisions

2019年1月26日