SQL database replication: Logical or Physical?

Franck Pachot

Developer Advocate at MongoDB, AWS Data Hero, PostgreSQL & YugabyteDB, Oracle Certified Master

Published: September 26, 2024

Traditional SQL databases added replication after their initial design. Changes can be captured either from the write-ahead log (WAL) used for crash recovery or within SQL processing. They can then be applied to the replica in two ways: by writing the same data into the database storage, known as physical replication, or by generating SQL statements to run on the replica, known as logical replication. More recently designed databases, often called cloud-native, have built-in replication that combines the advantages of both logical and physical replication. Let's explore all these alternatives.

Physical replication from WAL

Traditional SQL databases are typically monolithic. They write to a single shared buffer pool in memory, which is then flushed to disk asynchronously (checkpoint). This approach avoids slow random writes on HDD and SSD. To protect against memory loss in case of a server failure or instance crash, all writes are safeguarded by write-ahead logging (WAL), described in 1992 in the ARIES paper. The WAL, also called the online redo log, journal, or transaction log, is a single stream written sequentially.
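The write-ahead rule can be illustrated with a toy model: a change is appended to the log before the buffer is modified, pages reach the datafiles only at checkpoint, and recovery rolls the log forward from the checkpoint. This is a minimal sketch, assuming a single writer and treating an append to the in-memory list as durable; it ignores transactions, undo, and fsync, and `ToyDatabase` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    disk: dict = field(default_factory=dict)     # datafile pages, updated only at checkpoint
    buffers: dict = field(default_factory=dict)  # shared buffer pool, lost on a crash
    wal: list = field(default_factory=list)      # sequential log, assumed durable once appended
    checkpoint_lsn: int = 0                      # last log position already in the datafiles

    def write(self, page_id, value):
        lsn = len(self.wal) + 1
        self.wal.append((lsn, page_id, value))   # 1. append the change to the WAL first
        self.buffers[page_id] = value            # 2. only then modify the in-memory buffer

    def checkpoint(self):
        self.disk.update(self.buffers)           # flush dirty buffers to the datafiles
        self.checkpoint_lsn = len(self.wal)

    def crash_and_recover(self):
        self.buffers = dict(self.disk)           # memory is gone: restart from the datafiles
        for _lsn, page_id, value in self.wal[self.checkpoint_lsn:]:
            self.buffers[page_id] = value        # roll forward changes logged after the checkpoint


db = ToyDatabase()
db.write("page1", "a")
db.checkpoint()
db.write("page2", "b")
db.write("page1", "c")
db.crash_and_recover()
print(db.buffers)  # {'page1': 'c', 'page2': 'b'}
```

The datafiles still hold only the checkpointed `page1 = 'a'`, yet recovery restores both later writes from the log. Physical replication reuses exactly this roll-forward code on a standby.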

To reduce the Recovery Point Objective (RPO), the WAL is archived so that transactions can be rolled forward when recovering from backups. To minimize the Recovery Time Objective (RTO), this recovery process runs continuously on a standby database, ready to be activated for disaster recovery. When high availability (HA) is required to minimize data loss and automate failover, the WAL is streamed to the standby over the network, written to files there, and acknowledged before the commit completes on the primary, which guarantees the durability of committed transactions.
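The difference between synchronous streaming and asynchronous shipping can be sketched in a few lines. This is a simplified model, assuming a single standby and ignoring network latency, disk writes, and timeouts; `Standby`, `Primary`, and their methods are hypothetical names and do not reflect any vendor's protocol.

```python
class Standby:
    def __init__(self):
        self.log = []          # WAL records written to the standby's log files
        self.applied_upto = 0  # position reached by continuous recovery
        self.data = {}

    def receive(self, record):
        self.log.append(record)
        return len(self.log)   # acknowledge the received position

    def recover(self):
        # roll forward everything received so far (what happens at failover)
        for key, value in self.log[self.applied_upto:]:
            self.data[key] = value
        self.applied_upto = len(self.log)


class Primary:
    def __init__(self, standby, synchronous=True):
        self.standby = standby
        self.synchronous = synchronous
        self.wal = []
        self.pending = []      # records not yet shipped (asynchronous mode only)

    def commit(self, key, value):
        record = (key, value)
        self.wal.append(record)
        if self.synchronous:
            self.standby.receive(record)   # commit waits for the standby's acknowledgment
        else:
            self.pending.append(record)    # shipped later: lost if the primary fails now


sync_standby = Standby()
primary = Primary(sync_standby, synchronous=True)
primary.commit("order:1", "paid")
sync_standby.recover()

async_standby = Standby()
primary_async = Primary(async_standby, synchronous=False)
primary_async.commit("order:1", "paid")
async_standby.recover()  # the primary failed before shipping: the commit is missing
```

With synchronous streaming, the acknowledged commit survives the failover (RPO of zero); with asynchronous shipping, the standby is activated without it.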

This is called physical replication because it synchronizes an exact binary copy of the database:

Physical Replication From Physical WAL (example: Oracle Data Guard)

Physical replication has its advantages and disadvantages. On the positive side, it results in low overhead on the primary database since it utilizes the existing Write-Ahead Log (WAL). Additionally, there is minimal overhead on the standby database as it writes directly to the buffers, bypassing the query layer. Indexes are replicated along with the tables as bytes in datafiles.
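To make the "exact binary copy" concrete, the sketch below models datafiles as lists of fixed-size blocks and redo as (block, offset, bytes) change vectors. This is an assumption for illustration, since real redo formats are engine-specific. The standby applies the same vectors without parsing any SQL, and table and index blocks are handled identically, which is why indexes come along for free.

```python
BLOCK_SIZE = 16


def new_datafile(blocks=4):
    return [bytearray(BLOCK_SIZE) for _ in range(blocks)]


def apply_redo(datafile, change):
    block_no, offset, payload = change
    datafile[block_no][offset:offset + len(payload)] = payload   # overwrite bytes in place


primary = new_datafile()
standby = new_datafile()
redo_stream = [
    (0, 0, b"row:1,Alice"),    # hypothetical change to a table block
    (2, 4, b"idx:Alice->0"),   # hypothetical change to an index block
]
for change in redo_stream:
    apply_redo(primary, change)   # the change made on the primary
    apply_redo(standby, change)   # the same redo applied on the standby, no SQL involved
print(primary == standby)  # True: an exact binary copy
```

The flip side, discussed next, is that the standby must understand the same block layout, so it cannot have a different version, schema, or set of indexes.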

On the downside, the standby offers little flexibility beyond being activated in a disaster recovery scenario. At best, it can be opened in read-only mode to serve consistent queries while it keeps applying changes from the primary. However, there are very few options to optimize this read-only workload: being a physical replica, it must run the exact same version of the database software and have the same physical and logical model. Such a replica cannot be used to reduce downtime for upgrades or to create different indexes for the read workload.

Logical replication from SQL

To provide the most flexibility, such as replicating between different platforms, versions, and data models, replication can be logical, at a higher level. Thanks to data independence, it should be possible to run the exact same SQL statements on a replica. However, this is harder than it looks because reads and writes are intermixed in SQL. A simple example is a sequence or a GENERATED ALWAYS AS IDENTITY column: running the same INSERT on two databases may generate different numbers. Even if both are correct, the replicas differ, and subsequent statements will not have the same effect. Even without such non-deterministic SQL functions, the effect of each statement depends on concurrent transactions that may conflict or lock the same rows, and that interleaving will be different on a replica.
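The sequence problem can be reproduced with a toy model where `itertools.count` stands in for a sequence or identity column. It assumes two sessions called `nextval` in one order on the primary, while statement-based replication replays them in commit order, which here is the opposite. The `Node` class is a hypothetical stand-in for a database.

```python
import itertools


class Node:
    def __init__(self):
        self.seq = itertools.count(1)   # stands in for a sequence / identity column
        self.rows = {}

    def execute_insert(self, name):
        # INSERT INTO t (name) VALUES (:name) with id GENERATED ALWAYS AS IDENTITY
        self.rows[next(self.seq)] = name


primary, replica = Node(), Node()
# On the primary, the session inserting "bob" reached the sequence first...
for name in ["bob", "alice"]:
    primary.execute_insert(name)
# ...but "alice" committed first, so the replica replays the statements in that order.
for name in ["alice", "bob"]:
    replica.execute_insert(name)
print(primary.rows)  # {1: 'bob', 2: 'alice'}
print(replica.rows)  # {1: 'alice', 2: 'bob'}
```

Both outcomes are valid, but a later `UPDATE t SET ... WHERE id = 1` would now modify a different person on each side.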

In practice, logical replication from SQL means generating SQL statements to apply on the replica. These statements are produced at a lower level of SQL processing, from the final writes to table rows: changes to the rows are captured by triggers or similar mechanisms, which then generate a set of simpler SQL DML statements. Oracle previously had Advanced Replication, but it has been deprecated. MySQL, on the other hand, provides binlog replication. The same can be done on other databases with triggers.
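The capture side can be sketched as a function that a row-level trigger might call with the old and new row images. It turns each change into a simple, deterministic statement keyed by primary key, whatever the user's original statement was. The `capture` function, the `accounts` table, and the `(sql, params)` log format are hypothetical; SQLite only plays the role of "any SQL replica".

```python
import sqlite3


def capture(table, op, old=None, new=None, key="id"):
    """Turn one row change into (sql, params), as a row-level trigger might log it."""
    if op == "INSERT":
        cols = list(new)
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, [new[c] for c in cols]
    if op == "UPDATE":
        changed = [c for c in new if new[c] != old.get(c)]
        if not changed:
            return None                      # nothing to replicate
        sets = ", ".join(f"{c} = ?" for c in changed)
        sql = f"UPDATE {table} SET {sets} WHERE {key} = ?"
        return sql, [new[c] for c in changed] + [old[key]]
    if op == "DELETE":
        return f"DELETE FROM {table} WHERE {key} = ?", [old[key]]
    raise ValueError(f"unsupported operation: {op}")


# Row images captured for an INSERT, then for the user's statement
# UPDATE accounts SET balance = balance - 20 WHERE owner = 'alice'
change_log = [
    capture("accounts", "INSERT", new={"id": 42, "owner": "alice", "balance": 100}),
    capture("accounts", "UPDATE",
            old={"id": 42, "owner": "alice", "balance": 100},
            new={"id": 42, "owner": "alice", "balance": 80}),
]

replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")
for entry in change_log:
    if entry is not None:
        sql, params = entry
        replica.execute(sql, params)
print(replica.execute("SELECT id, owner, balance FROM accounts").fetchall())  # [(42, 'alice', 80)]
```

The replicated UPDATE sets the final value by key rather than recomputing `balance - 20` by owner, so the result no longer depends on what else the replica is doing.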

Logical Replication From SQL (example: MySQL binlog)

Logical replication has a significant advantage in its flexibility on the replica side. It can apply SQL statements independent of the database's version or the physical data model. This means it can replicate only a subset of tables, rows, or columns and be set up as a two-way, multi-master system with custom conflict resolution. However, one drawback is the resources used on the replica side, as the SQL must be processed and indexes maintained. This is an unavoidable effect of the flexibility provided by logical replication.
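The replica-side flexibility can be illustrated with an apply function that keeps only some rows and columns and resolves two-way conflicts. This sketch assumes a last-writer-wins rule on a commit timestamp, with the origin node name as a tie-breaker; that is only one possible custom rule. `Change` and `LogicalReplica` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Change:
    key: int
    row: dict
    commit_ts: int
    origin: str          # node where the change was committed (two-way replication)


class LogicalReplica:
    def __init__(self, row_filter, columns):
        self.row_filter = row_filter   # which rows this replica keeps
        self.columns = columns         # which columns this replica keeps
        self.rows = {}                 # key -> (projected row, version)

    def apply(self, change):
        if not self.row_filter(change.row):
            return False                               # outside the replicated subset
        version = (change.commit_ts, change.origin)    # origin breaks timestamp ties
        current = self.rows.get(change.key)
        if current is not None and current[1] >= version:
            return False                               # conflict: the stored version wins
        self.rows[change.key] = ({c: change.row[c] for c in self.columns}, version)
        return True


eu_replica = LogicalReplica(row_filter=lambda r: r["region"] == "EU", columns=["id", "email"])
eu_replica.apply(Change(1, {"id": 1, "email": "a@example.com", "region": "EU", "notes": "x"}, 10, "paris"))
eu_replica.apply(Change(2, {"id": 2, "email": "b@example.com", "region": "US", "notes": "y"}, 11, "nyc"))
eu_replica.apply(Change(1, {"id": 1, "email": "old@example.com", "region": "EU", "notes": "z"}, 9, "nyc"))
print(eu_replica.rows)  # {1: ({'id': 1, 'email': 'a@example.com'}, (10, 'paris'))}
```

Every accepted change still goes through filtering, projection, and, on a real database, index maintenance, which is the replica-side cost mentioned above.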

The main disadvantage of this solution is on the capture side: triggers, or equivalent mechanisms, add overhead to the primary workload.

Logical replication from WAL

A third option combines the advantages of the two previous solutions by capturing the changes at the physical level and applying them at the logical level. The general idea is to mine the WAL and, with some additional metadata, reverse-engineer it into SQL DML statements that have the same effect on data. The overhead on the primary database is minimal: supplemental logging adds the information that is not available in the physical change vectors. Flexibility on the replica is maximized because SQL statements insert, update, or delete the corresponding rows through the SQL layer. The complex part is extracting the logical changes from the physical log.
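The reverse-engineering step can be sketched as follows. A physical change record knows only an object id, a block and slot, and column positions. A data dictionary maps these back to table and column names, and supplemental logging provides the primary-key value that an UPDATE on the replica needs. The record layout, the `DICTIONARY` content, and the `mine` function are all hypothetical; real log formats are specific to each database.

```python
DICTIONARY = {
    7001: {"table": "orders", "columns": ["id", "status", "amount"], "key": "id"},
}


def mine(record):
    meta = DICTIONARY[record["obj"]]            # object id -> table metadata
    cols = meta["columns"]
    if record["op"] == "insert":
        names = ", ".join(cols)
        marks = ", ".join("?" for _ in cols)
        return f"INSERT INTO {meta['table']} ({names}) VALUES ({marks})", list(record["values"])
    if record["op"] == "update":
        # the physical record holds only changed column positions and their new values
        sets = ", ".join(f"{cols[pos]} = ?" for pos in record["changed"])
        key_value = record["supplemental"][meta["key"]]   # added by supplemental logging
        params = list(record["changed"].values()) + [key_value]
        return f"UPDATE {meta['table']} SET {sets} WHERE {meta['key']} = ?", params
    raise ValueError(f"unsupported op: {record['op']}")


insert_rec = {"obj": 7001, "block": 12, "slot": 3, "op": "insert", "values": (1, "new", 250)}
update_rec = {"obj": 7001, "block": 12, "slot": 3, "op": "update",
              "changed": {1: "shipped"}, "supplemental": {"id": 1}}
for rec in (insert_rec, update_rec):
    print(mine(rec))
```

The block and slot are deliberately ignored: the replica locates rows by key, so its physical organization can differ from the primary's.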

Logical Replication From Physical WAL (example: GoldenGate)

This solution is specific to each database, and there may be limitations on the supported data types. Another limitation of the capture process comes from the monolithic nature of the WAL stream: log mining is rarely multi-threaded because the log is sequential and transaction order must be preserved. Initialization also requires a different process. Before the captured changes can be applied, the tables must be copied logically, as with a dump. This is slow, and the longer the copy takes, the larger the gap that logical replication must then catch up on. Another possibility is to start with a physical replica and convert it to a logical one.
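Initialization can be modeled as a copy taken at a consistent log position (comparable to an SCN or LSN), followed by applying only the mined changes committed after that position. This is a sketch under simplifying assumptions: rows are a key-to-value dict, a `None` value means a delete, and `instantiate` is a hypothetical helper.

```python
def instantiate(snapshot_rows, snapshot_lsn, change_stream):
    replica = dict(snapshot_rows)          # initial logical copy (slow for large tables)
    for lsn, key, value in change_stream:  # changes mined from the WAL, in commit order
        if lsn <= snapshot_lsn:
            continue                       # already reflected in the copy
        if value is None:
            replica.pop(key, None)         # delete
        else:
            replica[key] = value           # insert or update
    return replica


snapshot = {1: "a", 2: "b"}                # copy consistent as of log position 100
stream = [(99, 1, "a"), (100, 2, "b"), (101, 1, "a2"), (102, 2, None), (103, 3, "c")]
print(instantiate(snapshot, 100, stream))  # {1: 'a2', 3: 'c'}
```

Every change logged while the copy runs must be retained and applied afterwards, which is why a slow copy widens the gap.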

Even though this architecture looks like a workaround, operating across different layers, it is the most common approach in traditional databases when more flexibility than physical replication is needed and no change data capture (CDC) is built in at a higher level.

Built-in replication of table and index changes

The options mentioned above do not consider the most straightforward solution: replicating above the physical layer to allow greater flexibility, but below the SQL layer to reduce the overhead of capture and apply. Such replicas would not be exact binary copies of each other. Between SQL and blocks, there are table rows and index entries, which hold the values needed for a replica that is logically equivalent to the primary while allowing a different physical organization.

Built-in Replication in Cloud-Native Databases (example: YugabyteDB)

Traditional databases such as Oracle and PostgreSQL are monolithic and have no clear separation between the SQL processing layer and the transactional storage. In these databases, a single session process parses SQL, builds the execution plan, and reads and writes table and index blocks directly. Table rows are stored in heap tables without a logical identifier: Oracle's ROWID and PostgreSQL's CTID are physical addresses that identify a block within a file and the row's offset within that block.

YugabyteDB addresses this by separating the SQL processing layer, which reuses PostgreSQL code for compatibility, from the transactional storage layer, which is sharded and built on top of RocksDB. The SQL processing layer turns reads and writes into key-value operations, while the transactional storage layer applies these changes to the LSM Tree, which stores table rows and secondary index entries.
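The sketch below shows how one SQL row might become key-value changes below the SQL layer: one entry for the table row, keyed by primary key rather than by a physical address, and one entry per secondary index, keyed by the indexed value plus the primary key. The tuple-based key encoding and the `row_to_kv` helper are simplified assumptions for illustration, not YugabyteDB's actual DocDB format.

```python
def row_to_kv(table, pk_col, row, indexes, ts):
    pk = row[pk_col]
    # table row: logical key (table, primary key) -> remaining column values
    changes = [((table, pk), {c: v for c, v in row.items() if c != pk_col}, ts)]
    for index_name, col in indexes.items():
        # index entry: (index, indexed value, primary key); the key itself carries the pointer
        changes.append(((index_name, row[col], pk), None, ts))
    return changes


for change in row_to_kv("users", "id", {"id": 7, "email": "a@x", "name": "Ann"},
                        {"users_email_idx": "email"}, ts=1001):
    print(change)
```

Because these changes carry logical keys and values, any replica can store them in its own files and blocks.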

This separation enables scalability: the SQL processing layer scales horizontally because it is stateless, and the transactional storage layer also scales horizontally because it is partitioned (storage sharding). The write API between them is a log of timestamped key-value changes, distributed and replicated with Raft consensus. This log is applied to each replica with strong consistency (linearizability), and it also serves as the write-ahead log (WAL) that protects the first level of the LSM Tree until it is flushed to SST files. Below this key-value API, each replica applies the changes physically on its own, so replicas can run different versions of the database software, which allows online rolling upgrades. Each replica runs Multi-Version Concurrency Control (MVCC) garbage collection locally; it is embedded in SST file compaction and does not impact replication.
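A toy version of this write path: an entry from the log of timestamped key-value changes is committed once a majority of the replicas has appended it (the Raft commit rule), each replica applies committed entries to its own MVCC store, and old versions are garbage-collected locally during compaction. This sketch is greatly simplified. It assumes increasing timestamps and ignores terms, leader election, and log repair; the leader is simply one element of the list. `Replica`, `write`, and `compact` are hypothetical names.

```python
class Replica:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.log = []
        self.versions = {}   # key -> list of (ts, value): a local MVCC store

    def append(self, entry):
        if not self.reachable:
            return False
        self.log.append(entry)
        return True

    def apply(self, entry):
        key, value, ts = entry
        self.versions.setdefault(key, []).append((ts, value))

    def compact(self, history_cutoff_ts):
        # local MVCC garbage collection: keep the latest version visible at the
        # cutoff plus all newer versions; other replicas are not affected
        for key, vers in self.versions.items():
            older = [v for v in vers if v[0] <= history_cutoff_ts]
            newer = [v for v in vers if v[0] > history_cutoff_ts]
            self.versions[key] = older[-1:] + newer


def write(replicas, entry):
    acks = sum(r.append(entry) for r in replicas)   # the leader is one of the replicas
    committed = acks > len(replicas) // 2            # majority of the Raft group
    if committed:
        for r in replicas:
            if r.reachable:
                r.apply(entry)
    return committed


nodes = [Replica("a"), Replica("b"), Replica("c", reachable=False)]
write(nodes, ("k", "v1", 10))
write(nodes, ("k", "v2", 20))
nodes[0].compact(history_cutoff_ts=25)
print(nodes[0].versions)  # {'k': [(20, 'v2')]}
print(nodes[1].versions)  # {'k': [(10, 'v1'), (20, 'v2')]}
```

One unreachable replica does not block commits, and compaction on one node changes only its own storage, matching the statement that garbage collection does not impact replication.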

From the storage point of view, YugabyteDB replication is similar to logical replication, in that it relies on the schema to identify each change record (primary key, indexed columns). From the SQL point of view, it looks like physical replication, because sequences, identity columns, transaction ordering, and index maintenance are already resolved in the replicated changes. This is integrated with Raft, the LSM Tree, and MVCC.

Which one is better?

The best solution is one that matches your database. Each database vendor has implemented what works best for their engines.

Traditional databases were designed as monoliths. They make changes by modifying buffers that match the disk blocks and rely on physical write-ahead logging (WAL). Building replication on this log makes more sense than adding hooks to capture changes at higher levels. The WAL is either applied directly for physical replication, using the same code as recovery, or mined to extract the changes and apply them through the existing SQL layer. Traditional databases leverage proven technologies and prefer to add replication on top of them. Mostly deployed on premises on dedicated hardware, their users do not expect horizontal scalability or resilience to failure.

On the other hand, new SQL database engines built for the cloud have been designed with replication in mind. They write to the network rather than to local disks. The log is at the core of distributed databases: a log of buffered write operations is the best fit for the network, can be replicated with strong consistency using a consensus algorithm like Raft, and is a perfect fit for the LSM Tree, which is more efficient than the B-Tree on modern storage like SSD. In distributed SQL databases, replication is built in at a level that combines the advantages of traditional logical and physical replication.
