NewSQL: is the code back in the DB?
Picture taken from Bryn Llewellyn's presentation at PostgresWorld


TL;DR: Distributed databases can scale with the same elasticity as application servers, and this allows us to colocate code and data again. Especially when the NewSQL database has the full power of SQL and PL/pgSQL, like YugabyteDB.

The idea of NewSQL databases is to scale out a relational database with the same elasticity as NoSQL databases (distributed across multiple active nodes), but without the burden of coding transaction management and access paths in the application. NewSQL databases are ACID and allow joins, foreign keys, strongly consistent secondary indexes... This started with Google Cloud Spanner, which added a few SQL features on top of a distributed storage layer. And because the need reached many applications and enterprises, CockroachDB came with more SQL features, protocol compatibility with PostgreSQL, and the ability to run on any platform. This clearly exposed the need for relational features and SQL in enterprise applications. Of course, this is not new, but it had been forgotten with NoSQL because, at that time, scaling out was not compatible with strong consistency.

Now that the future is SQL on distributed databases, Yugabyte has built an open-source database where, on top of the ACID distributed document store (DocDB), the SQL layer has all the power of PostgreSQL. This goes beyond simple SQL DDL and DML. All modern SQL is there: triggers, procedures, user-defined functions, ready-to-use XML and JSON transformations...
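To make "all modern SQL" concrete, here is a minimal sketch (the table, column, and function names are invented for illustration): the same trigger and JSON functions you would write for vanilla PostgreSQL run unchanged on this query layer.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE orders (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL,
  amount      numeric NOT NULL,
  updated_at  timestamptz NOT NULL DEFAULT now()
);

-- A PL/pgSQL trigger function keeping updated_at current on every change
CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch
  BEFORE UPDATE ON orders
  FOR EACH ROW EXECUTE FUNCTION touch_updated_at();

-- A ready-to-use JSON transformation, with no application code involved
SELECT jsonb_build_object('id', id, 'amount', amount) FROM orders;
```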

But... in the past years, the trend was to put less processing in the database, use it only as a bit bucket to store data, and put all the processing in the application. So you may wonder whether all those SQL features and stored procedures will actually be used.

I'm writing this article after reading Gunnar Morling's question on Twitter (https://twitter.com/gunnarmorling/status/1416499675683856387?s=20): should SELECT query results be transformed to JSON on the database side, or returned as relational result sets to be transformed by the application? As Lukas Eder mentioned in one answer, the overhead of doing it on the application side is additional round-trips (multiple queries) and more data transfer (repeated column values). On the other side, the CPU usage of database servers is always critical, and people prefer to offload most of the processing to the application servers, which can run on multiple compute nodes. But... when you can scale out the database like the application servers... things change, right?
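As a sketch of the database-side option (the customers/orders schema is made up for illustration), one query and one round-trip produce the complete JSON document, instead of one query per parent row plus one per child list:

```sql
-- One round-trip: the database joins and aggregates a customer with
-- its orders into a single JSON document. Doing this in the application
-- would mean extra queries (N+1 round-trips) and repeated column values
-- transferred over the network.
SELECT jsonb_build_object(
         'customer', c.name,
         'orders',   jsonb_agg(
                       jsonb_build_object('id', o.id, 'amount', o.amount)
                     )
       ) AS document
FROM customers c
JOIN orders    o ON o.customer_id = c.id
WHERE c.id = 42
GROUP BY c.id, c.name;
```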

I've heard a lot of "reasons" not to use stored procedures and to limit database calls to simple DML without too much processing, especially with Oracle. Let's look at the major motivations.

  • People think that PL/SQL is dead since Oracle supports Java stored procedures. This is completely wrong when you look at the new features in recent versions, like native compilation, continuous deployment, fine-grained dependency tracking... Just rumors.
  • On the other side, there are fewer and fewer PL/SQL developers on the market, and juniors don't really want to go there. Here is an understandable reason I've heard: "if we do it in PL/SQL, only 3 people in the team can maintain it, but with Java or Python we have armies of developers available". This is valid with Oracle, but not with PostgreSQL where, thanks to extensions, you can code your procedures in any language.
  • People think that using "advanced" SQL, like JSON functions, will lock them in with one database vendor, so that they cannot move to another. First, very few applications are ever moved to another database. And if they are, rewriting for a different syntax is not so difficult; there are tools for that, as long as the target database is not too limited in SQL features. Anyway, PostgreSQL is very popular and follows the SQL standard closely. You have a wide choice, from commercial (like AWS Aurora) to fully open-source (like YugabyteDB) databases, offering a different storage engine but the same API, where you don't have to change your code.
  • The cost of compute. Of course, this makes sense with Oracle, where you pay software licenses per processor: you don't want to run anything there that could run on another node. However, people often forget that offloading to the application often requires more network round-trips between the application and the database, many context switches, and, finally, more CPU consumption on both sides. With open-source databases you don't have this per-processor license constraint, but the compute instances for a database may still be more expensive than the ones for the application. Still, what you do in one database call will always outperform offloading that requires multiple round-trips.
  • Still on CPU, and RAM: many applications have to scale out when the business grows. Of course, you can run a lot on one server, and I've always recommended some tuning before running into hardware scaling. And calling stored procedures is a way to optimize (see this example with pgbench). But when you reach the limit, you can't redesign the whole application. You just want to add a node, maybe automatically for short peaks of load. This was a reason to design the application to run most of the processing on application servers. And this is still valid for monolithic databases, where you can write on one node only. Offloading some queries to reader nodes, or manual table sharding, may help, but this again increases the complexity of the application design.
  • Stored procedures were also a problem for application deployment. Development teams tried to minimize the impact of application releases on the database, as this usually involved another team (DBA), and it was hard to keep a continuous integration environment synchronized with the application. However, this has changed in the past years: DevOps, Liquibase/Flyway, Terraform, persistent volumes for databases, managed cloud services... Today the database is just another component of the application, fully automated with Infrastructure as Code. The code can be deployed to the database in the same way as to the application.
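On that last point, here is a sketch of what such a deployment can look like (the file name, procedure, and table are invented for illustration): the database code ships as a plain versioned migration, applied by Flyway or Liquibase in the same CI/CD pipeline as the rest of the application.

```sql
-- V42__add_transfer_procedure.sql (a Flyway-style versioned migration,
-- deployed by the pipeline like any other application artifact)
CREATE OR REPLACE PROCEDURE transfer(src bigint, dst bigint, amt numeric)
LANGUAGE plpgsql AS $$
BEGIN
  -- both updates commit or roll back together with the calling transaction
  UPDATE accounts SET balance = balance - amt WHERE id = src;
  UPDATE accounts SET balance = balance + amt WHERE id = dst;
END;
$$;
```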

For some of those reasons, and a few more, the use of stored procedures and complex SQL functions was abandoned. The cost of operation was higher, because of the additional CPU and network round-trips, but this was accepted in order to run most of the code on scalable components. However, all this changes again with NewSQL databases, because distributed databases can scale out in exactly the same way as application servers. Just add a node (or have auto-scaling do it for you) and the data and the load are automatically rebalanced. Without any downtime, you can run on more nodes, in multiple AZs, regions, or cloud providers, for better availability and lower cost. You can even move to cheaper instances when available (like ARM processors). The database nodes are operated in the same way as the application ones, the stateful synchronization and persistent storage being managed automatically by the system.

With PostgreSQL you have all the features to encapsulate services into stored procedures, in the language of your choice, and powerful SQL functions can do the heavy lifting and return all the data in one JSON document. When this runs on a distributed database, the processing can scale without the risk of hitting a single node's limits. Putting all those ideas together, you can understand why Yugabyte decided not only to be wire-compatible with PostgreSQL, but to use its code for the query layer. Each node runs a stateless postgres, with stored procedures and powerful SQL functions, and your application can connect to any node (for load balancing or geo-proximity) to scale out this processing. All of those nodes are also writer nodes, persisting data and replicating to the other nodes, thanks to the DocDB distributed engine. This is the "cloud-native" architecture of YugabyteDB: a stateless query layer with all PostgreSQL features, and a distributed document store for strong consistency and persistence.
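A minimal sketch of this pattern (function, table, and column names are assumptions for illustration): a whole service endpoint is encapsulated in one SQL function returning JSON, and because every node runs the same stateless query layer, the application can send the call to any of them.

```sql
-- A service endpoint as a SQL function: one call, one JSON result.
CREATE FUNCTION customer_summary(p_id bigint) RETURNS jsonb
LANGUAGE sql STABLE AS $$
  SELECT jsonb_build_object(
           'customer', c.name,
           'total',    sum(o.amount)
         )
  FROM customers c
  JOIN orders    o ON o.customer_id = c.id
  WHERE c.id = p_id
  GROUP BY c.name;
$$;

-- The application can issue this against any node of the cluster:
-- SELECT customer_summary(42);
```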

The header picture of this article is taken from Bryn Llewellyn's slides about the YugabyteDB design. More information about the PostgreSQL compatibility in the doc: https://docs.yugabyte.com/latest/explore/ysql-language-features/

Aivars Kalvans

FinTech Developer | Software Architect | Consultant

3y

Instead of putting the code back in the DB, I like the approach of ok.ru: embedding a distributed database into the application code https://youtu.be/Fhu-qh5eGXc?t=1273
