NewSQL: is the code back in the DB?
Franck Pachot
Developer Advocate at MongoDB, AWS Data Hero, PostgreSQL & YugabyteDB, Oracle Certified Master
TL;DR: Distributed databases can scale with the same elasticity as application servers, and this allows us to colocate code and data again, especially when the NewSQL database has the full power of SQL and PL/pgSQL (like YugabyteDB).
The idea of NewSQL databases is to scale out a relational database with the same elasticity as NoSQL databases (distributed across multiple active nodes), but without the burden of coding the transaction management and access paths in the application. NewSQL databases are ACID and allow joins, foreign keys, strongly consistent secondary indexes... This started with Google Cloud Spanner, adding a few SQL features on top of distributed storage. And because the need reached many applications and enterprises, CockroachDB came with more SQL features, protocol compatibility with PostgreSQL, and the ability to run on any platform. This clearly exposed the need for relational features and SQL in enterprise applications. Of course, this is not new, but it had been forgotten with NoSQL because, at that time, scaling out was not compatible with strong consistency.
Now, as the future is about SQL still being there in distributed databases, Yugabyte has built an open-source database where, on top of an ACID distributed document store (DocDB), the SQL layer has all the power of PostgreSQL. This goes beyond simple SQL DDL and DML. All of modern SQL is there: triggers, procedures, user-defined functions, ready-to-use XML and JSON transformations...
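To give a taste of what this means in practice, here is a minimal sketch of the kind of PL/pgSQL that runs unchanged on PostgreSQL and on YugabyteDB's YSQL. The table, function, and trigger names are illustrative, not from the article:

```sql
-- illustrative schema: an orders table with an audit timestamp
create table orders (
  id          bigint generated always as identity primary key,
  customer_id bigint        not null,
  amount      numeric(10,2) not null,
  updated_at  timestamptz   default now()
);

-- a user-defined function in PL/pgSQL, used as a trigger function
create or replace function set_updated_at() returns trigger
language plpgsql as $$
begin
  new.updated_at := now();
  return new;
end;
$$;

-- a trigger, with exactly the same syntax as in vanilla PostgreSQL
create trigger orders_set_updated_at
  before update on orders
  for each row execute function set_updated_at();
```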
But... in the past years, the trend was to put less processing in the database, to use it only as a bit bucket to store data, and to put all the processing in the application. So you may wonder whether all that SQL and those stored procedures will actually be used.
I'm writing this article after reading Gunnar Morling's question on Twitter (https://twitter.com/gunnarmorling/status/1416499675683856387?s=20): should SELECT query results be transformed to JSON on the database side, or returned as relational result sets to be transformed by the application? As Lukas Eder mentioned in one answer, the overhead of doing it on the application side is additional round trips (multiple queries) and more data transfer (repeated column values). But, on the other side, the CPU usage of the database server is always critical, and people prefer to offload most of the processing to the application servers, which can run on multiple compute nodes. But... when you can scale out the database like the application servers... things change, right?
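To make that trade-off concrete, here is a minimal sketch of the database-side approach, assuming hypothetical customers(id, name) and orders(id, customer_id, amount) tables: one statement, one round trip, and each customer comes back as a single nested JSON document instead of repeated rows.

```sql
-- one round trip: each customer with an embedded array of its orders
select json_build_object(
         'customer', c.name,
         'orders',   coalesce(
           json_agg(json_build_object('id', o.id, 'amount', o.amount))
             filter (where o.id is not null),
           '[]'::json)
       ) as doc
from customers c
left join orders o on o.customer_id = c.id
group by c.id, c.name;
```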
I've heard a lot of "reasons" not to use stored procedures and to limit database calls to simple DML without much processing, especially with Oracle. The major motivations revolve around cost and scalability: database CPU is the most expensive one, and a single monolithic database cannot scale out the way stateless application servers do.
With some of those reasons, and a few more, the use of stored procedures and complex SQL functions was abandoned. Moving the code out of the database meant higher operational cost, because of the additional CPU and network round trips, but this was accepted in order to run most of the code on scalable components. However, all this changes again with NewSQL databases, because distributed databases can scale out in exactly the same way as application servers. Just add a node (or let auto-scaling do it for you) and the data and the load are automatically rebalanced. Without any downtime, you can run on more nodes, multiple availability zones, regions, or cloud providers, for better availability and lower cost. You can even move to cheaper instances when available (like ARM processors). The database nodes are operated in the same way as the application nodes, with the stateful synchronization and persistent storage managed automatically by the system.
With PostgreSQL, you have all the features needed to encapsulate services in stored procedures, in the language of your choice. Powerful SQL functions can do the heavy lifting and return all the data in one JSON document. And when this runs on a distributed database, this processing can scale without the risk of hitting the limits of a single node. Putting all those ideas together, you can understand why Yugabyte decided not only to be wire-compatible with PostgreSQL, but to reuse its code for the query layer. Each node runs a stateless postgres, with stored procedures and powerful SQL functions, and your application can connect to any node (for load balancing or geo-proximity) to scale out this processing. All of those nodes are also writer nodes that persist data and replicate it to the other nodes, thanks to the DocDB distributed engine. This is the "cloud-native" architecture of YugabyteDB: a stateless query layer with all the PostgreSQL features, and a distributed document store for strong consistency and persistence.
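Putting the two ideas together, the read service itself can be a SQL function that any node can execute: the application makes one call and gets the whole document back. A minimal sketch, reusing the hypothetical tables from above (the function name and parameter are illustrative, not from the article):

```sql
-- encapsulate the service in one SQL function returning a JSON document
create or replace function customer_doc(p_customer_id bigint)
returns json language sql stable as $$
  select json_build_object(
           'customer', c.name,
           'orders',   coalesce(
             json_agg(json_build_object('id', o.id, 'amount', o.amount))
               filter (where o.id is not null),
             '[]'::json)
         )
  from customers c
  left join orders o on o.customer_id = c.id
  where c.id = p_customer_id
  group by c.id, c.name;
$$;

-- the whole service call, from the application, on any node:
select customer_doc(42);
```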
The header picture of this article is taken from Bryn Llewellyn's slides about the YugabyteDB design. More information about PostgreSQL compatibility is in the documentation: https://docs.yugabyte.com/latest/explore/ysql-language-features/