SQL is the past...AND the future!
Greg Turnquist
Senior Staff Technical Content Engineer at Cockroach Labs, Best-Selling Author, Conference Speaker, and YouTube Content Creator
You ever watch TRON? You know, the uber-tech world movie that all pro coders watch once a year?
(Okay, maybe you SHOULD watch it once a year!
Anyway, did you ever go back and watch that movie, after having learned a bunch of coding, and WATCHED the commands that Flynn and Alan are actually typing?
Whelp, here I am, talking to a relational database, and it feels a bit like I'm in TRON!
CockroachDB has a unique take on things. First of all, to be distributed, CockroachDB takes every table and chops it up into chunks called RANGES. These ranges are then replicated per cluster settings (3 replicas by default, but you can increase it.)
These replicas are then distributed across the clusters's node. You could be running 15 nodes (e.g. 5 in SF, 5 in NY, and 5 in London), but your replica count could be 3. That means that not every node has a copy of every replica.
Instead, CockroachDB spreads the load across the whole enterprise.
(Remember, a given range is a CHUNK of a table, not the entire database. Just one bite of a given table).
Now amidst a given set of replicas (remember we're talking 3), only 1 is has the role of LEASEHOLDER. The leaseholder is the one that coordinates all inserts and updates to a given range.
If you want to INSERT or DELETE or UPDATE, you connect to a node of the cluster, and that node figured out the leaseholder of the range you want to talk to, and routes your request to it automatically.
Your change to the table is submitted, and then the leaseholder starts talking to the other replicas to coordinate the change. Once it has MAJORITY QUORUM (50% + 1 agreement), it will respond and say the change is COMMITTED.
With a replica count of 3, that means it needs 2 to agree to the change. With the leaseholder being 1, it only needs confirmation from ONE of the other replicas, and its done.
If the replica count were 5, then it would need 3 to commit. That would be itself + 2 other replicas. See how that works?
It should come as NO SURPRISE that there are SQL operations you can do when talking to CockroachDB to peek at the ranges to get insight into whats happening!
For example, you can type this:
SHOW RANGES FROM TABLE book;
That command, which is about as close to typing in humanese as I can imagine, will literally show your the the various ranges that your book table is split into!
I have a set of test data imported used for training, with 1 MM rows of data (visualize Dr. Evil saying "1 mellion rows!"), and that command yields this:
start_key | end_key | range_id | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until
---------------+--------------+----------+----------+--------------------+-----------------+---------------------+------------------+-----------------------------
…/<TableMin> | …/1 | 70 | {1,3,5} | {"","",""} | {1,3,5} | {} | {} | NULL
…/1 | <after:/Max> | 69 | {1,2,3} | {"","",""} | {1,3,2} | {} | {} | 2024-11-08 16:47:36.858293
(2 rows)
领英推荐
Now maybe that is kind of jam-packed in X here, but it shows me that this table currently has 2 ranges. But you can see some cool stuff:
Typically, as an app developer, you don't care about this level of detail. However, we happen to be developing an Ops-based course, and Ops staff WILL want to know about this.
So we are showing an even MORE detailed command.
And how else would you write such a SQL command?
SHOW RANGES FROM TABLE book WITH DETAILS;
In plain English.
I'll go ahead and paste in the results, though they'll be hard to read, just to show that you can get a lot of info using nice-to-read commands:
start_key | end_key | range_id | range_size_mb | lease_holder | lease_holder_locality | replicas | replica_localities | voting_replicas | non_voting_replicas | learner_replicas | split_enforced_until | range_size | span_stats
---------------+--------------+----------+-----------------------+--------------+-----------------------+----------+--------------------+-----------------+---------------------+------------------+----------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
…/<TableMin> | …/1 | 70 | 0 | 1 | | {1,3,5} | {"","",""} | {1,3,5} | {} | {} | NULL | 0 | {"approximate_disk_bytes": 0, "intent_bytes": 0, "intent_count": 0, "key_bytes": 0, "key_count": 0, "live_bytes": 0, "live_count": 0, "sys_bytes": 503, "sys_count": 6, "val_bytes": 0, "val_count": 0}
…/1 | <after:/Max> | 69 | 105.13704000000000000 | 2 | | {1,2,3} | {"","",""} | {1,3,2} | {} | {} | 2024-11-08 16:47:36.858293 | 105137040 | {"approximate_disk_bytes": 195562383, "intent_bytes": 0, "intent_count": 0, "key_bytes": 35054422, "key_count": 1000000, "live_bytes": 105137040, "live_count": 1000000, "sys_bytes": 502, "sys_count": 7, "val_bytes": 70082618, "val_count": 1000000}
(2 rows)
This command shows that the first range has approximate_disk_bytes of 0 while the second range has 195562383, hence all the data is in the second range.
I don't fully grok the internals of CockroachDB, still being the new guy, but this strongly implies that the first range is more of a bookkeeping thing. (And that's where part of my job is to followup and ping our SMEs to check that and ensure we have the right notes in our content to help our students!)
The point of all this fun, is that relational database engines have historically used what some deem to be bulky, long-winded commands to look into internals FROM WITHIN THE SAME SQL shell that developers use for ages.
We may scoff at things like TRON where they have the same lengthy commands, and not have all the curly braces and ::: and === and <=> operators...
...but SQL REALLY IS the language where people speak in a somewhat human language, telling the system what they want to know in a declarative fashion...
...and the system goes and gets what you want!
THIS is the reason that SQL has persisted for decades.
THIS is the reason that SQL is beloved by developers, program managers, business analysts, and even people that have retired from their professions but decided to launch a non-profit and need to throw together an MS Access database to manage their new enterprise (I actually know someone like that!)
THIS is the reason that SQL has transcended all other forms of data management and is nicely secured in the #1 slot. Despite ORMs and other solutions.
You can always go back to SQL and make solid progress.
Spring Developer Advocate at Broadcom
3 个月I agree!