Got a new MVP web project coming up? What you really need is a semi-relational semi-normalized semi-structured document-store database!
In almost every web application built over the past three decades or more, developers have gone through the same (or a similar) thought process: What database should we use? What will our schema look like? Which ORM fits our framework?
After answering these questions, the next predictable step for most developers will be to build some sort of CRUD scaffolding on top of the ORM within their application development framework.
Inevitably, as their app starts to rapidly grow, so does the number of schema changes. As fields get added and dropped, more data needs to be migrated, more seeds need to be created, more CRUD endpoints need to be updated, and so on.
This is all too much. More often than not, we want to be able to launch a minimum viable product fast and validate/fail just as fast. We should probably not be spending more than 2 minutes of thought on what, how, and where we will store our data in 2024. Not because we don’t care, but because it’s been way too many decades and this crap should be a lot easier and more straightforward by now.
Tangent Warning
And no, I will not entertain your ‘cloud-based-serverless-no-code-ai-driven-buzztastic’ database startup (hint: vendor lock-in and de-platforming risks). We need a quick, repeatable, framework-independent, and open-source way of connecting an app/web app to a database without all of the bullshit. We want to be able to quickly iterate on our apps and make schema changes across team members without clunky migrations. And for the love of god… please… we do not want to use ORMs. Dammit!
End Tangent
So what I will try to do in today’s post is demonstrate that for most small(ish) apps, we can make our lives significantly easier by simply using an unstructured+relational paradigm on top of SQLite. And, to stick with the trend du jour, I shall also point out that obviously there is nothing more serverless than SQLite.
I’ve said too much…more on SQLite later.
Why not MongoDB and Document Stores
At first you might read my introduction and say “Well, you just described a document store”. This is partially true. Document stores like MongoDB definitely provide the flexibility of not needing schema migrations on top of unstructured data. However, Mongo and similar projects often lack features like joins, create lots of duplication, and tend to consume large amounts of RAM. Many of them (Mongo included) do not natively support SQL, and they require running a server, a cluster, or a cloud offering.
What do we actually want
By ‘we’ I’m referring to my personal experience of building countless web applications over the last 20+ years with very diverse teams and methodologies. Your requirements may differ slightly, but hopefully these ‘wants’ are general enough for most web-app use cases.
So, at a high level we want:
- schema flexibility, so fields can be added and dropped without clunky migrations
- real SQL, including joins and indexes
- no ORM sitting between our code and our data
- something open source, framework independent, and embeddable, with no server, cluster, or cloud vendor to babysit
So obviously we want a semi-relational semi-normalized semi-structured document-store database
In all seriousness though, many databases on the market can achieve most of these requirements, but few can check all the boxes. There is, however, one O.G. database that has secretly and silently become ubiquitous in our lives and is a core component of technologies like Android, Google Chrome, Firefox, and Airbus airplanes, and it has the ability to check all of the boxes: SQLite.
The Core Architecture Decisions
So how will we construct this magical developer friendly database experience? Let’s begin by defining a few core decisions that we want to stick with:
Decision 1 - As mentioned, we will use SQLite to accommodate a modern application architecture.
Decision 2 - ORMs are lame, so we won’t use ORMs. Here’s why:
Decision 3 - Schema Migrations Suck (most of the time), so we want something better:
Decision 4 - Entity Relationships should be allowed to exist but not at the database level.
Decision 5 - Keep it Simple
Disclaimer - One size does not fit all.
Building a Document Store Like Experience on Top of SQLite
As it turns out, we actually need to do very little to accomplish our goals with SQLite! In fact, to achieve our goals of being semi-relational, semi-structured, and semi-normalized, we start by adopting an extremely basic database schema, and from there on out we rely on the built-in JSON functions offered by SQLite.
Our simple, no-fuss schema:
From a database schema perspective, we need to create a table like we would in any relational model with one exception - our tables will only need two fields: ID and data
In other words, each table is simply a set of rows that:
- have an auto-incrementing integer primary key (ID), and
- store the entire record as a single JSON document in a BLOB column (data).
Here’s a visual representation of what a database using our new SQLite-powered document store methodology might look like:
Pretty simple, right? What we’re mimicking here is the NoSQL concept of ‘collections’ versus tables, but don’t worry, we’re not going to lose too much relational functionality by doing this!
Let’s see how this plays out!
An overview of JSON functions in SQLite
Let’s review some basics of working with JSON in SQLite.
First, you will notice that I chose the BLOB type when defining the data field in our schema vs just TEXT. This is because of a fantastic recent contribution to SQLite called JSONB.
To quote the SQLite website directly:
Beginning with version 3.45.0, SQLite supports an alternative binary encoding of JSON which we call "JSONB". JSONB is a binary format that is stored as a BLOB.
In other words, we can store regular string-based JSON documents in a more optimized binary encoding in our data BLOB fields, and SQLite will handle the entries just like regular text-based JSON, which makes working with JSON data much faster.
Additionally, like many modern databases, SQLite ships with a set of over 30 scalar, aggregate, and table-valued functions that make working with JSON truly pleasurable and simple.
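To give a quick taste, here are a few of the scalar helpers in action (these run standalone, no table required):
-- Pull a value out of a JSON document by path
SELECT json_extract('{"a": [1, 2, 3]}', '$.a[1]');   -- 2
-- Count the elements of a JSON array
SELECT json_array_length('{"a": [1, 2, 3]}', '$.a'); -- 3
-- Inspect the type of the value at a path
SELECT json_type('{"a": [1, 2, 3]}', '$.a');         -- 'array'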
If you’d like to follow along and try the following examples yourself, you will need the SQLite CLI (≥ v3.45.0) installed. SQLite comes packaged with some operating systems such as macOS, but you may need to install it separately.
Let’s fire up a new sqlite db:
sqlite3 movies.db
This will open the SQLite command prompt, and now we can execute SQL commands.
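Before going further, it’s worth confirming that the CLI is new enough for the JSONB features used below. A quick sanity check:
-- Should report 3.45.0 or newer
SELECT sqlite_version();
-- Round-trip a document through the binary JSONB encoding and back to text
SELECT json(jsonb('{"hello": "world"}'));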
Example of creating a table:
CREATE TABLE movies (
-- The primary key
ID INTEGER PRIMARY KEY AUTOINCREMENT,
-- A BLOB that must be JSONB
data BLOB CHECK(json_valid(data, 0x08))
);
Confirm the table exists
SELECT name FROM sqlite_master WHERE type='table';
Example of inserting data:
-- Insert our JSON document into the data column of the movies table
INSERT INTO movies (data) values
(jsonb('{"Name":"Forgotten in the Planet","Year":1970,"Genre":["Comedy","Crime"],"Director":"Henrie Randell Githens","Cast":["Adrian Gratianna","Tani OHara","Tessie Delisle"],"Runtime":90,"Rate":7.0}'));
Now let’s select that data
-- Get all Name values from the data JSON documents in the movies table
SELECT data->'Name' FROM movies;
-- or using a JSON path expression
SELECT data->'$.Genre[0]' FROM movies;
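One thing worth noting: because data holds the binary JSONB encoding, selecting the raw column prints a blob that isn’t meant to be read directly. Wrapping it in json() gives you the readable text form:
-- Convert the stored JSONB blob back to plain JSON text for display
SELECT json(data) FROM movies;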
Now let’s update our Movie
UPDATE movies SET data = jsonb_replace(data, '$.Name', 'My Amazing Movie') WHERE data->>'$.Name' = 'Forgotten in the Planet';
Notice the shorthand operator ->> in the WHERE clause differs from the -> used in our SELECT. The -> operator returns a JSON representation of the selected value (a string comes back wrapped in quotes), whereas ->> returns a plain SQL TEXT, INTEGER, REAL, or NULL value. That’s why ->> is the right choice when comparing against ordinary SQL values, as in the WHERE clause above.
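You can see the difference side by side:
-- -> keeps the JSON quoting, ->> gives the plain SQL text value
SELECT data->'$.Name', data->>'$.Name' FROM movies;
-- returns '"My Amazing Movie"' and 'My Amazing Movie' respectively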
Handling JOINS and table relationships without foreign keys
Previously I stated that we will not be adding any foreign keys to our tables. Relational DB purists have probably scoffed at least several times by now, but remember, for our methodology we want to avoid doing anything directly in the database that would require working with migrations. Foreign keys, by design, create strict constraints that over time make migrations more difficult. That’s not to dismiss the many benefits they provide; we’re just trying to find a simpler way of doing things at the cost of giving up a few of them.
We can, of course, still JOIN tables using SQL without the use of foreign keys. Let’s take a look at the following example of joining two regular relational tables without foreign keys, using a WITH statement (a common table expression):
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER,
order_date DATE
);
-- Query using a WITH statement to express the relationship
WITH customer_orders AS (
SELECT customers.name, orders.order_date
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
)
SELECT * FROM customer_orders;
This is no different in our document store model. We can use JSON functions to accomplish joins using fields in our data field JSON documents!
Let’s create a table called actors which has a JSON document with a field called movies that defines a relationship between actors and movies:
Create Actors
CREATE TABLE actors (
-- The primary key
ID INTEGER PRIMARY KEY AUTOINCREMENT,
-- A BLOB that must be JSONB
data BLOB CHECK(json_valid(data, 0x08))
);
Let’s add some actor data. Notice the movies key contains the actor-movie relationships:
INSERT INTO actors (data) values
(jsonb('{"Name":"Jane Doe", "movies":[1]}'));
INSERT INTO actors (data) values
(jsonb('{"Name":"John Smith", "movies":[1,2]}'));
Now let’s join it all together!
-- First create our actor_movies relationship 'table' by splitting
-- on the movies array in each actor's JSON document
WITH actor_movies AS (
SELECT
a.ID,
a.data,
json_each.value AS movie_id
FROM
actors AS a,
json_each(a.data->>'movies')
)
-- Now do a normal join.
SELECT
am.data->>'$.Name' as actor_name,
m.data->>'$.Name' AS movie_name
FROM
actor_movies AS am
JOIN
movies AS m
ON
am.movie_id = m.ID;
Notice what we’ve done here: we’ve taken unstructured data and built relational dependencies on top of the JSON documents themselves, and, because of SQLite’s fantastic JSON features, we can still treat the unstructured JSON as table ‘columns’. This isn’t anything groundbreaking or SQLite-specific; many databases have similar JSON functions, but the important thing to notice is how we’re choosing to maintain a relational model on top of a single JSON field.
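And since the aggregate JSON functions mentioned earlier are available too, we can go the other direction. Here’s a sketch that collects each movie’s cast back into a JSON array using json_group_array:
-- For each movie, build a JSON array of the names of its actors
SELECT
  m.data->>'$.Name' AS movie_name,
  json_group_array(a.data->>'$.Name') AS actor_names
FROM movies AS m
JOIN actors AS a
  ON EXISTS (
    SELECT 1
    FROM json_each(a.data, '$.movies') AS je
    WHERE je.value = m.ID
  )
GROUP BY m.ID;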
Let’s say you need to add a new JSON style foreign key to your actors documents in the future. You can easily do this using a query that looks like this:
UPDATE actors
SET data = jsonb_set(data, '$.shows', json('[1]'));
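And if you later decide that relationship was a mistake, removing the key again is just as easy:
UPDATE actors
SET data = jsonb_remove(data, '$.shows');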
Enforcing Referential Integrity with Triggers
Aside from easy joins, foreign keys also typically provide us with referential integrity. While this can be enforced at the application layer, sometimes we still want some basic referential integrity to happen inside of our database, and with SQLite triggers we can accomplish this without having to touch our table schemas.
Having integrity rules split from table definitions feels more natural and allows our rules to be easily adjusted/modified as our application grows. However, I still maintain that the majority of this type of logic should live outside of your DBMS. Even though triggers don’t require schema migrations, you will need to introduce a more rigid deployment process when adding/removing triggers and with too many triggers things can become messy.
That being said, let’s use our ongoing example of actors and movies: we want to ensure that no one inserts a JSON document containing values in the movies key that do not exist as actual movie IDs in the movies table. In other words, don’t allow actors to be associated with movies that don’t exist. Since SQLite triggers fire on a single event (there is no INSERT OR UPDATE form), here’s how we can use a pair of BEFORE INSERT and BEFORE UPDATE triggers to accomplish this:
CREATE TRIGGER validate_actor_movies_insert
BEFORE INSERT ON actors
FOR EACH ROW
BEGIN
  -- Check if there are any invalid movie IDs in the actor's data
  SELECT CASE
    -- Raise an error when we have a failure.
    WHEN EXISTS (
      SELECT 1
      FROM json_each(NEW.data, '$.movies') AS je
      WHERE je.value NOT IN (SELECT ID FROM movies)
    ) THEN
      RAISE(ABORT, 'One or more movie IDs in actor.data->"$.movies" do not exist in the movies table')
  END;
END;

-- The same check for updates
CREATE TRIGGER validate_actor_movies_update
BEFORE UPDATE ON actors
FOR EACH ROW
BEGIN
  SELECT CASE
    WHEN EXISTS (
      SELECT 1
      FROM json_each(NEW.data, '$.movies') AS je
      WHERE je.value NOT IN (SELECT ID FROM movies)
    ) THEN
      RAISE(ABORT, 'One or more movie IDs in actor.data->"$.movies" do not exist in the movies table')
  END;
END;
Now if we try to insert (or update) an actor that references a movie that does not exist, SQLite will throw an error!
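For example, trying to insert an actor that references a made-up movie ID (999 here, which we never created) should now be rejected with our ABORT message:
-- This should fail: there is no movie with ID 999
INSERT INTO actors (data) values
(jsonb('{"Name":"Imaginary Actor", "movies":[999]}'));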
The same goes for cascading deletes
Just like with our insert trigger above where we enforce referential integrity on inserts and updates, we can similarly use triggers to cascade deletes!
Let’s say we want to remove all actors who were associated with a particular movie when that movie is deleted:
CREATE TRIGGER cascade_delete_actors
AFTER DELETE ON movies
FOR EACH ROW
BEGIN
DELETE FROM actors
WHERE ID IN (
SELECT a.ID
FROM actors AS a,
json_each(json_extract(a.data, '$.movies')) AS je
WHERE je.value = OLD.ID
);
END;
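With the sample data above (both actors reference movie 1), a quick check of the cascade might look like this:
-- Deleting the movie fires the trigger and removes its actors
DELETE FROM movies WHERE ID = 1;
-- Both sample actors referenced movie 1, so this should now return 0
SELECT count(*) FROM actors;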
If you want to see all the triggers you’ve created to date for a particular table you can run:
SELECT name
FROM sqlite_master
WHERE type = 'trigger' AND tbl_name = 'movies';
-- or to see all triggers:
SELECT name
FROM sqlite_master
WHERE type = 'trigger';
Don’t want a trigger anymore? Simply drop it:
DROP TRIGGER IF EXISTS cascade_delete_actors;
Indexing JSON fields for faster queries.
With traditional database tables we can create indexes to speed up our queries. The nice thing about SQLite is that you can still build these indexes on JSON fields! Let’s assume we have tons and tons of movies and we want to build an index on the Year field:
CREATE INDEX movie_year_idx ON movies(data->>'Year');
Now let’s select something from the movies table and see if our index is being used. Sure enough, when we filter on the JSON key Year, our select queries are now optimized by the index!
EXPLAIN QUERY PLAN SELECT * FROM movies WHERE data->>'Year' = 1970;
Results:
id parent notused detail
3 0 0 SEARCH movies USING INDEX movie_year_idx (<expr>=?)
Need to kill this index at a later point in time?
DROP INDEX IF EXISTS movie_year_idx;
Conclusion: Yes, I realize this sounds absurd.
It is not lost on me that what I’m describing is essentially the reconstruction of built-in relational database features on top of a non-relational, unstructured database schema. In fact, that is kind of the point. The entire premise of this thought experiment was to find a way to develop an MVP quickly using the benefits of NoSQL document stores while still keeping some of the great features a DBMS offers us. In the normal relational world, adding or changing foreign keys can have significant impacts on the availability and integrity of data, but abstracting some of those basic features using the described techniques allows us to add and remove them without disrupting the underlying data or schema. It also means that we can create multiple versioned copies of our relationships and allow them to co-exist if needed. Most importantly, our developers can quickly add new columns/fields to tables without needing to create significant migrations.
There are, of course, also obvious downsides, and as I pointed out, this was mostly a thought experiment. As with any process, it’s important to set some ground rules that your team agrees on when managing your data storage system, and in this case those rules will operate much more on good faith than through programmatic enforcement. YMMV.
Part 2: Implementing this approach in code
In my next installment I will attempt to show some of the benefits of using the proposed approach in a real-world web app and hopefully demonstrate that it’s probably OK to throw that ORM you love so much to the curb :). But that’s for another day.