Generic Abstractions
A few days ago I posted about the genesis of my new mock-db library, which mocks DynamoDB-like behavior with local JSON data.
That library is really just a bit player in a larger drama: the ongoing Javascript-to-Typescript refactor of my entity-manager library.
FYI the entity-manager link above takes you to the PRE-refactor branch!
This widget provides the internal data access layer for all of my Serverless Framework projects, and this refactor will not only make it type-safe, but will incorporate close to two years of lessons learned across well over a dozen backend services.
High stakes.
The release of mock-db triggered a cascade of activity, and as I look back on the past few days, I see a couple of pretty good answers to the question: What does a senior Typescript developer actually DO all day?
So I thought I’d share.
Some Context
To understand what I’ve been up to, you need to know a bit about the entity-manager library and the problem it’s meant to solve.
All of my backend projects are built on the Serverless Framework. For this discussion, the key feature is that all of these projects are serverless: they run on AWS Lambda instead of some box on a server farm. When I use a database, I use DynamoDB, which is a NoSQL database in the AWS cloud that is also serverless.
NoSQL databases are very different from traditional relational database systems (aka RDBMS) like SQL Server.
The entities in an RDBMS (like users, orders & invoices) are each represented by a table. You tell the platform what tables you want, what data they contain, and how they relate to one another (this is the schema). Mostly the platform takes care of the messy parts under the hood. You can query the database however you want, and if your schema changes—say, you want to add another table or change a relationship—the platform can easily accommodate that.
That’s the good news. The bad news: all that flexibility comes at a price. RDBMS systems are flexible and fast when you just have a few records in the system. But when you scale those records up into the millions, things start slowing down. If you aren’t careful about pulling inactive data out of the system—assuming it has inactive data—your fancy RDBMS can wind up unusably slow.
NoSQL databases like DynamoDB are just the opposite.
NoSQL databases are often characterized as schemaless, but that isn’t really accurate. Instead, if you want your data to have any structure, it is up to you to encode that structure directly into your data! Rather than interacting with a sophisticated schema layer hiding a bunch of intermediate indexes, you interact directly with your data and a very simple set of indexes.
So from an organizational perspective, NoSQL databases are real monsters if you want any kind of significant structure. But compared to RDBMS systems, their performance at scale is off the hook! A well-designed NoSQL database will query 100 million records just as fast as it queries 100 thousand.
The problem with encoding your schema directly into your data is that it isn’t super obvious how to do that. Concepts like the DynamoDB single-table design pattern are great in principle, but they’re really hard to implement, and they’re really hard to implement in a consistent fashion across multiple back-end services that each use an independent DynamoDB data store.
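To make "encoding your schema into your data" a bit more concrete, here is a minimal sketch of one common single-table convention: each item carries its entity type and a shard number inside a composite hash key, plus a sort key built so that string order matches the order you want to query in. The key format (`user#2`, `created#…|userId#…`), the shard count, and the helper names are all hypothetical illustrations, not entity-manager’s actual encoding.

```typescript
// Hypothetical single-table key encoding; not entity-manager's actual format.
interface UserRecord {
  userId: number;
  name: string;
  created: number; // creation time, epoch milliseconds
}

const SHARD_COUNT = 4; // hypothetical number of shards per entity

// Derive a shard number from the id so items spread across partitions.
function shardOf(userId: number): number {
  return userId % SHARD_COUNT;
}

// Hash key: entity type + shard. Range key: zero-padded timestamp + id,
// so lexicographic order matches chronological order.
function encodeKeys(user: UserRecord): { hashKey: string; rangeKey: string } {
  return {
    hashKey: `user#${shardOf(user.userId)}`,
    rangeKey: `created#${String(user.created).padStart(13, "0")}|userId#${user.userId}`,
  };
}

// Recover the entity type from a hash key.
function entityOf(hashKey: string): string {
  return hashKey.split("#")[0];
}
```

The trade-off is visible right in the keys: reading "all users" now means querying each of the `SHARD_COUNT` partitions and merging the results. That is exactly the kind of bookkeeping you want a library to handle the same way everywhere.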
entity-manager solves that problem:
So on the one hand entity-manager provides an opinionated, out-of-the-box answer to the question of how to structure NoSQL data. And on the other hand, it replaces a ton of code with a simple configuration that looks and works consistently across every independent microservice in your entire back end.
Pound for pound, entity-manager is by far the most successful piece of software I’ve ever written.
So if it ain’t broke…
Why Fix It?
Part of the answer can be boiled down to a single word: Typescript.
I picked up Typescript about a year ago after having spent about a gazillion years in the Node.js/Javascript trenches. I was late to the game, and frankly skeptical: I am a very anal coder who writes very tight, well-documented code, and I just didn’t see the point of enforcing type safety in code that is just not very likely to suffer from type errors.
Once I actually started using Typescript, I was immediately hooked.
Writing software is often very much about getting a package of data from over here to over there, with some transformations along the way.
If you write software like a grown-up, then to get that thing done, you:
Not strictly in that order.
Now that’s a lot to keep in sync, especially when requirements change. And your requirements (or at least your understanding of them) will always change. So how do you keep track? When the various facets of your code fall out of sync, how do you actually find out?
That is what Typescript is for. When the left hand falls out of sync with the right hand, you know instantly. You don’t have to run a test. You don’t even have to ask. Red squiggles just appear in your IDE, and you know.
By an amazing kind of alchemy, the result of communicating the left hand’s intent to the right hand through consistent typing is almost always better code! And really good Typescript is like magic: once the left hand’s intent is perfectly expressed, the right hand’s code just flows.
So these days, when I write something new, it is always in Typescript. Since Serverless Framework projects are my bread and butter, and since entity-manager is my go-to data access layer, it just needs to be type-safe so it plays nicely with all my other Typescript goodies.
Oh, the Serverless Framework itself? They went Typescript with version 4.
The switch to Typescript has a lot of knock-on effects. For example:
Abstractions
A few days ago I found myself mid-swing in the entity-manager refactor. My strategy was to refactor the entire existing feature set and get all of the existing tests to pass before adding any new features.
I saved the hard part for last: the query method. This function’s job is to…
This method works just fine in production, but its unit test coverage was very thin. No surprise there: the only thing that really behaves like DynamoDB is DynamoDB, and it’s generally poor practice to unit test against a cloud service. But a complete refactor means I can no longer trust that production experience, so time to write better tests.
Meanwhile, remember: entity-manager is generic. Except for a little leakage, it doesn’t contain any DynamoDB-specific code. So I really didn’t need a DynamoDB clone to test against. I just needed something that behaved enough like DynamoDB to exercise the key features of the query method.
In practice, it needed to:
Enter mock-db. It took about fifteen minutes of coding to understand that mock-db was going to be big enough and generally useful enough that it made sense to abstract it out into its own library, and then import it back into entity-manager as a development dependency. That’s where I wound up this past Monday.
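For a sense of what "behaves enough like DynamoDB" can mean, here is a minimal sketch of an in-memory query over local data: match a partition key exactly, optionally apply a begins-with condition on the sort key, sort, and page with a limit and an exclusive start key. The `query` function, its parameter shape, and the paging details are assumptions for illustration; this is not mock-db’s actual API.

```typescript
// Hypothetical in-memory query in the spirit of DynamoDB; not mock-db's actual API.
type Item = Record<string, string | number>;

interface QueryParams {
  hashKey: string;            // name of the partition key property
  hashValue: string | number; // partition key value to match exactly
  sortKey: string;            // name of the sort key property
  beginsWith?: string;        // optional begins_with-style sort key condition
  limit?: number;             // max items per page
  exclusiveStartKey?: Item;   // resume after this key (from a previous page)
  descending?: boolean;       // reverse sort order
}

interface QueryResult {
  items: Item[];
  lastEvaluatedKey?: Item;    // set only when more matching items remain
}

// Numbers compare numerically; everything else compares as strings.
function compare(x: string | number, y: string | number): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0;
}

function query(data: Item[], p: QueryParams): QueryResult {
  const dir = p.descending ? -1 : 1;

  // 1. Partition key equality, plus the optional sort key prefix condition.
  let matches = data.filter(
    (item) =>
      item[p.hashKey] === p.hashValue &&
      (p.beginsWith === undefined || String(item[p.sortKey]).startsWith(p.beginsWith)),
  );

  // 2. Order by sort key in the requested direction (sorts the filtered copy).
  matches.sort((a, b) => dir * compare(a[p.sortKey], b[p.sortKey]));

  // 3. Skip everything up to and including the exclusive start key.
  if (p.exclusiveStartKey !== undefined) {
    const start = p.exclusiveStartKey[p.sortKey];
    matches = matches.filter((item) => dir * compare(item[p.sortKey], start) > 0);
  }

  // 4. Apply the page limit and report where the next page should resume.
  const items = p.limit === undefined ? matches : matches.slice(0, p.limit);
  const last = items[items.length - 1];
  const lastEvaluatedKey =
    last !== undefined && items.length < matches.length
      ? { [p.hashKey]: p.hashValue, [p.sortKey]: last[p.sortKey] }
      : undefined;

  return { items, lastEvaluatedKey };
}
```

A fake like this deliberately ignores a lot of real DynamoDB behavior (capacity, consistency, the 1 MB per-call result cap), which is fine: the goal is only to exercise the paging and sorting logic in the code under test.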
But as soon as I did that, I realized I had a new problem.
Like any other library, mock-db required unit tests. And, since mock-db was built to handle the exact same kinds of data sets as entity-manager, it needed the exact same set of types. Also, they both needed to perform some simple data operations like sorting and de-duping on the same kind of data.
A core principle in software engineering is DRY: Don’t Repeat Yourself. So how do I use the exact same types & functions in two different libraries without repeating myself?
Easy: I created a third library, called entity-tools, and made it a dependency of both!
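Here is a minimal sketch of the kind of generic helpers that can live in a shared library like entity-tools, so that mock-db and entity-manager both import one implementation instead of each carrying a copy. The names `sortBy` and `dedupeBy`, their signatures, and the local `Entity` type are hypothetical; they are not entity-tools’ actual exports.

```typescript
// Hypothetical shared helpers; not entity-tools' actual exports.
// In a shared package these would be exported for both consumers to import.
type Entity = Record<string, unknown>;

// Nullish values sort first; numbers compare numerically; everything else as strings.
function compareValues(x: unknown, y: unknown): number {
  const xNil = x === null || x === undefined;
  const yNil = y === null || y === undefined;
  if (xNil || yNil) return xNil && yNil ? 0 : xNil ? -1 : 1;
  if (typeof x === "number" && typeof y === "number") return x - y;
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0;
}

// Return a sorted copy of `items`, comparing on `keys` in priority order.
function sortBy<E extends Entity>(items: E[], keys: (keyof E)[]): E[] {
  return [...items].sort((a, b) => {
    for (const key of keys) {
      const cmp = compareValues(a[key], b[key]);
      if (cmp !== 0) return cmp;
    }
    return 0;
  });
}

// Keep the first item for each distinct combination of `keys` values.
// (JSON.stringify collapses undefined and null to the same identity.)
function dedupeBy<E extends Entity>(items: E[], keys: (keyof E)[]): E[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    const id = JSON.stringify(keys.map((key) => item[key]));
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });
}
```

Because the `keys` parameter is typed as `keyof E`, both libraries get the same compile-time check that they are sorting or de-duping on properties that actually exist.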
Also, a class called DynamoDbWrapper encapsulates the DynamoDB-specific stuff and injects it into EntityManager. DynamoDbWrapper has a unique dependency called aws-service-search that handles some of that load. But in retrospect it didn’t really make sense for aws-service-search to be a distinct dependency, so as this work proceeds I’ll be merging it into DynamoDbWrapper when that class gets its own Typescript refactor in a few weeks.
Put it all together, and here’s the before & after:
If you get the sense that I’m sort of going around in circles here, making a modest improvement at each pass: you’re right! Those iterations are the essence of Agile software development, which in the main is the only way to write software that actually works.
In fact, that’s where the entity-manager package came from in the first place: as soon as I had more than one Serverless Framework project to manage, it became abundantly clear that they all needed to be handling data the same way. Those original projects birthed both entity-manager and DynamoDbWrapper.
So the arrows in the diagram above don’t just represent the dependency flow of these components. They also represent the historical decomposition of the work.
Genericization
Is that an awful word, or what?
As soon as Typescript enters the picture—stay with me here—everything gets typed.
From a trivial perspective, this means that Typescript will complain if I try to do something dumb like treat a string as an integer. I can kind of cheat, accidentally or on purpose, by using the special-purpose any type, which means exactly what it implies. But if I’m smart, I’ll set Typescript up to complain when I do that, too (the noImplicitAny compiler option catches the accidental kind, and a linter rule like typescript-eslint’s no-explicit-any catches the deliberate kind). Which leaves me no alternative but to type consistently.
But what does that actually mean?
Say I have a simple type representing a user, which I want to manage in my data store with entity-manager. I can define that type like this:
interface User {
  id: number;
  name: string;
  optional?: string;
  data: JsonData; // <-- what is this??
}
Now say I want to create a new instance of entity-manager. I can write this:
const entityManager = new EntityManager(config);
… where config is the configuration I use to describe my entities, indexes, and sharding strategy. But hang on… how do I know if my configuration is valid?
For example:
Remember, in Typescript we have two kinds of configuration:
Guess which kind is preferable.
So when I create a new instance of EntityManager, Typescript style, I should be able to tell it:
If I do that right, a couple of things will happen:
That’s worth doing a little work for.
Let’s start with data types allowed by the database. Say I define this type:
type DefaultProperty =
  | string
  | number
  | boolean
  | null
  | undefined
  | { [key: string]: DefaultProperty } // JSON objects
  | DefaultProperty[]; // JSON arrays
This recursive type definition describes a bunch of scalar types, as well as JSON objects and arrays. Now I can alter my EntityManager class definition so it looks like this:
export class EntityManager<P = DefaultProperty> { ... }
Say my database can’t handle structured data like JSON objects and arrays. I can define an appropriate new property type and create a new instance of EntityManager like this:
type CustomProperty = string | number | boolean | null | undefined;
const entityManager = new EntityManager<CustomProperty>(config);
What you are looking at there is a generic class definition. The EntityManager class is parameterized by a type P, which defaults to DefaultProperty. When I create a new instance of EntityManager, I can specify the type P by passing it as a type argument to the class constructor… and, just like magic, Typescript will yell at me if I write code that wouldn’t handle this type properly!
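Here is a minimal, self-contained sketch of that same pattern: a class whose type parameter `P` defaults to `DefaultProperty`, so every method that accepts a `P` is checked against whatever property type the caller supplies. `PropertyBag` and its `set`/`get` methods are hypothetical stand-ins, not EntityManager’s actual API.

```typescript
// The recursive property type from above.
type DefaultProperty =
  | string
  | number
  | boolean
  | null
  | undefined
  | { [key: string]: DefaultProperty }
  | DefaultProperty[];

// A narrower property type for a store that can't hold structured data.
type CustomProperty = string | number | boolean | null | undefined;

// Hypothetical generic container; P defaults to DefaultProperty.
class PropertyBag<P = DefaultProperty> {
  private readonly values = new Map<string, P>();

  set(key: string, value: P): this {
    this.values.set(key, value);
    return this;
  }

  get(key: string): P | undefined {
    return this.values.get(key);
  }
}

// No type argument: P is DefaultProperty, so nested JSON is accepted.
const flexible = new PropertyBag();
flexible.set("profile", { tags: ["a", "b"], active: true });

// Explicit type argument: only scalars are accepted.
const strict = new PropertyBag<CustomProperty>();
strict.set("name", "Ada");
// @ts-expect-error -- an object is not assignable to CustomProperty
strict.set("profile", { tags: ["a", "b"] });
```

The `@ts-expect-error` directive turns the compiler’s complaint into a check of its own: if `CustomProperty` ever started accepting objects, that line would stop erroring and the build would fail.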
Take this to its logical conclusion, and when you see the full EntityManager Typescript refactor, creating a new instance will look something like this. First, I’ll define my entity types and an entity map that names them:
interface User extends Entity {
  userId: number;
  name: string;
  optional?: string;
  data: JsonData;
}

interface Email extends Entity {
  email: string;
  userId: number;
}

interface MyEntityMap extends EntityMap {
  user: User;
  email: Email;
}
Then I’ll be able to write a config object & define a new instance of EntityManager like this:
const config: EntityManagerConfig<MyEntityMap> = {
  entities: {
    user: {
      primaryKey: ['userId'],
      indexes: {
        name: { hashKey: 'name' },
      },
    },
    email: {
      primaryKey: ['email'],
      indexes: {
        userId: { hashKey: 'userId' },
      },
    },
  },
};
const entityManager = new EntityManager(config);
… and internally the types will just work. The MyEntityMap type will impose design-time constraints on the config object and the rest of my code, and prevent me (and other users of EntityManager) from making expensive mistakes while coding.
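How can an entity map impose those constraints? One plausible approach, sketched here under stated assumptions, is a mapped type that walks the entity map and restricts every key name to the properties of the corresponding entity. The `EntityManagerConfig` and `KeyOf` types below are simplified hypothetical versions, not the refactor’s actual definitions; the sketch also drops the `Entity`/`EntityMap` base types and the `data: JsonData` property to stay self-contained.

```typescript
// Hypothetical, simplified config type; not entity-manager's actual definition.
type KeyOf<T> = Extract<keyof T, string>;

type EntityManagerConfig<M> = {
  entities: {
    // One entry per entity in the map; key names must belong to that entity.
    [E in keyof M]: {
      primaryKey: KeyOf<M[E]>[];
      indexes?: Record<string, { hashKey: KeyOf<M[E]>; rangeKey?: KeyOf<M[E]> }>;
    };
  };
};

interface User {
  userId: number;
  name: string;
  optional?: string;
}

interface Email {
  email: string;
  userId: number;
}

interface MyEntityMap {
  user: User;
  email: Email;
}

const config: EntityManagerConfig<MyEntityMap> = {
  entities: {
    user: { primaryKey: ["userId"], indexes: { name: { hashKey: "name" } } },
    email: { primaryKey: ["email"], indexes: { userId: { hashKey: "userId" } } },
  },
};

// A design-time mistake the compiler catches before anything runs.
const bad: EntityManagerConfig<MyEntityMap> = {
  entities: {
    // @ts-expect-error -- "email" is not a property of User
    user: { primaryKey: ["email"] },
    email: { primaryKey: ["email"] },
  },
};
```

Leaving an entity out of `entities` is also a compile error here, because the mapped type makes every key of the entity map a required property.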
What… That’s It?
Seems like kind of a ragged ending to a long article, doesn’t it? Know why?
Because the very act of writing this article sent me into a couple of new directions that will significantly (and positively!) affect the outcome for EntityManager.
Remember the question at the top of the article? What the hell do I actually do with my time?
Well… this.
As I mentioned above, iteration is the heart & soul of modern software development. It works just like any other Hero’s Journey: you’ve got to hoof it some ways down the road before you can see very far into the distance.
More to come. Meanwhile: time to iterate!
Visit my website for more great content & tools, all built for you with ❤️ on Bali!