Climbing The Mountain of Modularity
As a trainer and coach of software developers, I've seen how modular software design (e.g., object-oriented or microservices) can be a difficult skill for folks to really wrap their heads around.
It doesn't help that it's explained in many different ways, at many different levels of modularity (methods and functions, classes and modules, packages and components, processes, services, servers, systems and systems of systems).
Just taking objects as an example, we have multiple sets of design concepts and principles: SOLID, "Tell, Don't Ask", data hiding, encapsulation, abstraction, inheritance, polymorphism, message passing, and so on.
And every level of code organisation seems to have its own set of principles. The way classes are grouped into packages is explained quite differently to the way members are grouped into classes.
The question is: are the principles of modular design genuinely different at each level? Could we, in fact, explain modularity in more universal terms? I'm finding that we can, in ways that can be applied at each level.
First, though, we need to revisit our motivations for modularity. Why break up code into separate units? While some may talk of reuse, scalability, and other concerns (e.g., when justifying microservice architectures), what I've seen over the last 30 years - using pretty much every mechanism for modularisation I can think of - is that it's fundamentally about making change easier.
Think how we might break up a long method into multiple methods that are invoked in the original. Not only can this help to explain the flow of the original code using well-chosen method names, it means a developer changing one part of that flow can do so with less potential impact on the rest.
Or think of how we might split a class that has two distinct responsibilities - two reasons to change - into two separate classes. And so on.
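To make that concrete, here's a minimal sketch (the class and method names are invented for illustration, not taken from any real codebase) of a class with two reasons to change, and the same responsibilities split apart:

class OrderReport:
    def __init__(self, orders):
        self.orders = orders

    def total(self):
        # Reason to change #1: how order totals are calculated
        return sum(order.price for order in self.orders)

    def as_text(self):
        # Reason to change #2: how the report is formatted
        return f"Order total: {self.total():.2f}"

# Split so each concern can change independently:
class OrderTotaller:
    def total(self, orders):
        return sum(order.price for order in orders)

class OrderReportFormatter:
    def as_text(self, total):
        return f"Order total: {total:.2f}"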
I see a definite correlation between increasing degrees of modularity and our ability to independently change parts of a system with less impact on the rest of it.
There's a trade-off, of course. A system with 100 classes in a single .NET DLL or Java .jar will cost a lot less to develop and maintain than the same 100 classes deployed in a dozen microservices.
And the bulk of this additional overhead is created by scaling up the dependencies in our code from methods invoking methods, to classes using classes, to components referencing components, to services calling services and so on.
At higher levels of modularity, these module dependencies can turn into organisational dependencies - dependencies between teams or even separate businesses - and then our problems really begin.
Just to qualify what I mean by "dependencies": a dependency is a relationship between two parts of a system where changing one part can break the other. When I refactor code to make it more modular, some developers - typically those newer to programming - will protest "Jason, you've introduced a dependency!" Many argue this is the second edge of the double-edged sword of reuse. And at higher levels of code reuse, that's certainly true.
But at the code level... Well, take a look at this simple Python code:
def order_average(orders):
    total_orders = sum(map(lambda order: order.price, orders))
    order_count = len(orders) if len(orders) > 0 else 1  # guard against division by zero
    return total_orders / order_count
Changing the name of total_orders on the first line will break the third line. Changing the calculation of order_count could break the third line, too - for example, if we allowed it to be zero.
So, without any modularity, there are already dependencies in this code. If I extract the first two lines into their own functions, and maybe even into their own source files, it will just make the dependencies easier to see.
def order_average(orders):
    return total_orders(orders) / order_count(orders)

def total_orders(orders):
    return sum(map(lambda order: order.price, orders))

def order_count(orders):
    return len(orders) if len(orders) > 0 else 1
Now I can change the way order totals are calculated without changing the higher-level logic of averaging the orders. Put total_orders in its own source file, and independent changes get even easier. Now if two developers want to change either of those concerns, they'll be changing separate files, with a much-reduced risk of a merge conflict.
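As a quick sketch of that file-level separation (the module names here are hypothetical), the averaging code would simply import what it needs:

# order_totals.py (hypothetical module name)
def total_orders(orders):
    return sum(map(lambda order: order.price, orders))

# order_stats.py (hypothetical module name)
from order_totals import total_orders

def order_count(orders):
    return len(orders) if len(orders) > 0 else 1

def order_average(orders):
    return total_orders(orders) / order_count(orders)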
Put them in separate Python packages, and they can each be part of their own release cycles, perhaps even maintained by different teams, with their own goals, practices, rhythms, and rota for coffee runs.
Some managers believe this is a way to scale up software development. And it could be, if it wasn't for just one teeny tiny detail: our dependency between two lines of code in the same function has now become a dependency between two teams. Changing total_orders can still break order_average.
And this is where our modular design principles come in. We can't just package our code arbitrarily, because we run the risk of creating many dependencies between modules that produce a wide "ripple effect" when one module is changed.
Our aim is to contain the ripples; to localise the impact of making changes to modules on the rest of the system. This is as true of how we design, say, classes as how we choose which classes to include in each component or package. It's the same 4 principles at every level - turtles all the way down.
1 and 2 are about cohesion and coupling. Think of the relationships between members of a class - methods, fields, dependencies etc (cohesion) - and the relationships with members of other classes (coupling). At a higher level, think of the relationships between classes in one package, and between classes in other packages. Turtles, remember? Coupling and cohesion are two sides of the same coin. To reduce coupling between modules, we need to increase their cohesion. We should strive to internalise relationships relating to the same concern inside modules as much as we can.
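As a rough illustration (the classes here are invented for the example), a method that works mostly with another module's data is a relationship that probably wants internalising:

# Low cohesion, high coupling: Invoice does its arithmetic with Order's data
class Invoice:
    def order_total(self, order):
        return sum(line.price * line.quantity for line in order.lines)

# Higher cohesion: the calculation lives with the data it uses, so changes
# to how orders are totalled stay inside Order
class Order:
    def __init__(self, lines):
        self.lines = lines

    def total(self):
        return sum(line.price * line.quantity for line in self.lines)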
3 is our old friend polymorphism. This is typically thought of as an object-oriented concept. But in reality, other programming paradigms have it, too. In functional programming, we can see an example of a function being used by another function without knowing its implementation. The map function in Python accepts a function implementation (in this case, a lambda expression) as a parameter value. As long as the function has the signature map requires, it can apply it to every element in the list. What the lambda actually does is what I like to call "Somebody else's problem"; map doesn't need to know.
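For instance, map will apply any single-argument function we hand it; what that function does with each element is its own business:

prices = [10.0, 20.0, 12.5]

# map neither knows nor cares what these functions do, only that each takes one argument
with_tax = list(map(lambda price: price * 1.2, prices))
rounded = list(map(round, prices))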
In OOP, we can achieve something similar to passing in function implementations by using dependency injection.
class Totaller:
    def total(self, orders):
        return sum(map(lambda order: order.price, orders))

class Counter:
    def count(self, orders):
        return len(orders) if len(orders) > 0 else 1

class Customer:
    def __init__(self, orders):
        self.orders = orders

    def order_average(self, totaller, counter):
        # the totaller and counter implementations are injected by the caller
        return totaller.total(self.orders) / counter.count(self.orders)
Note that Customer doesn't have any dependencies on the implementations of Totaller and Counter. As with passing a lambda into the map function, here the client code decides which implementations Customer should use.
This means we can swap the implementations from the outside without making any changes to Customer. We call this Inversion of Control. (As with all of these concepts, you may know it by other names, like "orchestration". My favourite is "soft-wiring".) In short, the decisions about which modules are talking to which other modules are taken higher up in the stack, and can be changed dynamically - even at runtime - without having to change lots of individual modules in which dependencies are hardwired.
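Here's a minimal sketch of that soft-wiring, reusing the Customer, Totaller and Counter classes above (DiscountedTotaller and the sample orders are invented for illustration):

from collections import namedtuple

Order = namedtuple("Order", "price")  # stand-in for the real order type
orders = [Order(10.0), Order(20.0)]

class DiscountedTotaller:
    # an alternative implementation that honours the same contract as Totaller
    def total(self, orders):
        return sum(map(lambda order: order.price * 0.9, orders))

customer = Customer(orders)
# the wiring decisions are taken here, in the client - Customer itself is unchanged
standard_average = customer.order_average(Totaller(), Counter())
discounted_average = customer.order_average(DiscountedTotaller(), Counter())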
At a higher level, there are many examples where component or package dependencies are equally soft-wired. Back in the days of Microsoft's Open Database Connectivity standard - ODBC - there was a set of interfaces (abstractions) that all database drivers needed to implement, packaged in their own library that our code depended on. Our code didn't need to bind directly to, say, the Oracle ODBC driver library. The decision of which actual physical library (DLL) to use was taken outside of our code - Inversion of Control.
Or think about web services; it's considered a bad idea to hard-code their URLs in our code. Many teams have centralised services for mapping abstract web service calls to physical addresses on the Net. As long as our requests match what the service is expecting, and its response matches what we're expecting, we don't need to worry about its implementation changing.
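A rough sketch of the idea, with a hypothetical registry mapping logical service names to physical addresses; only the registry entry changes when the service moves:

import json

# hypothetical registry; in practice this might be a config file or a discovery service
registry = json.loads('{"order-service": "https://orders.internal.example/api"}')

def order_service_url():
    # calling code knows only the logical name, never the physical address
    return registry["order-service"]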
At every level of modularity, then, we concern ourselves with the contracts between modules. If we don't change the contract, the calling code doesn't need to change. And hence, the ripples are contained to that module.
Finally, 4 is about how modules present themselves to the outside world. Many developers come at interface design from the wrong angle, asking "What does this module do?" This can lead to larger interfaces, and can also lead to integration mismatches. Imagine designing the pieces of a jigsaw then trying to make a picture out of them at the end. (I've seen some eye-opening examples of this in my career, where developers were sent off to work on individual modules, only to discover at the eleventh hour that none of the modules fit when we tried to put them together.)
The much more useful question is: what do clients of this module need to tell it to do? Define the shape of the hole before we design the piece to fit it.
A great example of this approach is described in the book Growing Object-Oriented Software, Guided By Tests by Steve Freeman and Nat Pryce. They drive their end-to-end designs by starting with an end user outcome and working from the outside in (starting at the module that's the entry point to their logic). They write the code for what that module needs to do - starting with a failing test for it - and use interfaces as placeholders for any significant dependencies: cans of worms we might not want to open yet (an approach sometimes referred to as "Fake it 'til you make it").
When the outermost module is working, they move inwards and write the code for implementations of its dependencies (which may, in turn, have their own dependencies for which interfaces serve as placeholders - like I said, it's turtles all the way down).
Imagine we want to focus on how the average order value's calculated, without thinking yet about how the total orders value and order count are handled.
def order_average(orders, total_orders, order_count):
    return total_orders(orders) / order_count(orders)
If we use dependency injection to provide those functions, our client code - in this case, our unit tests - can swap in their own implementations of total_orders and order_count that will just supply our test numbers.
import unittest

class OrdersTest(unittest.TestCase):
    def test_order_average(self):
        # the lambdas stand in for real implementations, supplying test numbers
        self.assertEqual(7.5, order_average([], lambda orders: 15.0, lambda orders: 2))

    def test_order_average_no_orders(self):
        self.assertEqual(0.0, order_average([], lambda orders: 0.0, lambda orders: 1))
At this point, no implementations exist outside the test code. But the signature of the functions is defined in the client code - by the user, if you like. Any function that conforms to the contract defined here will work. Importantly, the contract is defined from the client's point of view. It describes what they need from that function. It's a subtle but important distinction.
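When the real implementations arrive, they only have to honour that client-defined contract; the total_orders and order_count functions from earlier slot straight in (the sample orders here are invented for illustration):

from collections import namedtuple

Order = namedtuple("Order", "price")  # stand-in for the real order type
orders = [Order(10.0), Order(20.0)]

# the earlier functions already conform to the contract the tests defined,
# so order_average itself doesn't change
average = order_average(orders, total_orders, order_count)  # 15.0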
In this sense, it's the interactions between modules - and the contracts that constrain those interactions - that are the most important thing after whether or not the end user's goal was achieved. In OOP, messages are first-order design concerns. I would go further and suggest that they're first-order concerns at every level of software modularity. Turtles, yeah?
So, to recap, our primary aim in modularising software is to enable code to change independently; to localise the ripples when we make changes. There are four principles of modular software design I teach:
Well-designed modules:
1. Are highly cohesive - everything inside them relates to the same concern
2. Are loosely coupled to other modules
3. Hide their implementations behind swappable abstractions
4. Present interfaces shaped by what their clients need
The details may differ between, say, class modularity and microservice modularity, but the underlying principles are essentially the same.
Of course, the map is not the terrain, and the real learning is in the applying. And that's where someone like me comes in :-)