My worst nightmare: Maps of Strings and Objects.
So, I've been around the dev world for a few years. I have a diverse background in different languages, but I've been most active in the Java world, and I've seen things that I've liked a lot (stream processing, lambdas, and Groovy closures, to name a few) and things that I've despised. The worst of them all isn't poorly formatted code, or 4,000-line pom files, or builds that take ages. There is one line that evokes fear in me whenever I see it in code:
Map<String, Object>
The Maps.
From now on, "The Maps" means Maps of Strings to Objects; if you're a polyglot like me, you might also know them as dictionaries. I'll use this name instead of writing "Maps of Strings to Objects" every single time. You'll be glad I'm doing this, I promise.
I'm well aware that some Map implementations (HashMap in particular) have excellent performance and have their place in high-performance code, but that isn't what I'll be discussing here. I'll be talking about The Maps as thieves of both clarity and security, and as code breakers. Here's a rough outline:
- Why do we use Maps of Strings to Objects?
- Why should you stop using them?
- Damage control: getting around their many flaws and phasing them out.
Why do we use these things?
Putting aside the high performance we get out of HashMaps, my answer to this question, in 90% of the cases I've seen, is simple: because we're lazy.
Let's face it: the power of this structure lies in the fact that it lets us stick a highly complex structure into a format that anything can process. You don't need an interface, and you don't need to know what's in there; you just need to know how to get it. All you need is an enum of keys so that your devs don't try to murder you while you sleep, and you've got universal access to this structure from any point in your stack. That is fantastic! I'm not being sarcastic; this is a mighty powerful data structure. You don't need to create an additional type of object to hold something that could be infinitely complex. But, as Uncle Ben told Peter Parker, and as I'll now tell you, dear readers:
"With great power comes great responsibility."
I'm sad to inform you, though, that Ben wouldn't be able to rest in peace knowing how we're using maps. We were supposed to use this power for good. Instead, we've been corrupted: we've allowed this mega-structure to become the placeholder for any data structure that's likely to change.
What usually happens is this: at the start of a project, when it's being developed as a prototype, most people want an MVP, a Minimum Viable Product. We know that in this industry there's loads of work and very little time to get it working. So instead of defining an object like:
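Here is a hypothetical sketch of such an object. The class name CustomerOrder and its fields are made up for illustration, chosen to match the card number, basket, expiry date, and total amount that come up later in this article:

```java
import java.math.BigDecimal;
import java.time.YearMonth;
import java.util.List;
import java.util.Objects;

// Hypothetical order type: every field is named and typed, so the compiler
// (and the next developer) knows exactly what an order contains.
final class CustomerOrder {
    private final String cardNumber;
    private final YearMonth cardExpiry;
    private final List<String> basket;
    private final BigDecimal totalAmount;

    CustomerOrder(String cardNumber, YearMonth cardExpiry,
                  List<String> basket, BigDecimal totalAmount) {
        this.cardNumber = Objects.requireNonNull(cardNumber, "cardNumber");
        this.cardExpiry = Objects.requireNonNull(cardExpiry, "cardExpiry");
        // Defensive, unmodifiable copy so callers can't change the basket later.
        this.basket = List.copyOf(Objects.requireNonNull(basket, "basket"));
        this.totalAmount = Objects.requireNonNull(totalAmount, "totalAmount");
    }

    String getCardNumber() { return cardNumber; }
    YearMonth getCardExpiry() { return cardExpiry; }
    List<String> getBasket() { return basket; }
    BigDecimal getTotalAmount() { return totalAmount; }
}
```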
We simply store the customer order as a Map of Strings to Objects, with a public class of static key constants that defines what goes in.
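A hedged sketch of what that map-based version tends to look like. OrderKeys, its key names, MapOrders, and processOrder are hypothetical names for this illustration, not code from a real project. Note how every read needs a lookup, a type check, and a cast:

```java
import java.math.BigDecimal;
import java.time.YearMonth;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "definition class": public static keys describing what the map should hold.
final class OrderKeys {
    public static final String CARD_NUMBER = "cardNumber";
    public static final String CARD_EXPIRY = "cardExpiry";
    public static final String BASKET = "basket";
    public static final String TOTAL_AMOUNT = "totalAmount";

    private OrderKeys() { }
}

final class MapOrders {
    // Building an order: nothing stops a caller from using other keys or value types.
    static Map<String, Object> newOrder(String card, YearMonth expiry,
                                        List<String> basket, BigDecimal total) {
        Map<String, Object> order = new HashMap<>();
        order.put(OrderKeys.CARD_NUMBER, card);
        order.put(OrderKeys.CARD_EXPIRY, expiry);
        order.put(OrderKeys.BASKET, basket);
        order.put(OrderKeys.TOTAL_AMOUNT, total);
        return order;
    }

    // "Safe" extraction: check every value's type by hand before casting it.
    @SuppressWarnings("unchecked")
    static String processOrder(Map<String, Object> order) {
        Object card = order.get(OrderKeys.CARD_NUMBER);
        Object basket = order.get(OrderKeys.BASKET);
        Object total = order.get(OrderKeys.TOTAL_AMOUNT);
        if (!(card instanceof String) || !(basket instanceof List)
                || !(total instanceof BigDecimal)) {
            throw new IllegalArgumentException("Malformed order map: " + order.keySet());
        }
        List<String> items = (List<String>) basket; // unchecked: element types are never verified
        return "Charging " + total + " for " + items.size() + " item(s)";
    }
}
```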
Those keys are then used, typically in a method such as processOrder, to enforce that the data is extracted from the map safely. Remember this method; we'll come back to it in a few lines.
What you see above is messy and overly verbose code. It might not be a big problem at first, but it's a little like hearing your three-year-old drop an f-bomb in front of your in-laws for the first time. The real problem comes further down the road, when your kid is 26 and can't stop swearing: when your cute MVP has grown into a full project with 40 devs and possibly millions of customers to serve. At that point you'll have a much bigger issue. By then you surely know that you have to level up and remove this map that's causing so many headaches. But how?
Why should I stop using them?
The Maps, even though they're flexible and might have their place in smaller objects, have their issues. First of all, Maps are opaque. Because they can hold anything, no one knows what's in them. Also, being given a set of keys to access the data doesn't mean those keys are the only things inside the map.
Secondly, you can make maps "safer" by having an enum or a set of static strings to access their fields. Even then, however, you don't know when a new programmer is going to add an object with the key "hello" to your map. Maps are unsafe. Thirdly, because of these faults, Maps are unpredictable and indeterminate: when you debug through a Map, you never know what you're going to find.
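A small, hypothetical illustration of both points: the compiler happily accepts an undeclared "hello" key and a value of the wrong type, and the mistake only surfaces at runtime, far from where it was made. UnsafeMapDemo and its methods are made up for this example:

```java
import java.util.Map;

final class UnsafeMapDemo {
    // Compiles without a single warning; returns the map's size afterwards.
    static int pollute(Map<String, Object> order) {
        order.put("hello", "world");       // a key nobody declared
        order.put("totalAmount", "12.50"); // a String where a number was expected
        return order.size();
    }

    // The bug only appears here, when someone reads the value back.
    static String describeTotal(Map<String, Object> order) {
        try {
            Number total = (Number) order.get("totalAmount");
            return "total = " + total;
        } catch (ClassCastException e) {
            return "ClassCastException at runtime";
        }
    }
}
```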
When I start programming around my favourite data type, there are a few problems that will arise due to the opaqueness of this data structure.
- As a developer, I don't know what data a Map parameter can hold.
- When I open the enum with the keys to access that map, we all know it's going to be 1000 lines long. But even if it isn't, the map allows anyone with access to it to introduce another key.
- As a QE, I don't know how to test any function that takes a Map as a parameter. Because I don't know which keys the map is going to have, the input domain for that function grows exponentially, and I have thousands of possibilities to explore. Say bye-bye to your unit tests, or at least to your love of them.
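A rough back-of-the-envelope count shows where "exponentially" comes from, assuming only that each of n documented keys can be either present or absent (ignoring the values themselves and any undocumented keys):

```latex
\[
\underbrace{2 \times 2 \times \cdots \times 2}_{n \text{ keys}} = 2^{n},
\qquad n = 10 \;\Rightarrow\; 2^{10} = 1024 \text{ presence combinations.}
\]
```

Each combination is a potential input shape, and that is before you consider values of the wrong type or the "hello" keys someone else adds.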
At this point you're convinced; you know we need to stop, but... how? When you search for map.put() in your codebase, you get thousands of hits. Ah, that's another point: Maps are resilient and hard to get rid of. No problem, friend, I've got you covered.
How to get rid of The Maps?
I won't lie: the main reason these still show up in my professional life is that they've grown so large that they're basically impossible to get rid of without a major refactor. It isn't easy, and even though I hate working with them, I understand how they got there, and probably why I need to bear the burden they lay on my professional life. You might be lucky enough not to have to, so here is a simple three-step guide to getting rid of The Maps:
Step 1: Getting around The Maps.
If you don't have the resources or the capability to do a refactor, you can mitigate some of the opaqueness of your Maps by creating a utility class that builds sample maps faithfully representing the data the map is supposed to hold. That will at least let your devs know what kind of data they're dealing with. If you want to up your game, force your test team to create one map per scenario. Here's an example of a normal map containing regular information:
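A minimal sketch of such a utility class, assuming the same hypothetical key names used earlier; OrderMapFixtures and validOrder are made-up names. Each scenario lives in exactly one place, so anyone can read what a "normal" order map looks like:

```java
import java.math.BigDecimal;
import java.time.YearMonth;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fixture class: one factory method per documented scenario.
final class OrderMapFixtures {
    private OrderMapFixtures() { }

    // A well-formed order: every expected key is present with a value of the expected type.
    // Returns a fresh, mutable map on every call so tests can tweak it without side effects.
    static Map<String, Object> validOrder() {
        Map<String, Object> order = new HashMap<>();
        order.put("cardNumber", "4111-1111-1111-1111");
        order.put("cardExpiry", YearMonth.now().plusYears(2));
        order.put("basket", List.of("keyboard", "mouse"));
        order.put("totalAmount", new BigDecimal("59.98"));
        return order;
    }
}
```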
For examples of maps containing faulty information, you could include a card number with characters other than digits or dashes, a null basket, a card expiry date in the past, or a negative total amount. Every little detail gives your QE team more room to write better tests for your complex, ever-changing structure.
This could get tedious for whoever is tasked with it, but it will be worth it, as the aim is to reduce the unknowns in your codebase. With this, your input domain is clearly defined, and your QE team will thank you for it.
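The faulty scenarios above could each get their own factory method, built by copying a valid order and breaking exactly one thing. This is a hypothetical sketch; FaultyOrderFixtures and its method names are made up, and the valid order is repeated so the snippet compiles on its own:

```java
import java.math.BigDecimal;
import java.time.YearMonth;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fixtures: one valid order plus one method per faulty scenario.
final class FaultyOrderFixtures {
    private FaultyOrderFixtures() { }

    static Map<String, Object> validOrder() {
        Map<String, Object> order = new HashMap<>();
        order.put("cardNumber", "4111-1111-1111-1111");
        order.put("cardExpiry", YearMonth.now().plusYears(2));
        order.put("basket", List.of("keyboard", "mouse"));
        order.put("totalAmount", new BigDecimal("59.98"));
        return order;
    }

    // Card number containing characters other than digits and dashes.
    static Map<String, Object> cardNumberWithLetters() {
        Map<String, Object> order = validOrder();
        order.put("cardNumber", "4111-ABCD-1111-1111");
        return order;
    }

    // Basket key present but mapped to null (HashMap allows null values).
    static Map<String, Object> nullBasket() {
        Map<String, Object> order = validOrder();
        order.put("basket", null);
        return order;
    }

    // Card that expired last month.
    static Map<String, Object> expiredCard() {
        Map<String, Object> order = validOrder();
        order.put("cardExpiry", YearMonth.now().minusMonths(1));
        return order;
    }

    // Negative total amount.
    static Map<String, Object> negativeTotal() {
        Map<String, Object> order = validOrder();
        order.put("totalAmount", new BigDecimal("-10.00"));
        return order;
    }
}
```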
Step 2: Casting The Maps to full objects.
If you've made it this far, you have either the power or the resources to consider this change. I'm actually kind of proud that you're reading my article.
There are two ways of getting rid of The Maps: you either wrap them or you cast them. I'll discuss the pros and cons of each separately.
Wrap them using Skinny Wrappers (SW):
What are they? Essentially, each one is a POJO that looks like this:
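A minimal sketch of a skinny wrapper, assuming the hypothetical order keys used earlier; OrderWrapper is a made-up name. It holds a reference to the original map and exposes typed accessors, so nothing is copied:

```java
import java.math.BigDecimal;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical skinny wrapper: reads and writes go straight through to the
// wrapped map, so existing code that still uses the map sees every change.
final class OrderWrapper {
    private static final String CARD_NUMBER = "cardNumber";
    private static final String CARD_EXPIRY = "cardExpiry";
    private static final String BASKET = "basket";
    private static final String TOTAL_AMOUNT = "totalAmount";

    private final Map<String, Object> map;

    OrderWrapper(Map<String, Object> map) {
        this.map = Objects.requireNonNull(map, "map");
    }

    String getCardNumber() { return (String) map.get(CARD_NUMBER); }
    void setCardNumber(String cardNumber) { map.put(CARD_NUMBER, cardNumber); }

    YearMonth getCardExpiry() { return (YearMonth) map.get(CARD_EXPIRY); }

    @SuppressWarnings("unchecked")
    List<String> getBasket() { return (List<String>) map.get(BASKET); }

    BigDecimal getTotalAmount() { return (BigDecimal) map.get(TOTAL_AMOUNT); }
    void setTotalAmount(BigDecimal total) { map.put(TOTAL_AMOUNT, total); }

    // Escape hatch for code that still expects The Maps.
    Map<String, Object> unwrap() { return map; }
}
```

The casts still happen, but now they live in exactly one class instead of being scattered across every caller.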
The main advantage of SWs is that they're a trivial change to make. If you have method signatures designed to take The Maps as a parameter, changing those parameters to proper objects will be painful. To ease the change, you can simply create your SW object the moment you receive your particular strain of The Maps. After creating your SW, replace the get() and put() calls on your map with your SW's access methods, and most of your woes will be gone. For example, our new processOrder method would look like this:
One could argue that this is a vast improvement over all the map access in our previous version of processOrder.
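A hedged sketch of that new processOrder. It uses a trimmed, hypothetical wrapper (named SkinnyOrder here so the snippet compiles on its own); OrderService is also a made-up name. The signature still accepts The Maps, so callers don't change; only the body does:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

// Trimmed hypothetical skinny wrapper: only the getters processOrder needs.
final class SkinnyOrder {
    private final Map<String, Object> map;

    SkinnyOrder(Map<String, Object> map) { this.map = map; }

    BigDecimal getTotalAmount() { return (BigDecimal) map.get("totalAmount"); }

    @SuppressWarnings("unchecked")
    List<String> getBasket() { return (List<String>) map.get("basket"); }
}

final class OrderService {
    // Public signature unchanged: it still takes The Maps, but wraps them on entry.
    static String processOrder(Map<String, Object> rawOrder) {
        SkinnyOrder order = new SkinnyOrder(rawOrder);
        if (order.getBasket() == null || order.getBasket().isEmpty()) {
            throw new IllegalArgumentException("Order has no items");
        }
        return "Charging " + order.getTotalAmount() + " for "
                + order.getBasket().size() + " item(s)";
    }
}
```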
This change has little to no effect on performance, and it will dramatically improve the readability of your codebase. As this object spreads through your codebase, the calls to put() and get() will decrease, and so will your Map<String, Object> objects, which will remain only as remnants of your past sins in the immutable signatures of your interfaces and REST APIs. At this point, you might consider upping your game and trying option number two:
Cast your map to an object.
Here we need to talk about performance. You will take a hit. It might be negligible, or it might not, depending on how large your map is. It's up to you to decide whether the change is worth it.
Your object would look like a normal object, but it would also have a map constructor that takes the map and copies each entry into its corresponding field. Once again, this is a change you'll want to roll out incrementally: create the POJO, then change one module at a time, performance test your system, and make sure you're doing fine. There are multiple libraries out there that will do the conversion for you; Jackson for Java is a notable example, but there are many more. Go crazy.
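A minimal sketch of such a map constructor in plain Java, assuming the same hypothetical keys; Order and its require helper are made-up names. Unlike the wrapper, it copies each value into a typed field once, which is where the conversion cost comes from, and it fails fast on a missing or mistyped value. If you'd rather let a library do it, Jackson's ObjectMapper.convertValue(map, Order.class) performs a similar conversion, though it may need extra setup (for example a no-arg constructor or creator annotations), so check its documentation.

```java
import java.math.BigDecimal;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical full object built from The Maps: values are copied into typed fields once.
final class Order {
    private final String cardNumber;
    private final YearMonth cardExpiry;
    private final List<String> basket;
    private final BigDecimal totalAmount;

    // Map constructor: each known key is read, type-checked, and stored in its field.
    Order(Map<String, Object> map) {
        this.cardNumber = require(map, "cardNumber", String.class);
        this.cardExpiry = require(map, "cardExpiry", YearMonth.class);
        List<?> rawBasket = require(map, "basket", List.class);
        // Cast every element, so a non-String item fails here rather than later.
        this.basket = rawBasket.stream()
                .map(String.class::cast)
                .collect(Collectors.toUnmodifiableList());
        this.totalAmount = require(map, "totalAmount", BigDecimal.class);
    }

    // Fails at construction time instead of somewhere deep in the call stack.
    private static <T> T require(Map<String, Object> map, String key, Class<T> type) {
        Object value = map.get(key);
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException("Expected " + type.getSimpleName()
                    + " for key '" + key + "' but got " + value);
        }
        return type.cast(value);
    }

    String getCardNumber() { return cardNumber; }
    YearMonth getCardExpiry() { return cardExpiry; }
    List<String> getBasket() { return basket; }
    BigDecimal getTotalAmount() { return totalAmount; }
}
```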
At this point, you might want to get rid of The Maps entirely.
Step 3: Use your POJOs instead of the maps in your function arguments.
This, as I'm sure you know, is easier said than done, but it is possible. The next time a major version of your product comes up, change that signature and bring that object in. Make sure your POJOs have methods that take in Maps and return Maps, as no one ever really gets rid of The Maps once they've caught them.
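A hedged sketch of where you might end up, with hypothetical names throughout (PlacedOrder, fromMap, toMap, Checkout). processOrder now takes a typed object, and the class keeps map adapters for the REST endpoints and legacy callers that still speak The Maps. Only two fields are shown to keep it short:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJO that replaces the map in method signatures.
final class PlacedOrder {
    private final List<String> basket;
    private final BigDecimal totalAmount;

    PlacedOrder(List<String> basket, BigDecimal totalAmount) {
        this.basket = List.copyOf(basket);
        this.totalAmount = totalAmount;
    }

    List<String> getBasket() { return basket; }
    BigDecimal getTotalAmount() { return totalAmount; }

    // Adapter in: for callers that still hand us The Maps.
    static PlacedOrder fromMap(Map<String, Object> map) {
        List<String> items = new ArrayList<>();
        for (Object item : (List<?>) map.get("basket")) {
            items.add((String) item);
        }
        return new PlacedOrder(items, (BigDecimal) map.get("totalAmount"));
    }

    // Adapter out: for APIs that still expect The Maps.
    Map<String, Object> toMap() {
        Map<String, Object> map = new HashMap<>();
        map.put("basket", basket);
        map.put("totalAmount", totalAmount);
        return map;
    }
}

final class Checkout {
    // New signature: the compiler now knows exactly what an order contains.
    static String processOrder(PlacedOrder order) {
        return "Charging " + order.getTotalAmount() + " for "
                + order.getBasket().size() + " item(s)";
    }

    // Legacy overload kept during the migration; it just adapts and delegates.
    static String processOrder(Map<String, Object> rawOrder) {
        return processOrder(PlacedOrder.fromMap(rawOrder));
    }
}
```

Keeping the Map overload as a thin adapter lets you migrate callers one at a time, then delete it when the last one is gone (if that day ever comes).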
Conclusions:
Maps are opaque, dangerous and hard to get rid of. Now you know. Don't let Uncle Ben down. Next time you're doing an MVP, create that object. Define it early, and if you have to change it further down the road, so be it. It'll be more annoying, but you'll be happier.