Microservices are usually the right choice, but not always
Some points to consider before deciding to migrate workloads to microservices.
Authors: James Millen, Sabyasachi (Saby) Roy, Chaïmaa Ibnoucheikh
"The intention of this document is to inform the process of deciding future architectural strategy. It doesn’t advise on direction but rather highlights some points that should be considered when making decisions on architectural direction.
The intended audience will be people in roles such as CIO, CTO, Head of Architecture and Head of Engineering, and anyone who acts as an advisor to the mentioned roles about architectural strategy."
There is an increasingly noticeable drive for organisations to migrate their monolithic applications to a microservice-based architecture. In many cases this makes a lot of sense, but not in all of them, and in the rush to migrate, the necessary foundations are often not put in place. This usually leads to high levels of cost and low levels of agility, the very things microservices are supposed to solve.
Monolithic applications are, for the most part, complicated, fragile and difficult to maintain. The only software engineers usually able to work with them and change the code are those who have been doing the job for a long time: engineers who understand the idiosyncrasies and unwritten rules of working with them, things a new software engineer would take years to learn. Nearly every business with applications like this sees the risks they pose, but these applications are quite often performing critical business functions, so there is an understandable level of fear around changing them, or any of the dependencies around them. This fear is a double-edged sword: while a perception of safety comes from leaving these applications alone, it also leads to developers losing knowledge of the software, so when a change does have to be made, nobody has the skills to do it. This fragility is cumulative; the longer these apps are left untouched, the more fragile they become.
However, not every monolithic app is fragile; plenty of them are actively developed and supported. Even so, there are relatively high levels of risk in running business-critical workloads in monolithic apps, risks that must be managed.
Why are Monolithic Apps problematic?
It is probably worth noting at this point how the preference for Microservice design came about.
In a similar way to the wider technology landscape, system design patterns evolve over time to meet constantly changing environments and business requirements.
Up until the 1990s, the traditional application architecture (later referred to as the monolithic design) was predominantly used. This design style came from the first mainframe computers of the 1960s, which were only really able to run a single program.
Then, in the 2000s, the need for service reusability emerged, which led to the development of Service-Oriented Architecture (SOA). SOA is a coarse-grained architectural design pattern: the system components are loosely coupled and service driven, and the design focuses on logical layers, so the application is built from abstract modules that can be integrated in a frictionless manner and are intended to be seamlessly reusable. The SOA concept relies on two main roles, each played by a software agent: a service provider and a service consumer. Communication between the modules is usually done through a central point of connectivity, so there is more autonomy than in a monolith, but each module still needs coordination with the other pieces of the design.
Later, in the 2010s, businesses started focusing on portability and automation while distancing their systems from physical connectivity, alongside a need for technology- and language-agnostic platforms. The microservices design was the answer to these new requirements. It is a fine-grained, decentralised architecture similar to SOA, but not connected to any central service as is the case with SOA: decoupled and highly independent. Many people consider microservices to be a type of SOA; any application that follows the microservices design pattern can also be considered an SOA-compatible design, but not the other way around.
So, there is middle ground between monolithic apps and microservices, although more general SOA alternatives are not covered in this article, as most organisations wanting to migrate away from monolithic apps want to move straight to a microservice architecture.
Looking at some of the specific issues with using monolithic apps, their shortcomings become more apparent.
Monolithic applications are notoriously hard to test effectively.
The very nature of them being monolithic means that they cover a lot of ground in terms of functionality, which usually involves a lot of code, all of which should be tested. Making sure all of the different choices, options and possible outcomes are tested is hard for a number of reasons:
Tight Coupling. The best way to test any software component is to isolate it, control its inputs by injecting prepared test data, and monitor its outputs. With monolithic apps that isolation is often near impossible because the functional components are all tightly coupled together. This means all of the real input dependencies have to be set up or configured for each test case, usually a very big and time-consuming task, and for failure test cases it may not even be possible. If the app is old and doesn’t use one of the more modern languages with built-in unit testing frameworks, the risk of unnoticed gaps in the test coverage is considerable.
Complexity. Another issue, especially with legacy monolithic apps, is that there may be long-forgotten, unknown permutations and interactions in the code that get inadvertently broken when changes are made, only to be discovered once they fail live in production. If the app is asynchronous and has multiple threads, those problems and bugs can be almost impossible to find, and only familiarity with the code base can identify the root cause. A software engineer in the long and distant past may have thought they were being really clever solving a particular problem in the way that they did, and sometime later, long after that engineer has gone, someone making a supposedly simple change is left trying to figure out why it is breaking something in a completely unrelated area of the app.
Lack of Documentation. For most senior software engineers, referring to the documentation is usually a last resort, when everything else has failed. Documentation is rarely kept up to date, assuming it exists in the first place, and when it does exist it is often so thin as to be near useless. The older the app, the less likely there is to be any documentation of any use. Maintaining documentation is time consuming, and it only takes a couple of important code changes to go undocumented to render it near valueless.
The lack of documentation on these apps is a critical failure point, because the chances are they have few or no built-in unit tests. Unit tests are important for any application for a number of reasons, but they matter here because they give software engineers the comfort of knowing whether the changes they are making break anything. The more complicated the code, the more the engineers rely on those tests, especially if they are unfamiliar with the application. Without unit tests, making changes to an unfamiliar, complex code base is fraught with unknowns, with little or no control or visibility over the consequences of those changes.
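To make this concrete, below is a minimal sketch (the service and gateway names are hypothetical) of the kind of isolated, injected-dependency testing that tight coupling makes impossible in a monolith. Because the dependency is passed in rather than hardwired, a stub can stand in for it, and even the failure case becomes trivial to exercise.

```python
# Minimal sketch: testing a component in isolation by injecting a test double.
# All names here are illustrative, not from any particular codebase.

class PaymentService:
    def __init__(self, gateway):
        self.gateway = gateway  # dependency is injected, so tests can swap it out

    def charge(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.submit(amount)

class StubGateway:
    """Test double that can simulate a gateway failure on demand."""
    def __init__(self, fail=False):
        self.fail = fail

    def submit(self, amount):
        if self.fail:
            raise ConnectionError("gateway unavailable")
        return {"status": "ok", "amount": amount}

def test_charge_success():
    service = PaymentService(StubGateway())
    assert service.charge(10)["status"] == "ok"

def test_charge_gateway_failure():
    # The failure path needs no real infrastructure, just a stub told to fail.
    service = PaymentService(StubGateway(fail=True))
    try:
        service.charge(10)
        assert False, "expected a ConnectionError"
    except ConnectionError:
        pass  # expected
```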
Susceptibility to entropy build up
The terms complex and complicated are often used in the software engineering domain to mean two different things. Complex tends to refer to structured complexity, which is intentional and comes from the nature of the functionality of the application. With time and effort on the part of a software engineer, structured complexity is ultimately ‘knowable’, as it is part of the design of the application. The term complicated, in contrast, refers to unstructured complexity, which is also referred to as entropy. This is an unintended consequence of changes made to the source code over time and is cumulative, increasing with each change if left unchecked.
There are numerous causes of entropy build up, and its levels can be controlled through active measures. The problem with high levels of entropy in a code base is that making changes to the code in that state is a highly risky proposition, with changes often resulting in more bugs than they fix. When a code base reaches this tipping point, it is almost impossible to reduce the entropy levels. The only viable option is to discard the code and start from scratch, obviously a very expensive, risky and time-consuming endeavour.
The bigger and more complex the code base, the more difficult managing entropy becomes.
New Technology Barriers
Introducing new technology, updating versions of existing technology, or swapping old for new is usually an extremely complex undertaking in a monolithic application. The whole application has to be worked through, with changes made to potentially many functional components, and the ensuing testing will require a lot of time and resources. The exercise carries a lot of risk and will be very expensive.
They have limited ability to leverage the flexibility and cost savings offered by IaaS Cloud platforms.
One of the biggest benefits of using cloud platforms is their ability to scale resources up and down according to the load they are being subjected to. The more granular the functional components this scaling can be applied to, the more cost effective the cloud platform application estate becomes. Herein lies the problem with monolithic apps: scaling them is an all-or-nothing capability rather than a granular one, so scaling up becomes expensive because the increase in compute needed to run another monolithic instance can be quite significant. If the only increase in load concerns a small corner of the monolithic app’s capabilities, scaling up becomes a very inefficient proposition.
Why are Microservices better than Monolithic Applications?
Much smaller test surface area
In contrast to monolithic applications, a microservice should be limited to performing one role. That means it is much easier to identify all the test cases needed to fully test the application. It is also much easier to isolate the service and control its inputs, which means the failure test cases are far more likely to be achievable. Creating the test cases is much more straightforward as well, without long dependency tails needing to be set up for each test.
Overall, testing a microservice versus testing a monolithic app is a much less complex proposition. The argument that many more tests are needed for ten microservices than for one monolithic app may be true, but the increase in confidence in test coverage, the increase in visibility of the testing regime, and being able to run a full suite of microservice tests in minutes or hours, versus days or weeks for a monolith, more than make up for it.
Increased Granularity maximises the benefits of Cloud Platform scaling and offers a low-risk change
Microservices are perfect for leveraging the automatic scaling capabilities of cloud platforms. With the focused scope of the functionality of a microservice, the scaling actions are applied only to the functionality that is under load, which is much more cost effective than the ’all or nothing’ scaling operations of a monolithic application.
Low risk of change
The focused functionality of a microservice also reduces the risks associated with making changes to the code: interactions within the code are much simpler, and there is no risk to other functional areas because they are part of separate applications. Deploying those changes to production is also much less risky and, if done correctly, becomes almost a non-event. If problems are detected during a production deployment, the focused area of change allows much faster identification of the problem, which in turn leads to faster remediation.
Restricted Blast radius
If there is a failure either in the microservice itself, or with the infrastructure or dependencies around it, running it as a microservice limits the scope and footprint of the functionality affected by the problem. Any such footprint will almost certainly be smaller for a microservice. This same principle is also applicable to security concerns with microservices allowing more clearly defined and controlled security boundaries.
Technology Diversity
With each service being independent, they can be written in different languages and leverage different dependent technologies, even different versions of the same technology, without risk of conflict.
When might Monolithic Architecture be the Right Choice?
With all the apparent advantages of Microservices and disadvantages of monolithic applications, why would there ever really be a reason for a Monolithic application to be a better choice?
Low latency
Latency is an important aspect of many applications and reflects the time it takes for the app to complete tasks. Driving low latency within an application is all about making it process requests and perform actions as quickly as possible and is often critical to the success and usability of applications.
Input Data Validation. In any distributed system, applications are responsible for validating the data passed into them. In today’s business environment it is very rare, even for a monolithic app, to be completely standalone and receive no data from anywhere else. To protect itself from bad data, the application needs to perform validation, even if just basic checks, to stop that bad data crashing the application. Validating all the data being passed in obviously takes time and therefore affects the latency of the app. The more thorough the validation, the slower the app gets.
With microservices there are many more validation points, as data is passed between services and each one has to do its own validation. A monolithic app covering the same functionality would only need to do that validation once, as it has a single data entry point, which obviously makes the monolithic app faster. This can be mitigated in microservices with things like increased levels of trust in the integrity of the incoming data, but that also introduces risk.
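As a simple illustration of the cost being described, here is a hedged sketch of boundary validation in a single service; the field names are hypothetical. Every microservice that receives this payload would repeat checks like these, whereas a monolith would run them once at its single entry point.

```python
# Minimal sketch: validating data at a service boundary before any
# business logic runs. Field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class TransferRequest:
    account_id: str
    amount: float

def validate(payload: dict) -> TransferRequest:
    # Basic structural and range checks; each service boundary repeats
    # work like this, which is where the extra latency comes from.
    account_id = payload.get("account_id")
    if not isinstance(account_id, str) or not account_id:
        raise ValueError("account_id must be a non-empty string")
    amount = payload.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        raise ValueError("amount must be a positive number")
    return TransferRequest(account_id, float(amount))
```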
Network Call Stack. Each time a network connection is established for data transfer, a lot of code has to run to make that connection happen, all the way from the application down into the operating system and the network device driver. The same then has to happen at the destination, where a similar stack of calls connects the network driver to the application receiving the data.
Protocols like HTTP do all this for each and every data exchange, and adding authentication processing on top can have quite dramatic effects on latency. Microservices, communicating between themselves, need to carry out these operations far more often than a monolithic app, where all of the communication is internal. There are ways to mitigate this, such as long-running connections that stream data, but these long-lived connections bring their own problems.
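One common mitigation is connection reuse. The sketch below, using the third-party requests library, contrasts opening a fresh connection per call with a keep-alive session that amortises the connection setup cost across calls; it is an illustration of the trade-off, not a prescription.

```python
# Illustrative sketch: per-request connections versus a reused keep-alive
# session. Uses the third-party `requests` library.

import requests

def fetch_each_time(url, n):
    # Naive: a new TCP (and possibly TLS) handshake for every call.
    return [requests.get(url).status_code for _ in range(n)]

def fetch_with_session(url, n):
    # Better: the Session keeps the underlying connection open between
    # calls, amortising the network call stack overhead described above.
    with requests.Session() as session:
        return [session.get(url).status_code for _ in range(n)]
```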
Ultimately, latency and robustness are often weighed against each other to balance speed against stability in both microservice and monolithic architectures; however, a monolithic architecture makes it much easier to achieve better performance.
Organisational Maturity
To successfully run a microservice estate requires a high level of maturity and knowledge within the organisation, not just IT but the wider business organisation as well. If this is not in place, or the IT function is very small, running a small number of monolithic applications might be a much more realistic undertaking.
Some Key Points to Get the Most Benefits out of a Microservice Estate
One of the key points to successfully running an estate of microservices is to use as much automation as possible. Deployments, configuration changes, onboarding new services and reporting and monitoring should all be heavily automated.
To this end, there are a number of key aspects that need to be in place to ensure the smooth running of the microservice estate.
Service Discovery
Service discovery is all about automated mechanisms for finding the network locations of services and devices in a distributed system. There are potentially several network protocols involved, and there often have to be services dedicated to helping other services find what they are looking for. There are two fundamental aspects to this: self-discovery and dependency discovery.
Dependency Discovery When a service starts up, it needs to understand its environment, where its dependencies are, and so on. For example, an app may need to connect to a database. Hardcoding the network address, or even the DNS name, of that database could be problematic. The better way is for the app to ask which database it needs to connect to. However, before it can do that, it may need to work out exactly where it is and what it is supposed to be doing. So, before it can ask about dependencies, it will need to perform some self-discovery.
Self-Discovery In its simplest form, this may just be about the app working out whether it is running in production or a lower environment and, if it is in production, which geographical region it is in. These things may then determine exactly which dependencies it needs and what they may be called. Self-discovery often feeds into dependency discovery through patterns in naming conventions. For example, a database network name may follow the convention env-region-cluster, giving names like prod-apac-2 or dev-eur-1, so if the app knows its region and environment, it can work out the name of the database it needs to find. Obviously, this can get significantly more complex.
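A minimal sketch of that convention in practice is below; the environment variable names are assumptions, but the env-region-cluster pattern is the one described above.

```python
# Minimal sketch: self-discovery (environment and region) feeding
# dependency discovery via an env-region-cluster naming convention.
# The environment variable names are illustrative assumptions.

import os

def discover_database_host() -> str:
    env = os.environ.get("APP_ENV", "dev")        # e.g. "prod" or "dev"
    region = os.environ.get("APP_REGION", "eur")  # e.g. "apac" or "eur"
    cluster = os.environ.get("DB_CLUSTER", "1")
    return f"{env}-{region}-{cluster}"            # e.g. "prod-apac-2"

# A service started with APP_ENV=prod, APP_REGION=apac and DB_CLUSTER=2
# would resolve its database host name as "prod-apac-2".
```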
Automatic Reconfiguration With a microservice estate, the loose coupling between services means that they can be updated and changed independently. Changes to a service that impact another service need to be pushed to the consuming service.
Typically, such changes should be rare; the API contracts between services should change as infrequently as possible. When they do change, those changes should ideally be pushed out to consuming services without the need to restart them, keeping the impact of rolling out those changes to a minimum.
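There are many ways to push configuration without a restart; as one minimal, file-based sketch (the mechanism and names are assumptions, and real estates often use a config service or watch API instead), the service below simply re-reads its configuration whenever the file changes.

```python
# Minimal sketch: picking up pushed configuration changes without
# restarting the consuming service. File-based polling is just one
# illustrative mechanism.

import json
import os

class ReloadableConfig:
    def __init__(self, path):
        self.path = path
        self._mtime = 0.0
        self._data = {}

    def get(self, key, default=None):
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:  # the config file was updated/pushed
            with open(self.path) as f:
                self._data = json.load(f)
            self._mtime = mtime
        return self._data.get(key, default)

# config = ReloadableConfig("service.json")
# config.get("downstream_url")  # reflects pushed changes, no restart needed
```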
Monitoring and Alerting
Comprehensive monitoring and alerting are vital to successfully running an estate of microservices. Ideally an event driven operations model should be used, where the support teams wait to be notified of things that need their attention rather than having to go and look for problems.
Metrics vs Logs It is important that the difference between logs and metrics is clearly understood by the teams developing and monitoring the services, and that the associated conventions and rules are adhered to. These should cover things like naming conventions, units of measurement and dimensions.
In general, metrics are time sensitive with any single item being less important than the pattern they collectively indicate. Logs are typically discrete events, each of which is important data in its own right and their usefulness does not age in the same way as metrics.
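A small sketch may help fix the distinction; the metric sink and field names are assumptions, with the metric printed rather than shipped to a real backend.

```python
# Illustrative sketch: a metric is a timestamped measurement whose value
# lies in the aggregate pattern; a log line is a discrete, self-contained
# event. Names and formats here are assumptions.

import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("payments")

def record_metric(name: str, value: float, dimensions: dict):
    # Stand-in for a real metrics backend.
    print(json.dumps({"ts": time.time(), "metric": name,
                      "value": value, **dimensions}))

def handle_request(request_id: str):
    start = time.time()
    # ... process the request ...
    record_metric("request.latency_ms", (time.time() - start) * 1000,
                  {"service": "payments", "env": "prod"})
    # The log line is an auditable event in its own right.
    log.info("processed transaction request_id=%s outcome=success", request_id)
```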
Visibility It is important that the logs and metrics any one service generates provide enough visibility of what the service is doing, how much traffic it is handling, how it is performing, and so on. This data needs to give anyone looking at it positive confirmation that the service is running normally or, if it isn’t, exactly where the problem lies: latency, HTTP errors, connectivity issues with a different service, and so on.
Logs should provide an audit trail of what the service has done, so for example, if a customer were to query a particular transaction, the logs should provide enough evidence of exactly what the service did for that transaction. Logs should also provide enough information on failures to allow someone to analyse that data and be able to have a good chance of working out exactly where the failure occurred.
Clear Escalation Paths When a problem occurs, it is obviously important that it is resolved as quickly as possible. For large microservice estates, it is important that there is consistency, that issues can be escalated quickly without having to dig for information on how and to whom, and that everyone understands the rules for when escalation is needed. The more microservices there are, the more important this becomes.
There should be consistent patterns for incident management across all of the services, that are easy to follow and easily understood. The middle of an ongoing incident is not the time to be finding gaps and inconsistencies.
Asset and Configuration Management – understanding the estate
Service Ownership Tracking the ownership of assets, which product they are part of as well as which individual is responsible for them are all very important. This allows costs of ownership to be much more easily tracked and it helps with the support and escalation.
Service Dependencies This should include other services and products that any one service depends on, and also third party packages or frameworks (and their versions) that the software may include, or that may be installed on servers or hosts or connected in some other way to the assets. This data is vital in the event of things like zero-day security alerts.
Stable APIs The APIs provide the touch points between services, so to maintain the necessary separation they need to be stable, with changes made only when absolutely necessary. There are numerous techniques that can be leveraged to enable that stability, such as abstraction and versioning, but the stability itself is really important. Each time an API changes, all of the services calling that API have to be checked and changed accordingly. If it is a breaking change, as soon as the new API is in place all of the consuming services stop working, so coordinating deployments on that scale can be a huge task with lots of risk, and well worth avoiding. Follow an API-first development process if possible; this will dramatically reduce the amount of rework that has to be done.
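As a minimal sketch of the versioning technique mentioned above (the routes and fields are hypothetical), a breaking change can be isolated behind a new version so existing consumers keep working while they migrate:

```python
# Minimal sketch: API versioning keeps the v1 contract stable while a
# breaking change ships under v2. Routes and fields are illustrative.

def get_customer_v1(customer_id: str) -> dict:
    # Original contract: a single "name" field.
    return {"id": customer_id, "name": "Ada Lovelace"}

def get_customer_v2(customer_id: str) -> dict:
    # Breaking change isolated behind a new version: split name fields.
    return {"id": customer_id, "first_name": "Ada", "last_name": "Lovelace"}

ROUTES = {
    "/v1/customers": get_customer_v1,  # stays live until consumers migrate
    "/v2/customers": get_customer_v2,
}
```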
CI/CD Using automation for build, testing and deployments is critical to successfully running a microservice estate. The more automated it is, the better. This reduces the chance for human error, increases repeatability and predictability, all of which minimise risk.
Autonomy One of the main advantages of a microservice is that it is autonomous. It can be changed and rewritten, if necessary (as long as it stays faithful to the APIs that it implements), and it needs to be resilient enough to handle problems and error states in any of its dependent services. It’s important that it can gracefully deal with any failures of systems around it.
Self-Healing If a dependency goes down or fails in any way, the service needs to handle it, and when that dependency is back up, the service needs to detect the recovery and automatically start working normally again. While the dependency is down, the service needs to report that fact through logs and metrics, and when things are back to normal, it needs to report that too.
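A hedged sketch of that behaviour is below; the retry interval and names are assumptions, and a real service would likely do this asynchronously rather than blocking.

```python
# Minimal sketch: keep probing a failed dependency, report the outage once,
# and report again when it recovers. Names and intervals are illustrative.

import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orders")

def call_with_recovery(dependency_call, retry_interval=5.0):
    healthy = True
    while True:
        try:
            result = dependency_call()
            if not healthy:
                log.info("dependency recovered, resuming normal operation")
            return result
        except ConnectionError:
            if healthy:
                log.error("dependency down, will keep retrying")
                healthy = False
            time.sleep(retry_interval)  # back off, then probe again
```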
Self-Protecting This capability is closely aligned with self-healing, but it is not the same thing. This is where a failure in a dependency causes the service to report a problem rather than fail itself. There are a number of ways to achieve this, for example the circuit breaker pattern, but the point is that aside from the dependency having a failure, the service should be executing code normally and not having any problems itself.
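Since the circuit breaker pattern is named above, here is a minimal sketch of it; the thresholds and timeouts are illustrative assumptions. After repeated failures the breaker opens and the service fails fast instead of hanging on a dead dependency; after a cool-down, one trial call is let through to test for recovery.

```python
# Minimal circuit breaker sketch. Thresholds and timeouts are illustrative.

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit again
        return result
```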
Loosely coupled This is closely aligned to the importance of APIs and the fact that those APIs must abstract away any indication of how the service is implemented. This stops implementation technology bleeding across API boundaries which can cause all sorts of risk and complications.
Trust Boundaries There are two aspects to this: security and data integrity. From a security perspective, the trust boundaries need to be defined up front. Typically, microservices would operate in a zero-trust domain, but there are often reasons why this is impractical, so as a rule build on an ‘as little trust as possible’ basis. From a data integrity perspective, the same rule applies; there are security implications to trusting data as well, so there is a lot of overlap between security and data trust. Again, validate as much of the data integrity as is practical.
Conclusion
Following the microservices architectural pattern brings a lot of benefits, but the foundations have to be put in place to get the maximum benefit. Doing a good job of running microservices at any sort of scale requires a lot of pieces to be in place, which can take a lot of time and resources to achieve.
Org Readiness The teams building and supporting the apps need to be organised very differently from teams running on-prem monolithic apps. The skills are different, the processes are different, the tools are different, and quite often the culture is different too. Without these changes, organisations can still run microservices, but getting the benefits businesses typically expect will be very challenging. To do it properly, the changes needed are organisational; they go beyond the purely technical and architectural, and they require commitment and time to achieve. This is a paradigm shift and needs to be viewed as a journey, a long-term commitment to a cycle of iterative steps of change and evaluation. The scale of change needed is not something that can be done overnight.
Tech Readiness Automation is the key to running microservices successfully, and as much of it as possible. Leveraging automation at the required scale requires investment in tooling, often investment in IaaS, PaaS and SaaS offerings, and the skills to be able to use them. Processes need to be lean, agile and automated.
Microservices are usually the right choice, but not always
For most organisations microservices are a good choice, however there are situations where this is not the case, both technically and organisationally. It is important for organisations to carefully assess whether microservices are the right choice for their particular situation and not to assume that they are.