Overcome Technical Debt in IT Infrastructure 2024
Andreas Hartig
Microsoft MVP Windows Server | Azure Hybrid & Migration, RCDA Trainer, CGI Luminary, Director Consulting Expert
What is Technical Debt?
Technical debt is the cost of choosing an easy or cheap solution now instead of using the correct approach, whatever that means. It'll probably cost you more in the long run. In my words, technical debt is the cost you have to pay for running or supporting an outdated technology.
Two things were important for me to learn in my career: technical debt can be the result of a conscious decision, or it can be the consequence of external factors, such as the availability of resources, skills, roadmaps and budgets, or a merger or acquisition.
Just as with monetary debt, technical debt also accumulates interest over time. The longer it is left unaddressed, the higher the cost of remediation.
Technical Debt in IT Infrastructure
In IT infrastructure we are looking at networks, servers, computers, components, applications and cloud components that are not in the expected and required state to support the business.
Especially in IT infrastructure it is challenging to make business leaders understand why lifecycle management and operations are necessary to keep their business up and running. You will be challenged with phrases from current management trainings, like "keeping the lights on", or with the claim that IT operations is a non-value-add department that does not create business value. This has become especially true in the world of "the cloud". Business leaders still think that using the cloud does not require on-premises hardware, or that "virtual hardware" in times of AI and Infrastructure as Code (IaC) will magically maintain itself.
Typical Technical Debt in IT Infrastructure
Technical Debt can occur in many ways in your IT infrastructure.
Documentation
It is not uncommon for documentation to become outdated or to exist only in the minds of a few individuals. These individuals may be external consultants or internal team members. Losing access to this knowledge will cause significant delays in development, longer outages, or the cost of resolving the same issue a 2nd or 3rd time.
Personal Recommendation: You will very often hear this risk phrased as "What happens if person xyz is hit by a bus?". I personally consider this totally inappropriate. If you want to make the point, please use a nicer example. My standard is "What happens if person xyz wins the lottery?".
Operational
If you wish, you may choose to differentiate between operational and infrastructure technical debt. I keep them together, as ownership of hardware and operations lies primarily with the Network Manager and Datacenter Manager, or equivalent roles.
Operational debt can be defined as any situation that prevents an organization from meeting its service-level agreements (SLAs). A typical example is a lack of monitoring. I have encountered instances where there was no monitoring system in place, only a limited number of systems being monitored, an unusually high number of red alerts in the monitoring system, no on-call team, or simply the promise of 99% uptime with no 24x7 operations team.
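To make a promise like "99% uptime" tangible, it helps to translate it into the downtime it still allows. The following is a minimal sketch; the function name and the 99.9% comparison value are my own assumptions for illustration and are not part of any SLA mentioned above.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime (in hours) an uptime promise still permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# 99.0 is the figure from the text; 99.9 is an assumed comparison value.
for target in (99.0, 99.9):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.1f} h of downtime per year")
```

At 99% that is roughly 87.6 hours per year, which is a useful number to bring into a discussion about whether an on-call or 24x7 team is needed.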
Infrastructure
A few examples of debt directly related to your infrastructure include outdated (End of Support/End of Life) routers, switches, WiFi access points, firewalls, servers, and any other infrastructure component.
It can also be related to expired service contracts for components or missing licensing options, such as a monitoring system that is licensed for only 100 servers when you have 200. You accepted the 100 servers that are not critical for 24x7 as technical debt, but you should work towards resolving this technical debt in the next budget round.
Security
IT infrastructure plays a significant security role. If you look at the end-to-end communication for any service, it runs from the client through network components, your datacenter (server room), firewall service and router, and eventually to your ISP. If some of these components are End of Life or not in current patch cycles, they will become extremely vulnerable. Making this debt visible to your Chief Information Security Officer, Data Privacy Officer, ISB (Informationssicherheitsbeauftragter, the information security officer) or finally the CEO as the owner of these risks will help you mitigate a lot of infrastructure-related topics. A good source to understand the debt in this area is the CVE database.
How to avoid it
Lifecycle Management
Implementing lifecycle management is easier said than done. If your organization doesn't have a centralized CMDB and you don't have a contract management solution, find your own way. I have seen a lot of very fancy Excel files over my 20 years, and in your role as datacenter, network or server manager they can be sufficient. Don't wait for someone else to handle this for you. If you have the opportunity to use a tool, I would always recommend Lansweeper.
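If a spreadsheet is all you have, even a few lines of script over the same data can flag what needs attention. The sketch below is only an illustration under my own assumptions: the asset names, dates, the `Asset` structure and the 365-day warning window are hypothetical and not taken from any tool mentioned here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    kind: str
    end_of_support: date


def lifecycle_status(asset: Asset, today: date, warn_days: int = 365) -> str:
    """Classify an asset by the days left until its end-of-support date."""
    days_left = (asset.end_of_support - today).days
    if days_left < 0:
        return "out of support"
    if days_left <= warn_days:
        return "plan replacement"
    return "ok"


# Hypothetical inventory; in practice this could be read from your Excel/CSV export.
inventory = [
    Asset("core-switch-01", "switch", date(2024, 3, 31)),
    Asset("fw-edge-01", "firewall", date(2025, 6, 30)),
    Asset("hv-host-07", "server", date(2028, 1, 31)),
]

today = date(2024, 9, 1)  # fixed date so the example is reproducible
for asset in sorted(inventory, key=lambda a: a.end_of_support):
    status = lifecycle_status(asset, today)
    print(f"{asset.name:15} {asset.kind:9} {asset.end_of_support}  {status}")
```

The warning window is a design choice: it should be at least as long as your budget cycle, so that a replacement can still be planned before support ends.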
Monitoring & SLAs
A core component in avoiding technical debt is implementing monitoring and SLAs. SLAs will challenge your own team to keep delivering, but they also give you the driving argument to ask for 24x7 support, keep a support agreement, get a hardware replacement, or look into 3rd party vendor support, e.g. Service Express. If you want to quickly get hardware under support, or at least get replacement parts in times of financial challenges for your organization, this is a great opportunity. I have had good experiences with this service.
Consider Leasing, Hosted or Cloud Services (Capex to Opex)
None of these is a 100% fix. Each of them still requires tracking of changes, planning of replacements and regular meetings with the service partners. Moving services to a 3rd party will not take the responsibility away from you, and you will still have challenges with products and solutions going End of Life. But you can transition a lot of responsibility away.
When you are in the manager role and involved with budgeting, you should be aware of two things. First, your finance team normally prefers opex over capex.
Second, when you are leasing something, talk to your finance and budget planning contacts. In my early years I expected that I could simply get a leasing replacement as long as the costs remained equal. In reality I learned the hard way that you need to plan for the reinvestment in your budget.
Develop Best Practices
To avoid technical debt you need to look at the context of your roles and responsibilities. I would always look into basic to-dos.
How to handle it
Roadmapping and Prioritization
Business units and organizations are normally very good at supporting IT infrastructure requirements. IT infrastructure is on a 36 to 60 month lifecycle driven by leases or support contracts. When it comes to replacements, these have to be budgeted, no matter whether they are funded through leasing, capex or 3rd party support (project management, traveling, planning,...).
During your budget planning you need to make sure that there is clear visibility of which components are to be replaced and when. This could be smaller replacements at remote locations or your centralized datacenter infrastructure.
Be prepared in those discussions to share your roadmap of when you are replacing what. I would also always come prepared with a prioritization list. In case there are budget restrictions or challenges for the business, you want to be able to show alternatives.
Alternatives include extending the lease by another 12 months, buying the leased components and moving to 3rd party hardware support (make sure your CIO / CEO / CISO accepts any risk of this in written form), or even moving to alternative technologies / vendors / outsourcing.
If you are seen as prepared and willing to help, and you can present alternatives together with their risks and savings, it becomes easier to explain why you want the like-for-like (premium) replacement. This makes it more likely that your budget gets approved.
Visualize & Storytelling
One of the lessons I learned recently is a visualization in 4 colors. I use these colors and this visualization to show where I am running projects and investments. It is my main driving visualization to show the black (tombstone) area of all the investments I have been denied or where I have decided to take on technical debt to help the business. If you can show your KPIs there, e.g. 35% of the switches will be End of Life by the end of the next business year and 70% in the following year, you will get attention.
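A KPI like "35% of the switches will be End of Life" can be derived directly from the End of Life dates in your inventory. The following is a minimal sketch with made-up data: the fleet of 20 switches and its dates are hypothetical and chosen so that they reproduce the 35% and 70% from the example, and I assume here that the business year ends on 31 December.

```python
from datetime import date


def share_end_of_life(eol_dates: list[date], cutoff: date) -> float:
    """Percentage of devices whose End of Life date is on or before the cutoff."""
    if not eol_dates:
        return 0.0
    expired = sum(1 for d in eol_dates if d <= cutoff)
    return 100 * expired / len(eol_dates)


# Hypothetical fleet of 20 switches, grouped by End of Life date.
switch_eol = (
    [date(2025, 6, 30)] * 7
    + [date(2026, 9, 30)] * 7
    + [date(2028, 12, 31)] * 6
)

# Assumption: the business year equals the calendar year.
for year in (2025, 2026):
    pct = share_end_of_life(switch_eol, date(year, 12, 31))
    print(f"End of {year}: {pct:.0f}% of switches End of Life")
```

The percentages are cumulative, which is what makes the curve alarming: devices that went End of Life in one year are still End of Life in the next unless they were replaced.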
The other component is storytelling: make things understandable. Sometimes business leaders cannot grasp the complexity of our daily IT work. Start to explain things in visual diagrams, make them visible in Euro or Dollar by showing risk and costs, or tell a story. My favorite story in my last role, with over 120 sites, was to explain to my leaders that if we want to be on a 36 month lifecycle, I would need to replace the hardware at roughly one site per week and plan the corresponding capacity in traveling, project management and on-site support.
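The arithmetic behind that story is simple enough to show to leadership directly. This is a rough sketch; the function name is mine, and the 44 usable working weeks per year (leaving room for holidays and change freezes) is an assumption for illustration, not a figure from my former role.

```python
WEEKS_PER_YEAR = 52


def sites_per_week(site_count: int, lifecycle_months: int,
                   working_weeks_per_year: int = WEEKS_PER_YEAR) -> float:
    """Average number of sites to refresh per week to stay within the lifecycle."""
    available_weeks = lifecycle_months / 12 * working_weeks_per_year
    return site_count / available_weeks


# 120 sites on a 36 month lifecycle, using every calendar week.
print(f"{sites_per_week(120, 36):.2f} sites per week")

# Same fleet, assuming only 44 usable weeks per year.
print(f"{sites_per_week(120, 36, working_weeks_per_year=44):.2f} sites per week")
```

With every calendar week available this comes out at about 0.77 sites per week; once you subtract the weeks in which nobody wants a rollout, it moves to about 0.91, which is why "one site per week" is the honest planning number.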
Conclusion
Managing IT infrastructure debt can be done very much like managing debt in software architecture. There are some important messages that need to be conveyed, e.g. even in a cloud-only IT world you need certain infrastructure, and cloud infrastructure defined through IaC also requires updates and maintenance.
While the tools for managing technical debt are similar for software and infrastructure architecture, you have the advantage that IT infrastructure is somewhat visible and easier to understand than software. Develop a relationship with the business, make your area visible, and explain how you are driving value by running the business 24x7, with xx % more data each year, and how you are able to deliver on KPIs and SLAs and keep costs under control.