Backstage: Cornerstone of our DevOps Platforms at The Warehouse Group
Backstage (https://backstage.io) has become a critical part of our DevOps platforms at The Warehouse Group. We use it to enable teams to deliver at pace, and the results have been truly astounding: 60-second bootstrap times for our Dev Squads for container-based middleware components.
How did we get here? In this article I'll explain how we facilitated the deployment of Backstage within a complex environment. I won't be delving into any technical detail of how we build Templates or Plugins; that might be a separate post. This is offered as an example of how we influenced the company to invest in this as an asset, and will hopefully help you do the same.
I was reading Team Topologies (Matthew Skelton and Manuel Pais) a few months back, and there's a paragraph in there which resonated with me.
"A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced co-ordination."
At The Warehouse Group, we now provide a platform experience that aligns with that definition.
I can say with 100% certainty that Backstage was the catalyst for our transition to providing that "compelling internal product", in line with the definition of a Platform Team.
Intro:
DevOps at The Warehouse Group 18 months ago meant writing Jenkins pipelines. We built a completely bespoke solution for every squad we interacted with, and the solutions had no commonality. It worked for a time but was not scalable. While we certainly provided value for squads from a build/test perspective, the way we worked as a squad was the definition of inefficiency. We were an efficiency anomaly.
We knew we had to standardise on the how, to make us more efficient at what we did. We knew we had to introduce certain standards, for example for monitoring, security, logging and how we tested.
In Nov 2020, Mustansar Anwar us Samad, one of our Senior Engineers, discovered Backstage. I watched the Introduction video and in 3 minutes I was sold.
Backstage is a way to enable services that also addresses the standards paradox. Because all Backstage services are created from easily consumable templates, it solves both challenges. The approach to standardising or bootstrapping the developer ecosystem, and how the information flowed back into the Developer Lens, simply blew my mind. Spotify's own implementation of Backstage is a good example of that Developer Lens.
I immediately saw how this could benefit us, and ultimately make our Developer squads even more efficient.
Our challenge, however, was that we were a team of 3 with relatively low profiles that "Just did Jenkins Stuff".
Game on.
The POC:
We set aside time to experiment and iterate on a Proof of Concept. Mustansar set aside time to build our first sample Template.
It was a sample Java microservice (80% of our Dev Teams used Java). It had logging and metrics via Prometheus built in. We also created a standard Jenkins Pipeline to do a build and then deploy into our Dev Kubernetes Environment. This in itself likely warrants a completely separate article.
In 2 weeks we delivered our first run.
It created the repo, built the container and deployed in 5 mins.
We achieved something in 5 mins that used to take up to 4 weeks depending on the maturity of the team. We knew we had to push harder to improve the Developer experience, and make our collective lives easier.
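To make the shape of that first run concrete, here is a minimal sketch of the flow as an ordered list of steps. It is an illustration only, not our actual Template or Jenkins Pipeline: the step functions, the `BootstrapRun` class and the service name are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BootstrapRun:
    """Records what a single (hypothetical) template run has produced so far."""
    service_name: str
    completed: List[str] = field(default_factory=list)


# Each step is a plain function that takes the run and records its result.
Step = Callable[[BootstrapRun], None]


def create_repo(run: BootstrapRun) -> None:
    # Placeholder: a real step would call the source-control system.
    run.completed.append(f"repo created for {run.service_name}")


def build_container(run: BootstrapRun) -> None:
    # Placeholder: a real step would trigger the build pipeline.
    run.completed.append(f"container built for {run.service_name}")


def deploy_to_dev(run: BootstrapRun) -> None:
    # Placeholder: a real step would deploy into the Dev environment.
    run.completed.append(f"{run.service_name} deployed to dev")


def bootstrap(service_name: str, steps: List[Step]) -> BootstrapRun:
    """Run every step in order; an exception in any step stops the run."""
    run = BootstrapRun(service_name)
    for step in steps:
        step(run)
    return run


run = bootstrap("sample-java-service", [create_repo, build_container, deploy_to_dev])
```

The point of the sketch is that the steps are the same for every service: only the service name changes, which is what makes the run repeatable.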
Mustansar and I split our focus, and 2 streams were formed, loosely named the Cheerleading Stream (top down) and the Customer Stream (bottom up).
The Cheerleading Stream:
Over a couple of months we targeted Domain Leads, Chapter Leads, Tribe Leads and Chapter Area Leads, and then ultimately the CTO.
I was quite bold in stating that we had to do it. In my mind there was no negotiation. We did not want to be the "Jenkins Script Kiddies". We had to be the enablers of efficiencies for squads.
We had immediate support in some areas. The Chapter Area Lead especially saw the benefit straight away from an efficiency, governance and standardisation viewpoint. It played into re-use across teams, lower technical debt, easier adoption, faster to market and faster to swarm, and it provided a common business language.
He was quick to challenge us on how this would fit into our existing ITSM practices based around ServiceNow.
Meanwhile in the Customer Stream...
Mustansar started the demos within a couple of squads. In one case we enabled them to create a service they had in their current sprint. We took our basic API template and created their service in a couple of minutes. The first comment was "what do we do now?". Our answer was "you start coding".
They were so used to the wait times of previous ways of working that it was an eye-opener for them to be able to start work immediately. We used this example in the Cheerleading I was doing and gave many more demos over the next few weeks. Squads then challenged us on how testing played into the templates, and how this could be automated. More on that shortly.
DORA and Visions.
Our worlds converged with respect to the challenges presented back. It became apparent that the technology was only going to get us so far. We needed a better framework and operating model to align to. The DORA Framework (https://www.devops-research.com/research.html) was very quickly and deliberately adopted to ensure we aligned with an industry-known standard. We didn't want to re-invent the wheel here. I loved DORA's approach to understanding how DevOps dimensions play into squad performance. I loosely grouped the dimensions into 2 areas, Culture and Architecture.
It was very apparent at this stage that we needed a holistic approach to ensure that Backstage could really shine. Let's take testing as an example. In order to drive efficiencies in deployment, we had to automate testing. This relied on having contract boundaries clearly defined, which in turn depended on well-articulated architecture. We also needed to ensure that our approach to testing shifted to the left, which in some cases was a Cultural issue. That brings us back to our 2 areas for success: Culture and Architecture. The problem therefore grew in scope, but it was absolutely needed for us to transform.
We created our vision for the changes we were looking to drive:
As a (Development) Squad, we want to deploy into prod, on a Friday, at the pub with a beer.
Deployments needed to be a non-event. The governance and autonomy had to be part of that flow, and Trust was implied. We knew this was going to be a larger picture to sell.
How we started
To understand the scope of the problem, we needed help. At the time, there were just 3 of us. Clearpoint were onboarded to run through an assessment, which then provided clear guidance on gaps. The Clearpoint Assessment was also based on the DORA Framework, which made the engagement perfect for us. These gaps provided input into our Architectural Framework, and ultimately a roadmap for Delivering Value.
Our Roadmap outlined a starting approach focused on providing a small number of in-demand templates, namely a REST API, Kafka to REST, and REST to Kafka, in order to enable our Dev Teams.
Those templates had to have:
We succeeded here in all aspects: our templates now do all of this in 60 seconds!
The Roadmap then fed into the Business Case for Enablement. The Business Case was an overwhelming success in terms of approval because:
We had proven the value: we had 8 successful template creations when this went for approval, and our Dev squads had an overwhelmingly positive response to the direction. We understood the demand (100 services over 1 year). We built the foundations for governance enforcement by integrating with ServiceNow. We also looked to tackle Culture and Architecture, and had a plan for change management.
Backstage was the key to enabling this. It shouldn't be underestimated that Backstage does require a bit of work. The flexibility and architecture it provides mean that it plays nicely with anything that has an API, and gives you the freedom to get it right for your environment.
Things that "I wish I knew".
Testing automation needed the most attention
Testing is key to getting this right. We spent 3 months investing in the automated testing frameworks that became part of our Backstage Templates. This meant Devs could test locally, and the same tests would run again at build time. Testing is now the first thing we think about for all new services.
Change Management was underestimated.
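As a minimal sketch of the idea (not our actual testing framework), the example below shows a contract check written as an ordinary function, so the identical check can be run on a developer's machine and again by the build. The contract, the field names and the `contract_violations` helper are all hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List

# Hypothetical contract for a REST API response: field name -> expected type.
ORDER_CONTRACT: Dict[str, type] = {
    "order_id": str,
    "quantity": int,
    "status": str,
}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the payload conforms."""
    problems: List[str] = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


# The same checks run locally and at build time because they are just code.
good = {"order_id": "A1", "quantity": 2, "status": "NEW"}
bad = {"order_id": "A1", "quantity": "2"}

good_result = contract_violations(good, ORDER_CONTRACT)
bad_result = contract_violations(bad, ORDER_CONTRACT)
```

Because the check has no dependency on where it runs, a template can ship it alongside the service and the pipeline can call it unchanged, which is what keeps local and build-time results consistent.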
We scheduled 12 hours of training across all of IS, but this wasn't enough. We needed to spend more time on the journey for the customers, and we will look to get more involvement for future projects.
We underestimated the importance of Architecture.
We needed to get across Architecture early. Having clear boundaries through decoupled services is the key to making testing, especially, work in our Templates.
Change is hard.
We did have limited success in some areas. Predominantly this was because of the tech stack that was in play. In some cases it is a long play. We needed to work hard to ensure we had inputs at the architecture phase, and our EAs and SAs are awesome to work with. We are having some great successes here.
What's next for us?
We are about to launch the next iteration of Template Creation. We will be focusing on our SFTP service, APIM Service, and Cloud Services. We will focus on embedding security across all layers. We will build an upgrade template feature that provides a path for older templates to take advantage of new features, which also inherently plays into compliance. We continue to work with our EAs and Change Management experts to bring change across the business and ensure that we provide the best experience to our Squads.
Most importantly, we will continue to focus on our Dev Squads to ensure they have the best Development Experience.
I'll leave you with a quote from one of the Senior Devs enabled via Backstage:
"I've never deployed into Prod in 1 hour before, this is cool"
Engineering Leader
2 years ago: This is another great article, Matthew Law. Thanks for sharing. The Warehouse Group seems like a great place to get stuff done!
Facilitating Security Outcomes | Security Solutions Engineer @Sysdig
2 years ago: I definitely get not wanting to be "Jenkins Script Kiddies", love it. This is an amazing post!
Building a better world through open collaboration
2 years ago: This is awesome, we would love to turn this into a Cloud Native Computing Foundation (CNCF) project case study if you're interested?
Chief Roadie
2 years ago: "60 second bootstrap times for our Dev Squads for container based middleware components." ??