True Names in Platform Engineering
Defining Platform Engineering
Across a series of SaaS companies of very different scale, I have seen a common pattern: one or more teams called Platform (or sometimes Core) Engineering with important-sounding but opaque charters, designed to make a broad swath of engineering efforts successful. As an engineering leader whose functional success depends on thoughtful approaches to service architecture for scale and reliability, I am a strong supporter of platform engineering investments. However, I have come to understand that “platform” used as a shorthand is not specific enough to convey to an organization the benefits that platform teams provide.
Per Gartner, “Platforms improve user experience by offering a curated set of tools and services, designed to present users with best-of-breed technical capabilities and highly optimized processes without end users having to create the operating platform for themselves.” Gartner’s research also calls out that platforms are created from a mix of hard technical requirements (“guardrails”) and softer recommendations (“guidelines”). This mix of approaches can often make the boundaries of a platform - or even the value it offers - unclear.
Working through what platform engineering means with a talented product management leader, I recognized a fundamental naming problem: calling something a platform without defining what the platform is for hides the value of the engineering work. To be clearer, I am challenging myself to be specific and name the platform(s), rather than settling for a lazy name and leaving the listener to work out what the platform actually does.
In leading Site Reliability Engineering teams, I have learned that it is particularly essential to understand which critical shared capabilities power the product experience, because shared services are often the best touch points for eliminating toil, enabling scale, and improving reliability. Site Reliability Engineering, as a Google-originated discipline, is deeply rooted in the idea that good foundational services empower Google’s success. Per Niall Murphy’s insightful LinkedIn article aimed at helping ex-Google SREs find roles after the 2023 Google layoffs, “We’ve not seen anything like Google’s commitment to shared infra and company-wide engineering solutions in other companies. [...] In Google, the conviction of success strongly related to using Google software to achieve answers to the complicated, large problems Google suffers from.” Consciously considering the set of platform capabilities that drive reliability makes the role of the reliability-focused, customer-facing engineering organization significantly easier.
A Pragmatic Taxonomy for Naming Platforms (and Their Owners)
At a high level, I am using the following mental model to come up with better names for foundational (see what word I didn’t use there) capabilities for a standard SaaS product - a multi-tenant public cloud-based product that ingests data, stores it reliably, transforms it as needed to fulfill customer needs, and presents it quickly to customers when requested or scheduled:
Foundational Compute Platform
Customer Lifecycle Platform
Service Delivery and Reliability Platform
Security Platform
*: Performance and Resource Management stood out in discussions of this article as falling between the Customer Lifecycle Platform and the Service Delivery and Reliability Platform charters. I have seen it placed in both, and it is often essential to tying the success of the two platforms together. Wherever it sits, its work should have a strong cross-functional element.
At a small to mid-sized company, this list of four major categories might represent four “Platform Engineering” teams. At a large company, each capability under a Platform heading could be one or more distinct teams, sharing an identity and a well-understood, related mission.
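One lightweight way to hold an organization to this naming discipline is to keep the platform catalog as data and check it. The Python sketch below is a hypothetical illustration, not a tool this article prescribes: the `Platform` record, the `naming_problems` check, the owner names, and the list of vague names (drawn from the “Platform” and “Core” team labels mentioned at the start) are all assumptions. It flags any entry whose name does not say what it is for, that has no stated purpose, or that has no owning team.

```python
from dataclasses import dataclass, field
from typing import List

# Labels that name an org-chart slot rather than what the platform is for
# (assumption: drawn from the "Platform" and "Core" team names above).
VAGUE_NAMES = {"platform", "platform engineering", "core", "core engineering"}


@dataclass
class Platform:
    name: str                                        # e.g. "Security Platform"
    purpose: str                                     # one sentence: what is it for?
    owners: List[str] = field(default_factory=list)  # hypothetical owning team names


def naming_problems(platforms: List[Platform]) -> List[str]:
    """List every platform that is vaguely named, has no purpose, or has no owner."""
    problems: List[str] = []
    for p in platforms:
        if p.name.strip().lower() in VAGUE_NAMES:
            problems.append(f"{p.name}: name does not say what the platform is for")
        if not p.purpose.strip():
            problems.append(f"{p.name}: no stated purpose")
        if not p.owners:
            problems.append(f"{p.name}: no owning team")
    return problems


# Hypothetical catalog: one clearly named entry and one lazily named one.
catalog = [
    Platform("Security Platform", "placeholder charter sentence", ["security-team"]),
    Platform("Platform Engineering", "", []),
]
for problem in naming_problems(catalog):
    print(problem)
# Platform Engineering: name does not say what the platform is for
# Platform Engineering: no stated purpose
# Platform Engineering: no owning team
```

The check is deliberately crude; its value is in forcing each platform to carry a name, a one-sentence purpose, and an owner before it appears on an org chart.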
On Monoliths
After an early career blessed by many years spent growing up with Service-Oriented Architecture, in the last ten years I’ve seen more and more products built around the monolith design pattern. I am not inherently opposed to the monolith; I’ve also seen its opposite, the microservices big ball of mud, and its failed evolutionary attempt to cross the break-up-the-monolith chasm, the distributed monolith. For modern companies that already have a monolith in place, I offer this strongly emphasized advice:
Treat the monolith as a standalone platform.
The monolithic design pattern is often adopted because, early on, it makes it easy to get code to production: it represents the all-in-one path out to customers. The monolith embodies the early-maturity company’s approach to platform engineering, fulfilling the Gartner declaration that “The overarching goal of the platform is enhancing user productivity.” Monoliths are platforms for easy deployment of new code, used by many engineering teams simultaneously. What they are not, however, is safe or guaranteed to be performant, because like any unmanaged shared resource, they are prone to the tragedy of the commons: with everyone sharing ownership of the resource, there are few natural checks in place to drive fairness in its use.
Rather than relying purely on process to protect shared resources, enlightened engineering management should remember the principles of Google SRE, paraphrased as “Operational toil is an unsolved engineering problem,” and put one or more engineering teams in charge of keeping a monolith viable, performant, and decoupled from complex microservice versioning requirements, running it as a fully owned and managed service with the same expectations of reliability and efficiency as any standalone service or microservice.
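To make the “few natural checks” point concrete, here is a minimal Python sketch of one engineering check an owning team might build into a monolith: a per-team token bucket that caps how quickly any single team’s workload can consume a shared resource. The class names, team names, rates, and the choice to meter by team are hypothetical assumptions for illustration; a real monolith would first need to attribute each request or job to an owning team before a gate like this could work.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full burst allowance
        self.last = clock()

    def try_take(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class FairShareGate:
    """Hypothetical admission check in front of one shared monolith resource."""

    def __init__(self, shares: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # shares: hypothetical team name -> (refill rate per second, burst capacity)
        self.buckets = {team: TokenBucket(rate, cap, clock)
                        for team, (rate, cap) in shares.items()}

    def admit(self, team: str, cost: float = 1.0) -> bool:
        bucket = self.buckets.get(team)
        if bucket is None:
            # Work with no registered owner gets no share by default.
            return False
        return bucket.try_take(cost)


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
gate = FairShareGate({"checkout": (2.0, 2.0), "reporting": (1.0, 1.0)},
                     clock=lambda: fake_now[0])
print([gate.admit("reporting") for _ in range(3)])  # [True, False, False]
print(gate.admit("checkout"))    # True: reporting's burst did not drain checkout's share
fake_now[0] = 1.0
print(gate.admit("reporting"))   # True: one second of refill restored a token
```

One design choice worth calling out: unregistered workloads are refused by default, which forces ownership to be declared before shared capacity is consumed. A gentler policy could route them to a small shared bucket instead.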
Conclusion
Platforms are powerful capabilities for enabling productivity and innovation by reducing the cognitive load on the broader set of developers in a company. However, they are at their best when named clearly, designed with a clearly defined purpose and boundaries, managed in line with business goals, and treated as products that benefit both internal and external users. Taking the time to go beyond the abstract name “platform engineering” clarifies, for the entire company, the benefits of building services that enable developer productivity.
Additional References and Related Articles
Delory, P., Matvitskyy, O. “Top Strategic Technology Trends for 2023: Platform Engineering”, 17 October 2022, Gartner ID G00774324
Majors, C. “The Future of Ops Is Platform Engineering”, 30 September 2022, Honeycomb.io blog (not cited directly)
Murphy, N. “SRE in the Real World”, 4 March 2023, LinkedIn article