True Names in Platform Engineering

Neil Laughlin

Vice President, Site Reliability Engineering at AuditBoard

Published: June 26, 2023

Defining Platform Engineering

Across a series of SaaS companies of very different scale, I have seen a common pattern: one or more teams called Platform (or sometimes Core) Engineering with important-sounding but opaque charters, designed to make a broad swath of engineering efforts successful. As an engineering leader whose functional success depends on thoughtful approaches to service architecture for scale and reliability, I am a strong supporter of platform engineering investments, but I have come to understand that the term “platform,” used as a shortcut, is not specific enough to convey to an organization the benefits that platform teams provide.

Per Gartner, “Platforms improve user experience by offering a curated set of tools and services, designed to present users with best-of-breed technical capabilities and highly optimized processes without end users having to create the operating platform for themselves.” Gartner’s research also calls out that platforms are created from a mix of hard technical requirements (“guardrails”) and softer recommendations (“guidelines”). This mix of approaches can often make the boundaries of a platform - or even the value it offers - unclear.

Working through what platform engineering means with a talented product management leader, I recognized a fundamental naming problem: calling something a platform without defining what the platform is for hides the value of the engineering work. To be clearer, I am challenging myself to be specific and name the platform(s), rather than indulging in laziness in naming and leaving what the platform actually does to be determined by the listener.

In leading Site Reliability Engineering teams, I have learned that it is particularly essential to understand which critical shared capabilities power the product experience, because shared services are often the best touch points for eliminating toil, enabling scale, and improving reliability. Site Reliability Engineering as a Google-originated concept is deeply rooted in the idea that good foundational services empower Google’s success: per Niall Murphy’s insightful LinkedIn article aimed at helping ex-Google SREs find roles after the 2023 Google layoffs, “We’ve not seen anything like Google’s commitment to shared infra and company-wide engineering solutions in other companies. [...] In Google, the conviction of success strongly related to using Google software to achieve answers to the complicated, large problems Google suffers from.” Consciously considering the set of platform capabilities that drive reliability makes the role of the reliability-focused, customer-facing engineering organization significantly easier.

A Pragmatic Taxonomy for Naming Platforms (and Their Owners)

At a high level, I am using the following mental model to come up with better names for foundational (see what word I didn’t use there) capabilities for a standard SaaS product - a multi-tenant public cloud-based product that ingests data, stores it reliably, transforms it as needed to fulfill customer needs, and presents it quickly to customers when requested or scheduled:

Foundational Compute Platform

Compute (Enough CPU and Memory)
Data (Right Data to Right Users at the Right Time)
Storage (Managing Ever-Growing Data)
Networking and Glue (Service Interoperability)

Customer Lifecycle Platform

User Experience and Reporting (UX Design and Business Intelligence)
Integrations and Data Transformations (Features)
Feature Licensing (Access to Features)
Tenant Management (Includes Data Migrations)
Sandbox/Trial/Demo Capabilities (Selling and Testing Features)
Performance and Resource Management*

Service Delivery and Reliability Platform

Infrastructure Provisioning (Rapid and Reproducible Infrastructure Stack Creation)
Build/Test/Deploy (CI/CD Ideally; Sometimes Called “DevOps Engineering”)
Performance and Resource Management*
Release Orchestration (Complex Release Management for Monoliths)
Observability (Insight into the Service, Including Monitoring and Cost-to-Serve)
Incident Response and Remediation (Tooling to Detect, Mitigate, and Learn From Failures)

Security Platform

Software Vulnerability Threat Detection and Patching
Activity-Based Threat Detection and Defense
Identity and Permissions Management (IAM)
Secrets Management
Certificate Lifecycle Management

*: Performance and Resource Management stood out in discussions of this article as falling between the Customer Lifecycle Platform and the Service Delivery and Reliability Platform charters. I have seen it in both, and it often is essential to tying the success of the two types of platforms together. Wherever it is placed, it should have a strong cross-functional element to the work it does.

At a small to mid-sized company, this list of four major categories might represent four “Platform Engineering” teams. At a large company, each of the bullets under a Platform heading could be one or more distinct teams, but with a shared identity and a well-understood related mission.
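One way to make the taxonomy above concrete is to write it down as data and ask which capabilities appear under more than one platform charter. The following is only a minimal sketch: the platform and capability names are taken from the list above, while the dictionary layout and the helper names (`owners_by_capability`, `shared_capabilities`) are illustrative assumptions, not part of any existing tool.

```python
from collections import defaultdict

# Platform and capability names come from the taxonomy in this article.
# The data layout and function names are hypothetical, for illustration only.
PLATFORMS = {
    "Foundational Compute Platform": [
        "Compute",
        "Data",
        "Storage",
        "Networking and Glue",
    ],
    "Customer Lifecycle Platform": [
        "User Experience and Reporting",
        "Integrations and Data Transformations",
        "Feature Licensing",
        "Tenant Management",
        "Sandbox/Trial/Demo Capabilities",
        "Performance and Resource Management",
    ],
    "Service Delivery and Reliability Platform": [
        "Infrastructure Provisioning",
        "Build/Test/Deploy",
        "Performance and Resource Management",
        "Release Orchestration",
        "Observability",
        "Incident Response and Remediation",
    ],
    "Security Platform": [
        "Software Vulnerability Threat Detection and Patching",
        "Activity-Based Threat Detection and Defense",
        "Identity and Permissions Management (IAM)",
        "Secrets Management",
        "Certificate Lifecycle Management",
    ],
}


def owners_by_capability(platforms):
    """Invert the taxonomy: map each capability to the platforms that list it."""
    owners = defaultdict(list)
    for platform, capabilities in platforms.items():
        for capability in capabilities:
            owners[capability].append(platform)
    return dict(owners)


def shared_capabilities(platforms):
    """Return only the capabilities listed under more than one platform."""
    return {
        capability: owning
        for capability, owning in owners_by_capability(platforms).items()
        if len(owning) > 1
    }


if __name__ == "__main__":
    for capability, owning in shared_capabilities(PLATFORMS).items():
        print(f"{capability}: shared by {', '.join(owning)}")
```

With the taxonomy as written, the only capability that shows up under two charters is Performance and Resource Management, which matches the footnote: it is the item that most needs an explicit owner and a cross-functional mandate.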

On Monoliths

After an early career blessed by many years spent growing up with Service-Oriented Architecture, in the last ten years I’ve seen more and more products built around the monolith design pattern. I am not inherently opposed to the monolith; I’ve also seen its opposite, the microservices big ball of mud, and its failed evolutionary attempt to cross the break-up-the-monolith chasm, the distributed monolith. For modern companies with the monolith in place already, I offer this strongly emphasized advice:

Treat the monolith as a standalone platform.

The monolithic design pattern is adopted because, at first, it makes it easy to get code to production: it represents the all-in-one path out to customers. The monolith embodies the early-maturity company’s approach to platform engineering, fulfilling the Gartner declaration that “The overarching goal of the platform is enhancing user productivity.” Monoliths are platforms for easy deployment of new code, used by many engineering teams simultaneously. What they are not, however, is safe or guaranteed to be performant, because like any unmanaged shared resource, they are prone to the tragedy of the commons: with everyone sharing ownership of the resource, there are few natural checks in place to drive fairness in its use.

Rather than relying purely on process to protect shared resources, enlightened engineering management should remember the principles of Google SRE, paraphrased as “Operational toil is an unsolved engineering problem,” and put one or more engineering teams in charge of keeping a monolith viable, performant, and decoupled from complex microservice versioning requirements. The monolith should be run as a fully owned and managed service, with the same expectations of reliability and efficiency as if it were a standalone service or microservice.

Conclusion

Platforms are powerful capabilities for enabling productivity and innovation by reducing the cognitive load on the broader set of developers in a company. However, they are at their best when named clearly, designed with clearly defined purpose and boundaries, managed in line with business goals, and treated as products that benefit both internal and external users. Taking the time to go beyond the abstract name “platform engineering” makes the value of services that enable developer productivity clear to the entire company.

Additional References and Related Articles

Delory, P., Matvitskyy, O. “Top Strategic Technology Trends for 2023: Platform Engineering”, 17 October 2022, Gartner ID G00774324

Majors, C. “The Future of Ops Is Platform Engineering”, 30 September 2022, Honeycomb.io blog (not cited directly)

Murphy, N. “SRE in the Real World”, 04 March 2023, LinkedIn Article

Neil Laughlin

Vice President, Site Reliability Engineering at AuditBoard

1 year ago

Puneet Kandhari pointed me at this intriguing link today, which looks very much in line with addressing the same naming challenge. Popping it in here for easy discovery and future integration: https://tag-app-delivery.cncf.io/whitepapers/platforms/.
