Learning From "Software Engineering at Google", Part 1: Culture

Aliakbar Abbasi

Senior Full Stack Developer

Published: August 10, 2020

Undoubtedly, Google is one of the best software engineering companies: it turned computer programming and the Internet into a lucrative business on an unprecedented scale. Besides the services that shape our lives, Google has left behind a cultural and technological trajectory that distills more than 20 years of engineering activity, the ideas of some smart people, and a continuous endeavor to be faster, stronger, and better. That is why we should learn from "Software Engineering at Google".

This is the first part of a three-part summary of the book "Software Engineering at Google: Lessons Learned from Programming Over Time". The book has three major parts: Culture (part II), Processes (part III), and Tools (part IV). Each chapter is an article on a software engineering topic written by one or more experienced "Googlers". It gives us engineering ideas to learn from and some yardsticks for assessing the culture, processes, and tools we use in our own companies.

I spent hours studying the book cover to cover and tried my best to present its essentials in a way that is as short as possible but still useful per se. I added my own experience at the end of some parts under IMHO (In My Humble Opinion). Please feel free to ignore the IMHOs, and please buy and read the book if you find the topic useful and interesting.

Chapter 1. What Is Software Engineering?

This chapter stands apart from the rest of the book, but I find it useful to include here. It presents the authors' thesis about the difference between programming, as the activity of producing code, and SE (software engineering), as the activity of maintaining software over its lifespan so that it remains both modifiable and scalable. So time, scale, and trade-offs are what differentiate SE from programming.

The main responsibility of SE is to develop sustainable software. Software is sustainable if, during its lifespan, you can change it in response to changes in technology or for business reasons. Based on Google's experience, if the software is going to live for more than a few days (or if it's not for fun, not a college project, and not to show your friends that you are smart!), you should prepare for change. For this reason, your software, your company, and your company's policies must be scalable.

This chapter has some examples of policies that are not scalable and the solutions Google has for them. For example, making markers freely available to everyone, and using a distributed build system instead of local builds to save developers' time, both show how Google uses scalable policies, processes, and tools. To make these kinds of decisions, Google makes trade-offs based on data.

IMHO: I shared my view on the difference between software engineering and programming in another article years ago, though my viewpoint was different and surely less credible. You can find it here: https://www.dhirubhai.net/pulse/software-engineers-computer-programmers-successful-team-abbasi/

Chapter 2. How to Work Well on Teams

Software engineering is teamwork. The essence of this chapter is that to be successful in software engineering you should first focus on yourself by applying core principles of humility, respect, and trust.

The authors believe that people are afraid of being judged on their unfinished or imperfect code and, out of insecurity, try to find a way to hide it. This insecurity comes from the Genius Myth, which is the tendency to ascribe the success of a team to a single person and idolize that person. The heroic picture of a genius programmer who stays in the basement of their parents' house for just three months and comes out with a masterpiece of working, precious software that stuns everybody and shakes the industry comes from this myth.

Apart from the Genius Myth and the insecurity that comes from it, software engineers like to keep their ideas secret until the work is finished because they fear that others will copy them. But this hiding is harmful, because you can be on the wrong track without knowing it. Hiding is dangerous, and we should encourage knowledge sharing, early and frequent feedback, and teamwork.

To apply humility, respect, and trust in practice, we should lose our ego and not behave as if we are always the most important person in the room. In other words, we should be humble and self-confident at the same time. One solution is to work on the team's identity and pride instead of personal identity. Furthermore, we should learn to give and take criticism and keep in mind that "we are not our code" when faced with criticism from others. The choice of language is also important when we criticize someone else's work.

Furthermore, we should note that "failure is an option" and we must try to fail fast and iterate to reach success. To support this, a blameless culture is necessary. Google worked hard to create such a culture. Also, Google has a process to document what they learn from each mistake in a document named "postmortem" with a well-defined structure. To embrace failure, being patient and open to influence is necessary.

Google defines a term, "Googleyness", that is the set of attributes and behaviors that they look for in candidates, which represent strong leadership and exemplify humility, respect, and trust, as below:

Thrives in ambiguity: Can deal with conflicting messages or directions, build consensus, and make progress against a problem, even when the environment is constantly shifting.
Values feedback: Has humility to both receive and give feedback gracefully and understands how valuable feedback is for personal (and team) development.
Challenges status quo: Is able to set ambitious goals and pursue them even when there might be resistance or inertia from others.
Puts the user first: Has empathy and respect for users of Google’s products and pursues actions that are in their best interests.
Cares about the team: Has empathy and respect for coworkers and actively works to help them without being asked, improving team cohesion.
Does the right thing: Has a strong sense of ethics about everything they do; willing to make difficult or inconvenient decisions to protect the integrity of the team and product.

Chapter 3. Knowledge Sharing

Your organization understands your problem domain better than some random person on the Internet and should be able to answer most of its own questions. To achieve that, you need both experts who know the answers to those questions and mechanisms to distribute their knowledge. These mechanisms range from direct questioning to talks, courses, and written documents in any form (code documentation, tutorials, manuals, how-tos, and so on).

Code is an important output but only a small part of building products. An organization’s success depends on growing and investing in its people, in other words, on investing in engineers' learning and knowledge sharing. Personalized one-to-one advice is invaluable, but documented knowledge is more scalable.

Your organization needs a culture of learning. These are the main challenges to learning:

lack of psychological safety
information islands, which show up in different forms: information fragmentation, information duplication, and information skew
single points of failure, "all or nothing" expertise
parroting or mindlessly copying patterns or code without understanding
haunted graveyards, which are parts of code that nobody dares to touch

One side of knowledge sharing is learning from others. The first way of learning is by asking questions. We should also understand the context of any decision in order to understand and apply it. Consider the principle of "Chesterton’s fence": before removing or changing something, first understand why it’s there. Asking direct questions can be followed by asking the community, writing down the answers if they are not yet documented, and sharing them with others. Group chats, mailing lists, and other platforms also play an important role in learning, even though they are less structured and harder to search.

On the other side, we should share our knowledge so that others can use it. Classes, talks, and written documentation are great ways to scale our knowledge sharing. The first time you learn something is the best time to see how the existing documentation and training materials can be improved. Everybody must be able to read any document in the company, contribute to it, and fix it if necessary. The company must recognize the value of documentation and incentivize it, because the writer often gets no direct benefit from writing the document.

Code and its documentation are another medium for knowledge sharing. They can be complemented by the code review process and by another Google-specific process named the "readability process", discussed in another chapter.

Documentation is tough for software engineers, so they must learn how to do it, and the company must facilitate the process as much as possible. Google tries to treat documentation like code: it is kept in version control (markup languages make this possible), its errors are reported and fixed like bugs, and important documents are reviewed. Google makes sure that there is just one canonical source of information for each topic, so that nobody wastes time hunting for answers or duplicating documentation.

IMHO: I have found technical writing skills very important and rewarding for software engineers. A well-structured and well-maintained piece of documentation is as valuable as a piece of code, if not more. In my own experience, markup languages work very well for programmers, and I enjoy using them. They make it possible to keep documentation very close to the code and treat both the same way, which is essential to prevent information from going out of date. I also find the idea of classifying documents for internal and external use helpful.

In some cases, I realized that the lack of overall vision in the documentation process is a big problem. We should determine the scope of documentation including the audience, plan for it, structure it perfectly, and maintain it just like code. A pile of wiki pages on Confluence is not a good option!

Chapter 4. Engineering for Equity

Designing and developing software for a diverse user base is challenging, even for Google. According to this chapter, they have failed in some cases. Companies must consider users of different nationalities, ethnicities, races, genders, ages, socioeconomic statuses, abilities, and belief systems. Because of hidden biases, even the most talented engineers will fail to design and develop for everybody, and that can lead to dissatisfaction or harm.

One way to address this problem is to help the software engineering organization itself look like the populations for whom it builds products, by hiring people with different backgrounds and from diverse groups. Based on Google's experience, that is not enough, and every individual engineer needs to learn how to build for all users. They believe that being an exceptional engineer requires that you also focus on bringing diverse perspectives into product design and implementation. It means that Googlers involved in hiring must try to build a more representative workforce, and the organization itself must provide a truly supportive work environment for all people.

This problem is complex and multifactorial. Googlers think that they cannot accept solutions that present a single philosophy or methodology for fixing inequity in the technology sector. A common methodology today is to build for the majority use-case first, leaving improvements and features that address edge-cases for later. Apparently, this is not working. On the other hand, product velocity must be evaluated against providing a product that is truly useful to all users. It’s better to slow down than to release a product that might cause harm to some users.

Chapter 5. How to Lead a Team

SE is a team endeavor and cannot be done without leaders. In this regard, Google has two roles: Technical Lead (TL) and Manager. They have different responsibilities and need different skills, even though some of the required skills are similar. A TL, and especially a Manager, can come from outside the team, but a person on the team can also be promoted to these roles. One person can hold both roles, which makes them a Tech Lead Manager (TLM). Google prefers managers and leaders with an engineering background.

A Manager is responsible for the performance, productivity, and happiness of every person on their team—including their tech lead—while still making sure that the needs of the business are met with the product. It can be challenging when people aspects and project/product aspects don't align.

TL, while reporting to the Manager, is responsible for technical aspects of the product, including technology decisions and choices, architecture, priorities, velocity, and general project management.

If a software engineer on the team, even unintentionally, steps up to help the team resolve conflicts, make decisions, and coordinate people, he or she is a good candidate for the Manager/TL roles. Software engineers in these roles can be more helpful because they can scale themselves, and they might turn out to be very good at these roles without having known it before. These roles do need different skills, though. For example, "influencing without authority" is one of the most powerful leadership traits needed for these roles.

Some engineers don't like to be involved in these roles, especially the Manager role, for different reasons. Some reasons are:

In these roles, you have to spend less time coding.
Quantifying management work is difficult.
You have had bad experiences with people in these roles.

The last reason is explained very well by the "Peter Principle," which states that "In a hierarchy, every employee tends to rise to his level of incompetence." Google generally avoids this by requiring that a person perform the job above their current level for a period of time before being promoted.

Furthermore, the idea of being a "servant leader" helps in this area, because people forget what it was like to be managed by awful people and repeat the same mistakes once they have management authority themselves: mistakes such as micromanaging, ignoring low performers, and hiring pushovers. The advice here is "Above all, resist the urge to manage." Remember that the most important thing you can do as a leader is to serve your team.

Traditional managers worry about how to get things done, whereas great managers worry about what things get done and trust their team to figure out how to do it. To be successful in the Manager/TL roles, we should stick to the cultural principles of humility, respect, and trust. We should remember that "failure is an option". If an individual succeeds, we should praise them in front of the team. If an individual fails, we should give constructive criticism in private.

These are some ANTIPATTERNS in Manager/TL roles:

Hire pushovers: people who are NOT smarter or more ambitious than you and whom you can push around.
Ignore low performers and let them ruin your team. (Read the book to see how to behave in this case.)
Ignore human issues.
Be everyone's friend.
Compromise the hiring bar when you want to hire quickly.
Treat your team like children.

These are POSITIVE PATTERNS in Manager/TL roles:

Lose the ego. Apologize when you make a mistake.
Be a zen master by mediating your reactions and maintaining your calm.
Be a catalyst.
Remove roadblocks.
Be a teacher and a mentor.
Set clear goals.
Be honest.
Track happiness.

For more explanation of these positive patterns and antipatterns and for more tips, see the book.

IMHO: There are similarities and differences between these two roles (Manager/TL) and the Scrum Master/Product Owner roles in Scrum. The Scrum Master is more involved in the people side of the team and needs less technical knowledge than a Manager at Google. The Product Owner differs more from the TL: the Product Owner is more like a proxy for users/customers (answering WHAT and WHEN questions), whereas the TL actively participates in designing and developing the software and in high-level decision making (answering HOW questions).

Chapter 6. Leading at Scale

As you move up the management/leadership ladder, the best practices of the previous chapter still apply. You're still a "servant leader", but for a larger group and at a higher level. Google's advice for leaders at scale is "the three Always of leadership": Always Be Deciding, Always Be Leaving, Always Be Scaling.

There is no magic answer for ambiguous problems; they’re all about finding the right trade-offs of the moment and iterating on this decision-making process. So, as a higher-level manager/leader, always be deciding. Your job is to build an organization that automatically solves a class of ambiguous problems, over time, without you needing to be present. So, always be leaving. This means managing and leading in such a way that you can leave at any time and the team can continue in the right direction. Furthermore, success generates more responsibility over time, and you must proactively manage the scaling of your job. So, you should always be scaling, and you should protect your scarce resources: personal time, attention, and energy.

To manage at a higher level, you need to identify the blinders (things that blind the team); next, you need to identify the trade-offs; and then you need to decide and iterate on a solution. With fresh eyes, you can see these blinders, ask questions, and then consider new strategies. There’s no answer that works forever in all situations. There is only the best answer for the moment, and it almost certainly involves making trade-offs in one direction or another. It’s your job to call out the trade-offs, explain them to everyone, and then help decide how to balance them. It’s an iterative process. This is what "Always Be Deciding" means.

There’s a risk here. If you don’t frame your process as a continuous rebalancing of trade-offs, your teams are likely to fall into the trap of searching for the perfect solution, which can then lead to what some call "analysis paralysis." It’s not just your job to solve an ambiguous problem, but to get your organization to solve it by itself, without you present. The antipattern here, of course, is a situation in which you’ve set yourself up to be a single point of failure (SPOF). Googlers have a term for that, "the bus factor": the number of people that need to get hit by a bus before your project is completely doomed; the higher this number, the better.
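To make the "bus factor" idea concrete, here is a minimal Python sketch. It is my own illustration, not something from the book: the `knowledge` map, the team data, and the simplifying assumption that a project is "doomed" as soon as one critical area has no expert left are all hypothetical.

```python
def bus_factor(knowledge):
    """Smallest number of people whose loss leaves some area without an expert.

    `knowledge` maps each critical area to the set of people who understand it.
    Simplifying assumption: the project is doomed as soon as ONE area has no
    expert left, so the bus factor is the size of the smallest expert set.
    """
    if not knowledge:
        raise ValueError("need at least one area")
    return min(len(people) for people in knowledge.values())


def single_points_of_failure(knowledge):
    """People who are the only expert for at least one area."""
    return sorted({next(iter(p)) for p in knowledge.values() if len(p) == 1})


# Hypothetical team data, for illustration only.
team = {
    "billing": {"ana", "raj"},
    "deploy pipeline": {"ana"},
    "search": {"li", "raj", "sam"},
}

print(bus_factor(team))                # 1
print(single_points_of_failure(team))  # ['ana']
```

In this made-up team, "ana" is a single point of failure for the deploy pipeline, so spreading that knowledge would be the first step toward a higher bus factor.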

There are three main parts to constructing this sort of self-sufficient group: dividing the problem space, delegating subproblems, and iterating as needed. A common mistake is to put a team in charge of a specific product rather than a general problem. A product is a solution to a problem. Instead, you should assign a general problem to a team.

To always be scaling, remember that your most precious resource is your limited pool of time, attention, and energy. As you move into leadership, though, you might notice that your job becomes less proactive and more reactive. You should use best practices to remain effective. One idea is to distinguish between important things and urgent things, so that you can force yourself to work mostly on the important ones rather than the urgent ones. Here are a few key techniques: delegate, schedule dedicated time, and find a tracking system that works. You should also learn to drop balls (prioritize things and deliberately not react to some of them) and protect your energy by having real vacations, weekends, times to disconnect, and so on.
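The important/urgent distinction can be sketched as a tiny triage function. This is only an illustration under my own assumptions: the mapping of each combination to an action is a common reading of the techniques listed above (delegate, schedule dedicated time, drop balls), not a rule stated in the book, and the inbox items are invented.

```python
def triage(important, urgent):
    """Suggest what to do with a task, given two yes/no judgments."""
    if important and urgent:
        return "do now"
    if important:
        return "schedule dedicated time"
    if urgent:
        return "delegate"
    return "drop"


# Hypothetical inbox: (task, important, urgent)
inbox = [
    ("production outage", True, True),
    ("write next year's strategy", True, False),
    ("reply to vendor survey today", False, True),
    ("reorganize old bookmarks", False, False),
]

for task, important, urgent in inbox:
    print(f"{task}: {triage(important, urgent)}")
```

The point of writing it down this way is that "urgent" never decides the outcome on its own; importance is checked first.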

Chapter 7. Measuring Engineering Productivity

Google is a data-driven company and tries to make most decisions objectively rather than subjectively. When human factors are involved, however, analyzing data is difficult. Google found that having a team of engineering productivity specialists is more efficient than relying on each team to do its own productivity measurement, analysis, and decision making. This team includes generalist software engineers and social scientists from a variety of fields, including cognitive psychology and behavioral economics.

Before measuring productivity, the engineering productivity specialists ask the stakeholders questions to make sure the effort is worth it. They ask whether the result is actionable, regardless of whether it is positive or negative. If you can’t do anything with the result, it is likely not worth measuring. They also ask people to describe what they want to measure in the form of a concrete question. They found that the more concrete people can make this question, the more likely they are to derive benefit from the process. Note that when you are successful at measuring your software process, you aren’t setting out to prove a hypothesis correct or incorrect. Here, success means giving a stakeholder the data they need to make a decision.

For this kind of research, Google selects meaningful metrics using the Goals/Signals/Metrics (GSM) framework. A goal is one desired end result. It’s phrased in terms of what you want to understand at a high level and should not contain references to specific ways to measure it. A signal is how you might know that you’ve achieved the end result. Signals are things we would like to measure, but they might not be measurable themselves. A metric is a proxy for a signal. It is the thing we actually can measure. Some signals don't have any measurable metric (we should ignore them) and some have several. A good metric is a reasonable proxy for the signal you’re trying to measure, and it is traceable back to your original goals.
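One way to picture the goal, signal, metric hierarchy is as a small data structure in which every metric hangs off a signal and every signal hangs off a goal, so traceability comes for free. The Python sketch below is my own illustration; the class names, helper functions, and example goal are assumptions and are not taken from the book.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metric:
    name: str  # something we can actually measure


@dataclass
class Signal:
    description: str  # how we would know the goal has been achieved
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    description: str  # desired end result, with no reference to measurement
    signals: List[Signal] = field(default_factory=list)


def traceable_metrics(goals):
    """Return (goal, signal, metric) triples: each metric traced to its goal."""
    return [
        (g.description, s.description, m.name)
        for g in goals
        for s in g.signals
        for m in s.metrics
    ]


def unmeasurable_signals(goals):
    """Signals that have no metric and therefore cannot be tracked."""
    return [s.description for g in goals for s in g.signals if not s.metrics]


# Hypothetical example data, for illustration only.
goals = [
    Goal(
        "Engineers write higher-quality code as a result of code review",
        signals=[
            Signal(
                "Reviewed changes cause fewer regressions",
                metrics=[Metric("rollbacks per 100 submitted changes")],
            ),
            Signal("Engineers feel more confident in reviewed code"),
        ],
    )
]

for row in traceable_metrics(goals):
    print(row)
print(unmeasurable_signals(goals))
```

Because a `Metric` can only be created inside a `Signal`, and a `Signal` inside a `Goal`, a metric that does not trace back to a goal simply cannot exist in this structure.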

Select metrics that cover all parts of productivity. By doing this, you ensure that you aren’t improving one aspect of productivity (like developer velocity) at the cost of another (like code quality). Their research team divides productivity into five core components named QUANTS: quality of the code, attention from engineers, intellectual complexity, tempo and velocity, and satisfaction.
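A quick way to check this kind of coverage is to tag every selected metric with the QUANTS component it speaks to and list the components nobody is measuring. The sketch below is my own illustration: the function name and the two example metrics are hypothetical, and only the five component names come from the text above.

```python
QUANTS = (
    "quality of the code",
    "attention from engineers",
    "intellectual complexity",
    "tempo and velocity",
    "satisfaction",
)


def uncovered_components(metric_to_component):
    """Return the QUANTS components that no selected metric covers."""
    covered = set(metric_to_component.values())
    unknown = covered - set(QUANTS)
    if unknown:
        raise ValueError(f"not a QUANTS component: {sorted(unknown)}")
    return [c for c in QUANTS if c not in covered]


# Hypothetical metric selection that only looks at speed and quality.
selected = {
    "median time from review request to submit": "tempo and velocity",
    "rollbacks per 100 submitted changes": "quality of the code",
}

missing = uncovered_components(selected)
print(missing)
# ['attention from engineers', 'intellectual complexity', 'satisfaction']
```

An empty result would mean that every component has at least one metric; it says nothing about whether those metrics are good proxies.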

Some of these signals might be measurable by analyzing tools and code logs. Others are measurable only by directly asking engineers. Qualitative metrics are also metrics. Consider having a survey mechanism for tracking metrics about engineers’ beliefs. Qualitative metrics should also align with the quantitative metrics; if they do not, it is likely the quantitative metrics that are incorrect.

After the analysis, aim to create recommendations that are built into the developer workflow and incentive structures. Even though it is sometimes necessary to recommend additional training or documentation, change is more likely to occur if it is built into the developer’s daily habits. The above-mentioned team at Google always prepares a list of recommendations for how they can continue to improve. Google believes that the engineers will make the appropriate trade-offs if they have the proper data available and the suitable tools at their disposal.

IMHO: Doing this kind of research is hard and costly for small and medium companies. I think that in those companies we should rely on the research and best practices published by large companies like Google.

Last Words

This was the essence of part II, Culture (plus one chapter of part I). The book is full of case studies, explanations, and other interesting material that makes it worth reading thoroughly. I could not cover all of it because of the limited space. I just tried to give you an overall view of the chapters and the type of material you can find in the book. I hope it encourages you to buy this amazing book and read it.
