DevOps: Cultural Norms and Technical Practices
Source: Canvas (Edited by the Author)

Summary

Companies that are able to develop products quickly and with high software quality achieve success in the market. However, most of them are unable to implement changes in the production system in minutes or hours, as it is necessary to interrupt the system to perform updates. In light of this, there is an inherent conflict between IT development and operations, known as the basic chronic conflict. Thus, new methods that help organizations manage changes in the system are useful and necessary. This article presents DevOps, a set of cultural norms and technical practices aimed at increasing a company's ability to deliver services at high speed, as well as promoting communication, integration, and automation between developers and system operators, reducing barriers and increasing the efficiency of the software development process.

Introduction

Software has become a crucial asset in most companies and in daily human activities, as products and services rely on software to operate with greater security and reliability. Therefore, companies that can develop products quickly and with high quality achieve better results in the market.

However, almost every tech company faces an inherent conflict between IT development and operations, known as the basic chronic conflict, which results in an increasingly long time to market for new products and features (KIM, HUMBLE, and DEBOIS, "et al", 2018).

In this context, a model called DevOps emerged. The term appeared in the market in late 2008, with its essential principles being the involvement of the IT function in every phase of the system development flow, high dependence on automation compared to human efforts, and the application of engineering practices and tools in operational tasks. The use of DevOps enables Development, QA (Quality Assurance), IT Operations, and InfoSec to work together, not only to assist each other but also to ensure the organization's overall success (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Source: Freepik (Edited by the Author)

To enable constant changes in a production environment, the adoption of cultural norms, technical practices, and architectures is necessary, allowing the maintenance of a fast workflow between development and operations without causing chaos and disruption in the production environment.

Therefore, this article aims to present the key concepts of DevOps, a set of cultural norms and technical practices. Additionally, it also presents the benefits and proof of concept of this model, through reports and research conducted by renowned and widely recognized companies.

Definition of DevOps

DevOps consists of cultural norms, technical practices, and architectures that increase a company's ability to deliver systems and services, as well as optimize and refine products at a faster pace compared to companies using traditional software development processes and infrastructure management. This agility enables companies to better serve their customers and compete more effectively in the market (AWS, 2023).

Source: Freepik

The DevOps practices represent the convergence of philosophical and managerial movements, based on change management that provides greater quality, reliability, stability, security, and lower costs. Additionally, it increases speed and confidence throughout the entire technological value stream, including Product Management, Development, QA, IT Operations, and Infosec (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Problem Statement

Most companies are unable to implement changes in the production system in minutes or hours; instead, they end up taking weeks or months to complete the process (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Production deployments result in interruptions due to conflicts between teams, as on one side, the software development team aims to launch updates with new features as quickly as possible into the system, while on the other side, the operations team aims to maintain the reliability, stability, and security of the system.

Source: Freepik (Edited by the Author)

As most interruptions are caused by updates, the objectives of the teams are constantly at odds. This problem is called the basic chronic conflict, and it brings a series of disadvantages and consequences, such as (Kim, Humble, and Debois, 2018; Murphy, Beyer, and Jones, 2016):

  • Long lead time for implementing new products and features in a competitive market that demands short time-to-market;
  • Increased system downtime that continues to grow due to the rising complexity and volume of traffic;
  • High costs due to the proportional increase in manual system administrator demands, as well as the rising complexity and volume of traffic.

These disadvantages highlight how the basic chronic conflict between development and operations can negatively impact the company, limiting its ability to quickly adapt to market changes and efficiently meet customer demands.

DevOps Culture

The transition process to DevOps requires a significant shift in culture and mindset. At its core, the main goal of DevOps is to eliminate the barriers that exist between two traditionally siloed teams: development and operations (AWS, 2023).

While many DevOps patterns require automation and tools, the concept also demands cultural norms and an architecture that enables achieving common goals within a technology company. As Christopher Little, a technology executive and one of the pioneers in DevOps adoption, stated:

"DevOps is not about automation, just as astronomy is not about telescopes."

Key contexts where DevOps culture is significant include high-trust management, servant leadership, and organizational change management. The result is high quality, reliability, stability, and security at decreasing costs and effort (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Structure and Organization

The way teams are organized affects how work is executed. Computer scientist Dr. Melvin Conway noted this in 1968, based on observations from a famous experiment. His observation gave rise to what is now known as Conway's Law, which states:

"Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations."

In other words, the communication and organization structure of a team or company tends to be reflected in the architecture of the systems they produce.

To achieve a fast workflow between development and operations while maintaining high quality and achieving excellent results, teams and work should be organized with Conway's Law in mind (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Source: Freepik

In theory, teams are designed to be multifunctional and independent - capable of designing, building, and delivering new features and functionalities to the production environment without relying on manual interventions or other teams (KIM, HUMBLE, and DEBOIS, "et al", 2018).

These teams are small and bring together engineers with diverse skills, including Development, Operations, QA, and InfoSec. This composition makes the teams autonomous, avoiding the need to request support from other groups (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Communication and Collaboration

Promoting communication and collaboration is one of the key cultural aspects of DevOps. Teams establish strong cultural guidelines regarding information sharing, using chat applications, organizational issue or project tracking systems, and wikis (AWS, 2023).

This strategy allows for new discoveries to be incorporated into the organization's collective knowledge, expanding its impact. Integration is achieved through active and comprehensive communication of new knowledge (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Source: Freepik (Edited by the Author)

Such cultural practices streamline the interaction between developers, operations, and other departments such as marketing or sales, promoting a more effective alignment with the company's goals and projects (AWS, 2023).

Shared Responsibility

Essentially, the DevOps culture involves more collaboration and shared responsibility between development and operations teams, working together to create and maintain products (ATLASSIAN, 2023).

Development and operations teams should be accountable for the success or failure of a product. Developers not only develop and hand off their implementations to operations teams but also share responsibilities and oversee the product throughout its lifecycle, adopting a "you build it, you run it" mentality (ATLASSIAN, 2023).

Source: Freepik

Operations engineers can be embedded within development teams, so that their priorities are aligned with the goals of the product teams they are integrated with (KIM, HUMBLE, and DEBOIS, "et al", 2018).

This integration allows for the efficient dissemination of knowledge and operational expertise within a specific team. As the demand for operational knowledge and capability decreases, operations engineers can migrate to other projects or commitments (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Process Automation

A set of centralized platforms and ideally automated tools is essential for any development team to provision resources independently, including the creation of environments, deployment pipelines, automated testing tools, monitoring dashboards, among others (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Source: Freepik

This allows development teams to spend more time building features, rather than dealing with all the infrastructure needed to deliver and support these features (KIM, HUMBLE, and DEBOIS, "et al", 2018).

As a result, the team is no longer hindered by requirements such as the need to open a ticket for the operations team to make a change in the infrastructure, avoiding the scenario where a task that should be completed in seconds ends up taking days or weeks (ATLASSIAN, 2023).

Continuous Feedback

A fast, frequent, and high-quality flow of information throughout the organization is crucial for achieving quality, reliability, and security in the workflow system. This continuous feedback allows for the detection and remediation of issues while they are smaller, cheaper, and easier to correct. Additionally, this practice prevents issues from escalating and generates organizational learning that is integrated into future work (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Source: Freepik

This practice involves the identification and notification of any system failures. Clear and comprehensive code test results are made available to developers as soon as possible. This way, the team is aware of any production failures, performance deficiencies, or reported errors (ATLASSIAN, 2023).

It is worth noting that such an approach applies to any team, involving all necessary collaborators for the identification of failures and their resolution.

Continuous Improvement

In DevOps, the cultural norm of continuous improvement refers to the constant practice of evaluating and enhancing processes, systems, tools, and collaboration within an organization. This approach seeks to identify opportunities for optimization, fault correction, and efficiency enhancement, ensuring gradual and consistent improvements over time.

This practice includes establishing a just, learning-oriented culture in which the inevitable incidents and accidents receive fair responses. Dr. Sidney Dekker, a contributor to defining some of the key elements of safety culture and creator of the term "just culture," wrote:

"When responses to incidents and accidents are perceived as unjust, this can hinder safety investigation, generating fear rather than attention in people who perform critical safety functions. This, in turn, makes organizations more bureaucratic rather than more careful, promotes professional secrecy, and encourages evasion and self-protection behaviors."

Therefore, instead of naming, blaming, and shaming an individual who caused a particular failure, it is preferable to continuously reinforce the value of actions that identify and share problems more widely. This enables the transformation of information into knowledge, improves the quality and security of the system, and strengthens relationships throughout the entire organization (KIM, HUMBLE, and DEBOIS, "et al", 2018).

Source: Freepik

This topic is extensive and would make for a dedicated article, as it involves several other practices, such as injecting failures to create resilience, transforming local discoveries into global improvements, centralized documentation of problems and solutions, allocating time to address technical debts, among others.

DevOps Practices

To increase the capacity for distributing systems and services, as well as reduce the risk associated with implementing changes in a production environment, it is necessary to adopt the technical practices of continuous integration, continuous delivery, microservices, infrastructure as code, monitoring, communication, and collaboration. These essential practices help companies innovate more quickly through automation and simplification of software development processes and infrastructure management.

The foundation of these practices is the implementation of frequent, but smaller-scale updates. By adopting frequent yet smaller updates, the risk of each change implementation is reduced. This allows teams to identify errors more quickly, as they can trace back to the last deployment that caused the issue (AWS, 2023).
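To illustrate why small, frequent updates make failures easier to trace, the following minimal Python sketch models a deployment history. The `Deployment` record, the version numbers, and the `is_healthy` check are hypothetical names introduced only for this example and do not come from any cited tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one release and the changes it shipped."""
    version: str
    changes: Tuple[str, ...]


def first_failing_deployment(
    history: Sequence[Deployment],
    is_healthy: Callable[[Deployment], bool],
) -> Optional[Deployment]:
    """Return the earliest deployment (oldest first) that fails the health check."""
    for deployment in history:
        if not is_healthy(deployment):
            return deployment
    return None


# Each release carries a single small change (illustrative data).
history = [
    Deployment("1.0.1", ("fix typo in footer",)),
    Deployment("1.0.2", ("add search endpoint",)),
    Deployment("1.0.3", ("change cache timeout",)),
]

# Assumption for the example: the problem first appears in version 1.0.3.
culprit = first_failing_deployment(history, lambda d: d.version != "1.0.3")
print(culprit.version, culprit.changes)
```

Because each deployment here ships one change, finding the failing deployment also identifies the change to investigate. With large, infrequent releases, the same search would return a long list of candidate changes.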

Continuous Integration

Continuous Integration (CI) represents a paradigm shift. Without it, software is considered non-functional until it is proven to work, which typically occurs during the testing or integration stage (HUMBLE and FARLEY, "et al", 2013).

With continuous integration, the software is assumed to be working after each change, provided there is a sufficiently comprehensive set of automated tests. Any breakage is identified immediately and can be fixed right away, which makes the practice essential for professional teams.

The goal of continuous integration is to keep the software working all the time, meaning every time someone introduces a change to the code. This practice requires some prerequisites to work efficiently, such as regular code commits, creation of a suite of automated tests, and a process of short build and test cycles (HUMBLE and FARLEY, "et al", 2013).
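As a rough illustration of these prerequisites, the sketch below uses Python's standard `unittest` module to show a small automated suite and a check that a CI server could run on every commit. The `apply_discount` function and the `build_is_green` helper are invented for the example; a real project would run its own suite through whatever CI tool it uses.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Example production code: return the price after a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 120)


def build_is_green() -> bool:
    """Run the automated suite and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


# A CI server would typically turn this result into a pass/fail build status.
print("build accepted" if build_is_green() else "build rejected")
```

The suite is deliberately tiny so that it runs in well under a second, in keeping with the short build and test cycles mentioned above.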

Continuous Delivery

Continuous Delivery (CD) is a set of technical practices that enable the implementation of changes to any environment through a fully automated process (HUMBLE and FARLEY, "et al", 2013). This allows for maintaining a fast workflow without causing issues and disruptions in the production environment.

The practice of continuous delivery expands upon continuous integration, as it promotes new code changes to a testing and/or production environment after the build phase, ensuring a compiled, tested artifact ready for implementation of a new version (AWS, 2023).

Source: Element 61

The practices of continuous integration and continuous delivery (CI/CD) provide various benefits related to distribution time and product quality (AWS, 2023):

  • Increased productivity: Allows developers to prioritize more relevant activities by relieving them of manual tasks, making the team more productive.
  • Reduced errors and bugs: Encourages behaviors that help avoid errors and bugs, contributing to a higher-quality product.
  • Bug identification: With more frequent testing, the team can identify and investigate bugs in early stages, avoiding larger impacts in the future.
  • Rapid distribution of services: Increases the capacity for distributing services, ensuring implementations of improvements and new functionalities for the system.

Continuous Deployment

Continuous Deployment (CD) takes the practices of continuous integration and continuous delivery (CI/CD) to their logical conclusion. When a change implementation is successfully processed through all previous stages, this new artifact is automatically deployed to the production environment (JETBRAINS, 2023).

This results in a decreased feedback cycle, reducing the time between a code change and its use in production. While automating deployment to production may not be suitable for all products and companies, it is important to consider the necessary steps to achieve it, as each individual component has its own value (JETBRAINS, 2023).
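The difference between continuous delivery and continuous deployment can be sketched as a single flag in a simplified pipeline. This is only a model under stated assumptions: the stage names, the `run_pipeline` function, and its `auto_deploy` and `approved` parameters are hypothetical and do not correspond to any specific CI/CD product.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], auto_deploy: bool, approved: bool = False) -> List[str]:
    """Run stages in order, stop at the first failure, then decide on release.

    auto_deploy=True models continuous deployment; auto_deploy=False models
    continuous delivery, where a person approves the production release.
    """
    log: List[str] = []
    for name, step in stages:
        if not step():
            log.append(f"{name}: failed")
            return log  # later stages never run on a broken change
        log.append(f"{name}: ok")
    if auto_deploy or approved:
        log.append("production: deployed")
    else:
        log.append("production: awaiting approval")
    return log


# Hypothetical stages; real ones would call a build tool, a test runner, and so on.
stages = [
    ("build", lambda: True),
    ("unit tests", lambda: True),
    ("acceptance tests", lambda: True),
]

print(run_pipeline(stages, auto_deploy=False))  # continuous delivery
print(run_pipeline(stages, auto_deploy=True))   # continuous deployment
```

In both modes the artifact that reaches the final step has passed every earlier stage; the only difference is whether the production release waits for a human decision.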

The figure below compares the practices of Continuous Integration, Continuous Delivery, and Continuous Deployment:


Source: Jetbrains

Microservices

Microservices represent an approach to application development in which an application is decomposed into small independent services, each running in its own process and communicating primarily through an HTTP resource API. These services are built around specific business functionalities and can be deployed autonomously, using fully automated deployment. Centralized management is kept to a minimum, allowing for the use of various programming languages and distinct data storage technologies (MARTIN FOWLER, 2023).
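As a minimal sketch of one such service, the example below uses only Python's standard library to expose a tiny HTTP resource API and to call it the way another service would. The "price" service, its `/prices/<item>` route, and the in-memory `PRICES` data are assumptions made for illustration; production services would normally use a dedicated web framework and their own data store.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data owned by this one service.
PRICES = {"book": 12.5, "pen": 1.2}


class PriceHandler(BaseHTTPRequestHandler):
    """Tiny 'price' service: GET /prices/<item> returns JSON."""

    def do_GET(self):
        item = self.path.rsplit("/", 1)[-1]
        if self.path.startswith("/prices/") and item in PRICES:
            status, payload = 200, {"item": item, "price": PRICES[item]}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def start_service() -> HTTPServer:
    """Start the service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PriceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_price(port: int, item: str) -> dict:
    """What another service would do: call the API over HTTP."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/prices/{item}") as resp:
        return json.load(resp)


server = start_service()
print(fetch_price(server.server_port, "book"))
server.shutdown()
server.server_close()
```

The caller depends only on the HTTP contract, not on how the service is written or where it stores its data, which is what allows each service to be changed and deployed independently.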

Source: Freepik (Edited by the Author)

Although the microservices architecture is increasingly present in the market, it requires a broad understanding of its concept and complex characteristics, paying attention to the principle of evolutionary architecture for product development based on necessity.

"Any successful product or organization will necessarily evolve over its lifecycle" - Jez Humble

Since this is not an article about architecture and microservices, only the basic concept, importance, and the need for a broader understanding of this subject are highlighted. This practice, like others, is directly related to the increased capacity for distributing systems and services, as a decoupled architecture promotes productivity, testability, and security through isolated and smaller change implementations (KIM, HUMBLE and DEBOIS, "et al", 2018).

For a better understanding of microservices architecture, it is recommended to read the article Microservices, written by Martin Fowler, a widely recognized software engineer, author, and speaker known for his contributions to software architecture, design patterns, and agile methodologies.

Infrastructure as Code (IaC)

To create a fast and reliable workflow, it is necessary to ensure that equivalent environments to the production environment are always used at each stage of the development cycle. These environments should be created in an automated manner, ideally on-demand through scripts and configuration information stored in version control, and be entirely self-service, with no manual work required from operations (KIM, HUMBLE and DEBOIS, "et al", 2018).

Infrastructure as Code is a technique for automating infrastructure based on software development practices. Through code, consistent and repeatable routines are created for the provisioning and management of infrastructure (MORRIS, 2021).
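The core idea can be sketched in plain Python: a desired state is kept as data under version control, compared with the actual state, and the difference becomes a list of actions. The resource names, attributes, and the `plan` and `apply` functions below are hypothetical and only loosely mirror the declarative workflow of real tools; they are not how any particular tool is implemented.

```python
from typing import Dict, List

# Desired state, as it would be stored in version control.
# Resource names and attributes are hypothetical.
desired = {
    "web-server": {"size": "small", "count": 2},
    "database": {"size": "medium", "count": 1},
}

# State currently found in the (imaginary) environment.
actual = {
    "web-server": {"size": "small", "count": 1},
    "old-cache": {"size": "small", "count": 1},
}


def plan(desired: Dict[str, dict], actual: Dict[str, dict]) -> List[str]:
    """Compare desired and actual state and list the actions needed."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(f"create {name}")
        elif desired[name] != actual[name]:
            actions.append(f"update {name}")
    for name in sorted(actual):
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


def apply(desired: Dict[str, dict]) -> Dict[str, dict]:
    """Make the environment match the desired state (simulated as a copy)."""
    return {name: dict(spec) for name, spec in desired.items()}


print(plan(desired, actual))  # actions needed before applying
actual = apply(desired)
print(plan(desired, actual))  # nothing left to do after applying
```

Running the same definition twice produces no further actions the second time, which is the repeatability property the paragraph above describes.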

Source: Hashicorp

Several Infrastructure as Code tools are available on the market. Some of the most well-known and widely used are Terraform, Ansible, Puppet, Chef, and AWS CloudFormation.

In addition to enabling a fast and reliable workflow, this technique also provides a series of significant benefits such as: documentation, versioning, history, reproducibility, auditing, testing and validation, standardization, shared collaboration, quick response time, and cost reduction (MORRIS, 2021).

To facilitate understanding and promote the practice of Infrastructure as Code, a Terraform Prototype is made available on GitHub, allowing for exploration and experimentation of the concept in a practical environment.

Monitoring and Observability

Monitoring is a technical practice that allows teams to watch and understand the state of their systems. This is achieved through the collection of predefined sets of metrics or logs (GOOGLE, 2023).

This practice enables the identification of issues across the system in a broad and centralized manner, including applications, environments, and databases. It allows for quicker and more precise corrections. Additionally, it enables the detection of potential issues through anomalies, making it possible to find and fix adversities before they have implications for the system and its users (KIM, HUMBLE and DEBOIS, "et al", 2018).
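The two kinds of detection mentioned above, predefined limits and anomalies, can be sketched with Python's standard `statistics` module. The response-time samples, the 500 ms threshold, and the three-standard-deviation rule are assumed values chosen for illustration; real monitoring tools offer their own alerting rules.

```python
from statistics import mean, stdev
from typing import List

# Hypothetical response times in milliseconds, one sample per minute.
response_ms = [102, 98, 105, 99, 101, 97, 103, 100, 250]

THRESHOLD_MS = 500  # predefined alert limit (an assumed value)


def breaches_threshold(value: float, limit: float = THRESHOLD_MS) -> bool:
    """Classic monitoring check against a predefined limit."""
    return value > limit


def is_anomaly(history: List[float], value: float, z_limit: float = 3.0) -> bool:
    """Flag values more than z_limit standard deviations from the recent mean."""
    if len(history) < 2:
        return False
    spread = stdev(history)
    if spread == 0:
        return value != mean(history)
    return abs(value - mean(history)) / spread > z_limit


latest = response_ms[-1]
baseline = response_ms[:-1]
print(breaches_threshold(latest))    # still below the fixed limit
print(is_anomaly(baseline, latest))  # but far outside normal behaviour
```

In this data the latest sample is still under the fixed limit, yet it is flagged as anomalous, which is the kind of early signal that lets a team act before users are affected.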

Observability, on the other hand, is the technique that allows teams to actively debug the system. The concept is based on exploring properties and patterns not predefined (GOOGLE, 2023).

This technique enables understanding the internal state of a complex system based on external indicators, making it possible to identify the root cause of a problem by analyzing the data produced by it (IBM, 2023).
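One way to picture this is ad hoc querying of structured events, where the question is chosen during the investigation instead of being defined in advance. The event fields (`route`, `status`, `region`, `version`) and the helper functions below are hypothetical; observability platforms provide far richer query capabilities.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical structured events emitted by a service, one dict per request.
events = [
    {"route": "/checkout", "status": 500, "region": "eu", "version": "2.4.1"},
    {"route": "/checkout", "status": 200, "region": "us", "version": "2.4.0"},
    {"route": "/search", "status": 200, "region": "eu", "version": "2.4.1"},
    {"route": "/checkout", "status": 500, "region": "eu", "version": "2.4.1"},
    {"route": "/checkout", "status": 200, "region": "eu", "version": "2.4.0"},
]


def where(rows: Iterable[Dict], **conditions) -> List[Dict]:
    """Keep events whose fields match every given condition."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


def count_by(rows: Iterable[Dict], field: str) -> Counter:
    """Group events by any field, chosen at investigation time."""
    return Counter(r.get(field) for r in rows)


errors = where(events, status=500)
print(count_by(errors, "route"))
print(count_by(errors, "version"))
```

Slicing the same errors by route and then by version narrows the search to one endpoint on one release, without anyone having created a dashboard for that combination beforehand.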

Source: Freepik

There are several monitoring and observability tools available in the market. The most well-known and widely used ones include Prometheus, Grafana, and Zabbix for monitoring, and Elasticsearch, Kibana, New Relic, and Dynatrace for observability.

In essence, monitoring and observability complement each other, as they are methods that allow discovering the underlying cause of problems. Monitoring alerts about failures when they occur, while observability can provide details about what is happening (IBM, 2023).

Commercial Value

There is definitive evidence of the commercial value of DevOps. The State of DevOps Report is a project initiated in 2011, conducted by the company Puppet Labs, with contributions from the authors of the DevOps Handbook, Jez Humble and Gene Kim. The project consists of surveys on DevOps conducted in companies and by technology professionals. Over the years, surveys from more than 40,000 IT professionals around the world have been recorded.

The State of DevOps Report 2017 analyzed survey responses collected that year from 3,200 participants. An increase of 27% in new DevOps teams was observed, as shown in the following figure:

Source: State of DevOps Report 2017

The increase is related to the benefits that DevOps can provide in organizations. After all, the same survey showed a surprising increase in organizational performance, measured by statistics on code deployment frequency, system stability, change implementation, and failure recovery.

Thus, it was found that in high-performance companies, compared to low-performance ones, the following results were obtained:

  • 46 times more frequent change implementation
  • 440 times faster lead time for changes
  • 96 times faster failure recovery
  • 5 times lower change failure rate

The statistical data that contributed to the comparison and the results described above are presented in the following figure:

Source: State of DevOps Report 2017

Based on the data presented, it can be concluded that the adoption of DevOps cultural norms and technical practices has a significant impact on organizational performance. Companies that effectively implement DevOps strategies demonstrate better results, including a higher frequency of change implementation, faster implementation time, quicker failure recovery, and a noticeable reduction in the failure rate during changes.
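For readers who want to track the same four measures in their own organization, the sketch below computes them from a list of deployment records. The `Deploy` record, its field names, the sample data, and the choice of median and mean are assumptions made for this example; they are not the survey methodology used in the report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import List, Optional


@dataclass
class Deploy:
    """Hypothetical deployment record; field names are assumptions."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def deployment_frequency(deploys: List[Deploy], days: int) -> float:
    """Deployments per day over the observed period."""
    return len(deploys) / days


def median_lead_time_hours(deploys: List[Deploy]) -> float:
    """Median time from commit to running in production."""
    return median(hours(d.deployed_at - d.committed_at) for d in deploys)


def change_failure_rate(deploys: List[Deploy]) -> float:
    """Share of deployments that caused a failure."""
    return sum(d.failed for d in deploys) / len(deploys)


def mean_time_to_restore_hours(deploys: List[Deploy]) -> float:
    """Average time from a failed deployment to restored service."""
    fixes = [hours(d.restored_at - d.deployed_at)
             for d in deploys if d.failed and d.restored_at]
    return mean(fixes) if fixes else 0.0


t0 = datetime(2023, 11, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
deploys = [
    Deploy(t0, t0 + 2 * hour),
    Deploy(t0 + day, t0 + day + 4 * hour, failed=True,
           restored_at=t0 + day + 5 * hour),
    Deploy(t0 + 2 * day, t0 + 2 * day + 3 * hour),
    Deploy(t0 + 3 * day, t0 + 3 * day + 1 * hour),
]

print(deployment_frequency(deploys, days=4))
print(median_lead_time_hours(deploys))
print(change_failure_rate(deploys))
print(mean_time_to_restore_hours(deploys))
```

Tracking these values over time gives a team its own baseline to compare against after adopting a new practice.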

Conclusion

Based on the presented article, it can be concluded that DevOps provides a greater capacity for service distribution, quality, reliability, stability, security, and reduced costs. Companies achieve better results in the market through faster and higher-quality product implementations.

The inherent conflict between development and IT operations, known as the basic chronic conflict, is reduced. Communication and collaboration are promoted, allowing teams to work together, sharing responsibilities and goals for the success of the company.

There is an encouragement for continuous improvement of processes, systems, tools, and collaboration, identifying opportunities for optimization and correction of flaws in systems, improving efficiency and ensuring gradual and consistent improvements over time.

Corrections of any system flaws can be made quickly and precisely. The technical practices and tools enable the detection of potential problems through anomalies, allowing the identification and correction of adversities before they cause implications in the system and for its users.

Companies gain support for rapid innovation through automation and simplification of the development flow and infrastructure management.

The adoption of cultural norms and technical practices demonstrates a significant impact on the performance of organizations, statistically proven by research conducted by renowned and recognized companies.

Finally, the fundamental concepts of cultural norms and technical practices are presented in a clear, concise, and well-founded manner by recognized authors who have contributed to the DevOps methodology.

References

Atlassian. DevOps. Available at: <https://atlassian.com/devops>. Accessed on November 5, 2023.

AWS. What is DevOps. Available at: <https://aws.amazon.com/devops/what-is-devops/>. Accessed on November 5, 2023.

BEYER, Betsy. et al. Engenharia de Confiabilidade do Google: Como o Google Administra Seus Sistemas de Produção. Translation: Lúcia A. Kinoshita. 1st ed. São Paulo: Novatec, 2016. 632 p. Original title: Site Reliability Engineering: How Google Runs Production Systems.

Google. DevOps capabilities. Available at: <https://cloud.google.com/architecture/devops>. Accessed on November 18, 2023.

HUMBLE, Jez; FARLEY, David. Entrega Contínua: Como Entregar Software de Forma Rápida e Confiável. Translation: Marco Aurélio Valtas Cunha. 1st ed. Porto Alegre: Bookman, 2014. 496 p. Original title: Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation.

IBM. What is observability. Available at: <https://ibm.com/topics/observability>. Accessed on November 10, 2023.

JetBrains. Continuous Integration vs. Delivery vs. Deployment. Available at: <https://jetbrains.com/teamcity/ci-cd-guide/continuous-integration-vs-delivery-vs-deployment/>. Accessed on November 12, 2023.

KIM, Gene. et al. Manual de DevOps: Como obter agilidade, confiabilidade e segurança em organizações tecnológicas. Translation: João Torello. 1st ed. Rio de Janeiro: Alta Books, 2018. 464 p. Original title: The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations.

Martin Fowler. Microservices. Available at: <https://martinfowler.com/articles/microservices.html>. Accessed on November 15, 2023.

MORRIS, Kief. Infrastructure as Code: Dynamic Systems for the Cloud Age. 2nd ed. [S. l.]: O'Reilly Media, 2021. 427 p. ISBN 1098114671.

Puppet. DevOps State Report 2017. Available at: <https://puppet.com/resources/report/2017-state-devops-report>. Accessed on November 22, 2022.
