Vigilance, Guide Rails, and Architecture Hoisting

David Max

Senior Software Engineer at Datadog

Published: September 22, 2015

On September 23, 1999, the unmanned NASA Mars Climate Orbiter reached Mars after cruising for 10 months and 416 million miles. It fired its rockets to maneuver itself into orbit around Mars in preparation for a planned 687-day mission. Instead, the spacecraft swung behind Mars and was never heard from again.

A Simple Math Error

The $125 million orbiter disappeared because of a simple math error: software used to navigate the spacecraft did not convert English units to metric.

The navigation team at the Jet Propulsion Laboratory (JPL) used the metric system in its calculations, while Lockheed Martin Astronautics, which designed and built the spacecraft, provided crucial thruster data in English units (pound-force seconds instead of newton-seconds). The error had affected the orbiter mission from its launching, yet the problem was never caught and corrected.

Too Little Testing? Or Too Much to Test?

NASA performed an immediate learning review, identified the causes of the failure, and made several recommendations. The error that caused the failure was trivial. The need to use consistent units was always part of the specification and was well understood. What led it to be a cause of failure was the sheer complexity of the entire system, not the individual task.

There are two ways of constructing a software design:
One way is to make it so simple that there are obviously no deficiencies, and
the other way is to make it so complicated that there are no obvious deficiencies.
The first method is far more difficult.

(Sir Tony Hoare, The Emperor's Old Clothes)

The traditional approach to flight software for spacecraft at JPL was to program conservatively and test everything thoroughly, because the recovery options are very limited when a program crashes 55 million miles away. Accounting for every possible risk in such a dynamic system takes a bit of magic and a lot of wisdom, experience, confidence, creativity, and attention to detail. It's an art that is very difficult to teach.

Guide Rails

For many years most spacecraft programs were written in C. To reduce risk, the teams writing them identified certain language features that might be more prone to lead to bugs and created coding guidelines that kept them from using those features. George Fairbanks calls these sorts of self-imposed architectural constraints guide rails (see Architectural Hoisting, IEEE Software, vol.31, no. 4, pp. 12-15, July-Aug. 2014, doi:10.1109/MS.2014.82).

The rules would say things like: no dynamic memory allocation, no recursion, and every switch statement must have a default. With code reviews, tool-based compliance checkers, and similar measures enforcing the guide rails, the intent was that with sufficient vigilance the risk could be reduced to tolerable levels.

Vigilance is an effective technique for reducing risk. It's ingrained in software developers' defensive programming habits, such as always checking return values, validating parameter values, and handling exceptions.
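
As an illustration of those habits, here is a small Python sketch of vigilant code. The function names and the 60-second limit are hypothetical. Every safeguard in it (validating the input, handling the exception that parsing can raise, and testing the return value at the call site) is something a developer has to remember to write; nothing in the language requires it.

```python
import math
from typing import Optional, Tuple

# Hypothetical limit, chosen only for illustration.
MAX_BURN_SECONDS = 60.0


def parse_burn_duration(text: str) -> Tuple[Optional[float], str]:
    """Return (seconds, "") on success, or (None, reason) on failure."""
    # Validate the parameter before using it.
    if not isinstance(text, str):
        return None, "input must be a string"
    # Handle the exception float() raises on malformed input.
    try:
        seconds = float(text)
    except ValueError:
        return None, f"not a number: {text!r}"
    # Reject values that parse but make no physical sense (NaN, inf, <= 0).
    if math.isnan(seconds) or not 0.0 < seconds <= MAX_BURN_SECONDS:
        return None, f"out of range: {seconds}"
    return seconds, ""


def schedule_burn(text: str) -> str:
    seconds, error = parse_burn_duration(text)
    # Check the return value: nothing forces the caller to do this.
    if seconds is None:
        return f"rejected ({error})"
    return f"burn scheduled for {seconds:.1f} s"
```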

Eternal Vigilance is Exhausting

While vigilance is often sufficient when programming in the small, there are many problems with scaling up vigilance as the primary method of reducing risk. It has to be continuously sustained and only grows more difficult as the system gets more complex. A developer only has limited cognitive bandwidth, and every additional item the developer must be vigilant about increases the mental burden of maintaining that vigilance and decreases the remaining bandwidth available for implementing features.

Also, while the coding guidelines act as guide rails, they aren't easily visible in the code, because nothing in the code needs to represent them explicitly. A new developer joining the team might not easily figure out what code the previous developers had refrained from writing in order to comply with the guidelines.

Architectural Hoisting

The term architecture hoisting was coined by the Mission Data System (MDS) project at NASA JPL to describe a methodology originally developed for spacecraft flight control software. These systems have multiple sensors and control systems that must be monitored and reacted to. Developers would expend considerable effort ensuring that one activity, say transmitting a block of data back to Earth, didn't interfere with any of the other critical tasks, whose code was often distributed across many different modules.

What they did instead was hoist into the system architecture a model of the spacecraft sensors and components, and the constraints that had to be followed. The model could then be transformed into code that had enforcement of the constraints explicitly built in. The system would then enforce the priorities so that it wouldnâ€™t allow a situation where, say, the spacecraft is busy taking pictures when it should be making a course correction.
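
The following toy sketch, in Python, shows the shape of that idea: the constraint lives in one arbiter instead of being scattered through every activity's code. It is not the MDS design. The `Activity` and `Arbiter` names, the resource names, and the simple rule "one owner per resource, highest priority wins" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Activity:
    name: str
    priority: int              # higher number wins
    needs: FrozenSet[str]      # resources the activity must control exclusively


class Arbiter:
    """Toy arbiter: the constraint 'each resource has one owner and the
    highest-priority activity wins' lives here, not in each activity."""

    def __init__(self) -> None:
        self.owner: Dict[str, Activity] = {}

    def request(self, act: Activity) -> bool:
        holders = {self.owner[r] for r in act.needs if r in self.owner}
        holders.discard(act)
        # Deny if anything already holding a needed resource outranks or ties us.
        if any(h.priority >= act.priority for h in holders):
            return False
        # Otherwise preempt the lower-priority holders completely, then grant.
        for h in holders:
            self.release(h)
        for r in act.needs:
            self.owner[r] = act
        return True

    def release(self, act: Activity) -> None:
        for r in [r for r, a in self.owner.items() if a == act]:
            del self.owner[r]
```

The imaging code never has to check whether a course correction is pending. Asking the arbiter for the attitude-control resource is the only way to get it, so the priority rule holds across the whole system.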

Architectural Hoisting:
A design technique where the responsibility for a guide rail is moved away from developer vigilance into code, with the goal of achieving a global property on the system.
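
Applied to the Orbiter's failure, where thruster impulse was reported in pound-force seconds but newton-seconds were expected, hoisting would mean making the unit part of the type so that no one has to stay vigilant about it. A minimal Python sketch, assuming a hypothetical `Impulse` type that always stores newton-seconds internally:

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds (exact by definition).
NEWTON_SECONDS_PER_LBF_SECOND = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse whose unit is part of the type. The value is always
    stored in newton-seconds, so code that adds impulses never has to
    remember which unit a number is in."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        return cls(value * NEWTON_SECONDS_PER_LBF_SECOND)

    def __add__(self, other: "Impulse") -> "Impulse":
        # Adding a bare number is refused: its unit is unknown.
        if not isinstance(other, Impulse):
            return NotImplemented
        return Impulse(self.newton_seconds + other.newton_seconds)
```

With this design the unit has to be stated explicitly at the one point where a number enters the system, rather than remembered everywhere it is used, and adding a bare number raises a `TypeError` instead of silently producing a wrong answer. This illustrates the technique; it is not how the actual mission software was written.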

Example: Hoisting Memory Management

To take a simple example, garbage collection, or smart pointers in C++ (like unique_ptr), can be seen as hoisting memory management, so developers no longer have to track every allocation by hand. Hoisting generally comes with some constraints and costs: for example, hoisting memory management into the architecture with automatic garbage collection might make it more difficult to achieve the same level of performance.
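
Those examples are C++ features. To keep all the sketches here in one language, the following uses Python, whose CPython runtime hoists memory management with reference counting plus a periodic cycle collector. The `TelemetryBuffer` class is hypothetical. The code never frees anything; weak references are used only to observe when the runtime has reclaimed an object.

```python
import gc
import weakref


class TelemetryBuffer:
    """Hypothetical stand-in for a block of memory the program works with."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.peer = None  # optional link to another buffer


def process_frame() -> "weakref.ref":
    buf = TelemetryBuffer(4096)
    buf.data[0] = 0x7E
    # No free() or delete: the runtime reclaims buf once nothing refers to it.
    # Only a weak reference is returned, so we can observe that reclamation.
    return weakref.ref(buf)


def make_cycle() -> "weakref.ref":
    a = TelemetryBuffer(16)
    b = TelemetryBuffer(16)
    # A reference cycle: reference counts never drop to zero on their own,
    # so reclaiming it is left to the cycle collector.
    a.peer, b.peer = b, a
    return weakref.ref(a)
```

The second function shows one of the costs the paragraph mentions: a reference cycle isn't reclaimed until the cycle collector runs, and that work happens on the runtime's schedule rather than the developer's. On other Python implementations, such as PyPy, even the first buffer may not be reclaimed immediately.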

Example: Hoisting Scalability and Concurrency

A recent Wired article, Why WhatsApp Only Needs 50 Engineers for Its 900M Users, explains why the company builds its service using a programming language called Erlang. In What Language I Use for… Building Scalable Servers: Erlang, David Chisnall explains some of the reasons why Erlang is well-suited for building highly scalable systems.

There is a pattern in Chisnall's explanation. He describes the various ways Erlang limits the developer and explains how each limit makes it easier to write concurrent programs that scale. For example: "If you want to write scalable, maintainable, parallel code, there is one rule that you must abide by: No data may be both shared and mutable. Erlang enforces this because within a process it has an (almost) purely functional model. All variables are immutable, with just one exception: the process dictionary."
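
Python does not enforce Chisnall's rule the way Erlang does, which makes it a useful contrast. The sketch below, with hypothetical names, follows the rule by convention: messages are frozen dataclasses (shared but not mutable), the running balances live in a dict that only the ledger thread touches (mutable but not shared), and `queue.Queue` stands in for an Erlang mailbox.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Deposit:
    """Immutable message: safe to share between threads because no one can change it."""
    account: str
    amount: int


def ledger_process(inbox: "queue.Queue[Optional[Deposit]]",
                   outbox: "queue.Queue[Dict[str, int]]") -> None:
    # This dict is mutable but private to the ledger thread: never shared.
    balances: Dict[str, int] = {}
    while True:
        msg = inbox.get()
        if msg is None:                  # sentinel: report and stop
            outbox.put(dict(balances))   # send a copy, not the live dict
            return
        balances[msg.account] = balances.get(msg.account, 0) + msg.amount


def run_demo(senders: int = 4, per_sender: int = 1000) -> Dict[str, int]:
    inbox: "queue.Queue[Optional[Deposit]]" = queue.Queue()
    outbox: "queue.Queue[Dict[str, int]]" = queue.Queue()
    ledger = threading.Thread(target=ledger_process, args=(inbox, outbox))
    ledger.start()

    def send() -> None:
        for _ in range(per_sender):
            inbox.put(Deposit("acct-1", 1))

    workers = [threading.Thread(target=send) for _ in range(senders)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    inbox.put(None)  # only after every sender has finished
    ledger.join()
    return outbox.get()
```

Here the rule holds only because every function was written carefully; nothing stops another thread from being handed the `balances` dict. In Erlang the runtime guarantees it, which is exactly the difference between relying on vigilance and hoisting the property into the architecture.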

This pattern where the architecture enforces a guide rail in order to achieve a desired system attribute is the essence of architectural hoisting.

Hoisting Everywhere!

Once you get the idea of architecture hoisting, you start to realize that the various frameworks and libraries used for building large-scale systems are all examples of it. For example, Rest.li is a "framework for building robust, scalable RESTful architectures using type-safe bindings and asynchronous, non-blocking IO." Terms like robust, scalable, type-safe, asynchronous, and non-blocking each name a quality attribute that Rest.li hoists for the developer. You can try the same exercise with the overview descriptions of other frameworks:

Apache Kafka: fast, scalable, durable, distributed.
Apache Samza: simple API, managed state, fault-tolerant, durable, scalable, pluggable, processor isolation.
Project Voldemort: replicated, partitioned, tunable, pluggable, distributed, fault-tolerant.
Espresso: horizontally scalable, indexed, timeline-consistent, document-oriented, highly available NoSQL data store.

So That's What That's For

Learning to view a software framework as a way of applying various kinds of architecture hoisting to a system is like discovering that each tool in your toolbox has a little label on it like, "Hammer: Use flat end for pounding in nails and the other for removing them." Each tool has characteristic abilities, constraints and tradeoffs. The more tools one knows how to use, the better a developer is able to select and use the best tool for the job.

A Common Language

Also, when one is able to summarize a complicated framework or component in terms of what it hoists and what constraints it imposes, that is a potent shorthand for communicating to all the developers on the team the rationale that lies behind the design choices so that the rest of the implementation will fulfill the desired quality attributes.

Technical Debt

Software developers have a great advantage if they are aware of the architecture they are using and what qualities it hoists for them. As the system evolves, a portion of what is called technical debt is an accretion of decisions to manage risk through the application of vigilance instead of refactoring the handling of that risk into the design and the architecture. The more a system depends on vigilance, the more fragile it becomes and the harder it is to maintain.

As a system evolves, developers should be looking out for situations where the reliance on vigilance is slowing progress and raising the risk of introducing bugs. Before the risk gets too large, seek ways to establish guide rails or refactor the system to hoist those quality attributes into the design.
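
A common example of that refactoring is code in which every caller must remember to acquire a lock before touching shared state. A minimal Python sketch of hoisting that rule into the design, using a hypothetical `Guarded` wrapper, makes the value reachable only through methods that take the lock:

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Guarded(Generic[T]):
    """Hypothetical helper that hoists 'always hold the lock' into the design:
    the value can only be read or changed through methods that take the lock."""

    def __init__(self, value: T) -> None:
        self._value = value
        self._lock = threading.Lock()

    def update(self, fn: Callable[[T], T]) -> T:
        # Read-modify-write happens atomically under the lock.
        with self._lock:
            self._value = fn(self._value)
            return self._value

    def get(self) -> T:
        with self._lock:
            return self._value


def count_concurrently(threads: int = 8, per_thread: int = 1000) -> int:
    counter: Guarded[int] = Guarded(0)

    def work() -> None:
        for _ in range(per_thread):
            counter.update(lambda n: n + 1)

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.get()
```

Python can't stop a determined caller from reaching into `_value`, so the guarantee is not airtight. Still, the easy path is now the safe one, and a reviewer has to check one class instead of every call site.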

For Further Research...

View George Fairbanks's talk (Mar 18, 2012) about guide rails, vigilance and architecture hoisting.

Please join the conversation...

Have you found this to be true in your experience? Please comment below.

Thanks for reading. Please like and share. You can find my previous LinkedIn articles here (https://www.dhirubhai.net/today/author/davidpmax).

PHOTO: The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Photography Collection, The New York Public Library. "Construction workers and crane seen from below" The New York Public Library Digital Collections. 1931. https://digitalcollections.nypl.org/items/510d47d9-a902-a3d9-e040-e00a18064a99
