登录查看更多内容

Drawing an Owl

Matt Provost

Engineering Leader | Author

发布日期: 2024年3月17日

"How to Draw an Owl" is a classic Internet meme from 2010 showing how to draw an owl in two steps. I find a lot of technical writing unintentionally follows a similar style, where a problem is presented followed by a solution, often with very little explanation. As a writer, it's easy to fall into this trap when you're already an expert on the a topic. It can be difficult to put yourself back in the position of someone learning from scratch. In my own writing, I always try my best to explain everything as I go.

Each chapter of Rust From the Ground Up teaches Rust concepts by rewriting a classic Unix utility from the original BSD source code in C into modern, idiomatic Rust. The chapter that I'm editing now implements the cut program, which selects and prints specific fields or characters from each input line. While editing I came across some simple Rust syntax which doesn't work, so I went on a deep dive into understanding why. Now that I know what is going on, instead of just skipping straight to a working solution, I updated the chapter to explain the reasons to the reader.

The rest of this article follows my journey into understanding a piece of Rust code.

The original BSD implementation of cut stores which positions it should select from each line using an array of integers. The -f (fields) and -c (characters) command-line options take a list argument specifying the positions, and the get_list() function parses the list and updates the positions array.

For example, selecting the first and second fields from the /etc/hosts file:

% grep localhost /etc/hosts
127.0.0.1       localhost
% grep localhost /etc/hosts | cut -f1
127.0.0.1
% grep localhost /etc/hosts | cut -f2
localhost
% grep localhost /etc/hosts | cut -f1,2  
127.0.0.1       localhost

The original BSD version uses a fixed-size 2049 byte array to store the selected positions. This is because the POSIX standard specifies that the maximum length of a line of text that it expects Unix text processing utilities like cut to handle is 2048 bytes (or characters, back in the ASCII days). And the positions that cut selects start with 1, not 0, so it needs space for one extra position at the start of the array.

My Rust implementation uses an expandable Vec (vector) to store the positions, without an arbitrary limit on the length of a line. The Rust get_list() function takes a new, empty Vec and resizes it to the highest value in the list, plus one.

I'm also using this chapter to introduce Test Driven Development (TDD). (Of course all of the original BSD programs predate TDD which was popularised in the late 90s by Kent Beck. The first BSD version of cut is from 1989!)

The naive approach to testing the output of get_list() with a value larger than POSIX requires (2049) is to write a test function which loops 2049 times checking that each element of the Vec updated by get_list() is false (including the extra first byte) and then checks that the final element is true.

Collecting the loop of 2049 false values into a Vec and then extending it to add a true element at the end would allow the test to compare the expected Vec to the positions updated by get_list() using the == operator. But it's not efficient to allocate and mutate a Vec just to use it once. I avoid using collect() in the book.

Idiomatic Rust programs use iterators instead of loops where possible. Instead of allocating a Vec with 2049 false values and one true, Rust makes it easy to create an iterator at runtime by repeating false values with repeat(), taking 2049 of them with take(), and chaining one additional true value with chain().

At this point I could draw the owl by instructing the reader to write a test function using the Iterator trait's eq() method to compare the expected iterator with the positions updated by get_list():

#[test]
fn test_get_list_2049() {
    let mut positions = Vec::new();
    let expected = repeat(false).take(2049).chain([true]);
    //                    ^^^^^       ^^^^        ^^^^^^

    get_list("2049", &mut positions).unwrap();
    //       ^^^^^^
    assert!(expected.eq(positions));
    //               ^^
}

And the test would pass:

% cargo test --bin cut test_get_list_2049
    Finished test [unoptimized + debuginfo] target(s) in 0.04s
     Running unittests (target/debug/deps/cut-c32a352ad939e0ce)

running 1 test
test test_get_list_2049 ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 5 filtered out; finished in 0.02s

But why do iterators need their own eq() method when the language already provides the == operator for testing equality?

First, the == operator would need two iterators to compare. By itself, a Vec isn't an iterator. Rust tries to be explicit with memory allocations, and iterating over a Vec requires a variable to store the current index. The into_iter() method converts a Vec into an Iterator, allocating memory to track the current index.

Converting positions into an iterator with into_iter() and comparing it with the expected iterator using the == operator seems like it should work:

#[test]
fn test_get_list_2049() {
    let mut positions = Vec::new();
    let expected = repeat(false).take(2049).chain([true]);
    
    get_list("2049", &mut positions).unwrap();
    assert!(expected == positions.into_iter());
    //               ^^           ^^^^^^^^^^^
}

But the compiler generates an error trying to apply the == operator to the complex type it inferred from the chain of iterators:

error[E0369]: binary operation `==` cannot be applied to type `std::iter::Chain<std::iter::Take<std::iter::Repeat<bool>>, std::array::IntoIter<bool, 1_usize>>`
   --> src/bin/cut.rs:156:22
    |
156 |     assert!(expected == positions.into_iter());
    |             -------- ^^ --------------------- std::vec::IntoIter<bool>
    |             |
    |             std::iter::Chain<std::iter::Take<std::iter::Repeat<bool>>, std::array::IntoIter<bool, 1_usize>>

Understanding the reason why == cannot be applied to the iterator requires diving into Rust's memory model.

Rust provides memory safety through a system of ownership. Each value in a Rust program is only able to have one owner at a time. This prevents entire categories of bugs that affect memory unsafe languages such as C or C++.

领英推荐

Automate writing Dockerfiles today? with docker init

Akhilesh Mishra 1 年前

On QML Code Protection

Sandro Andrade 11 个月前

The Art of Writing Clean Code: 10 Principles That…

Kumar Sethi 2 个月前

By default, a Rust function (or method) takes ownership of its parameters, so the caller isn't able to use them anymore. A function is also able to temporarily borrow a parameter with the borrow operator (&), giving ownership back to the caller when it returns. The borrow checker validates this at compile time.

Under the hood, the == operator calls an eq() method, which borrows the left (self) and right-hand side (other) values with the borrow operator (&):

fn eq(&self, other: &Rhs) -> bool

Because the eq() method borrows the left and right-hand side values, the rest of the program is able to continue using them after the comparison.

However, comparing two iterators requires yielding items from both. Yielding the next item requires mutating the iterator - for example incrementing its index variable.

Another key feature of Rust's memory safety is that values are immutable by default. The mut keyword designates a mutable value.

Iterators work by repeatedly calling next() to yield the next item. The next() method mutably borrows the iterator (self) with &mut to yield the next item:

fn next(&mut self) -> Option<T>

To compare two iterators by calling next() on each, the == operator would have to either take ownership of, or mutably borrow, the values on both sides. But its eq() method only immutably borrows the values. Which makes sense - it would be unexpected for an equality comparison to modify the values that it was comparing!

A common mistake in languages (such as C) where variables are mutable by default is confusing assignment (=) with equality (==). For example, accidentally using = instead of == assigns n the value 42, instead of checking if its value equals 42:

if (n = 42) { ... }

Rust's borrow checker prevents directly comparing two different iterators for equality because the == operator doesn't take ownership of the values to mutate them.

To compare two iterators, the Iterator trait defines a different eq() method which takes ownership of its parameters (self and other) and consumes both, comparing each pair of items (with ==!) and returning true if they are all equal:

fn eq<I>(self, other: I) -> bool
where
    I: IntoIterator,
    Self::Item: PartialEq<<I as IntoIterator>::Item>,

Once the iterator eq() method has taken ownership of and consumed both iterators, the program isn't able to use them again.

Updating the test to assert that the expected and positions iterators are equal with eq():

#[test]
fn test_get_list_2049() {
    let mut positions = Vec::new();
    let expected = repeat(false).take(2049).chain([true]);

    get_list("2049", &mut positions).unwrap();
    assert!(expected.eq(positions.into_iter()));
    //               ^^
}

Now the test successfully validates that they both yield sequences of 2049 false values followed by a single true:

% cargo test --bin cut test_get_list_2049
   Compiling rftgu v0.1.0 (/home/me/git/rftgu)
    Finished test [unoptimized + debuginfo] target(s) in 0.56s
     Running unittests (target/debug/deps/cut-399cc4b2a5e5a5f1)

running 1 test
test test_get_list_2049 ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 5 filtered out; finished in 0.00s

This test function is almost the same as the first example (where I "drew the owl"), except that it calls into_iter() on the positions Vec. Instead of requiring its caller to allocate an iterator with into_iter(), eq() does it itself, making it more ergonomic to use. Idiomatic Rust methods take anything that can be converted into an iterator (such as a Vec), and then do the conversion themselves.

But doesn't calling into_iter() on an already existing iterator needlessly allocate another iterator? Looking at the source code, calling into_iter() on an existing iterator just returns itself!

fn into_iter(self) -> I {
    self
}

This article demonstrates how I try to approach teaching Rust in my book. My premise is that people learn better (and retain the knowledge) when I give them the chance to follow the same process I do in discovering how things work. Leaping directly to a final, idiomatic solution without explaining all of the fundamentals first doesn't actually teach anything (except memorisation).

And that's how you draw an owl!

Matt Provost的更多文章

Continuously Changing Systems

2015年9月10日

Continuously Changing Systems

Here's a great post from Jeff Sussna about building systems for continuous change. This is something that I've worked…

2 条评论

Drawing an Owl

Matt Provost

Engineering Leader | Author

领英推荐

Matt Provost的更多文章

社区洞察

其他会员也浏览了

How to Download All PDF Files from a Website: A Comprehensive Guide

Implementing FastAPI for High-Performance Applications

Best Practices For Writing Clean Code

My First Experience of Open Source Software - Porting a Prolog Interpreter (or, so you think ChatGPT is new?!)

Writing WebAssembly (Wasm) in Rust: An Introduction with Code Example.

Content Roundup 11/18/24

Devs' Bread & Butter - 1. Naming

Lisp: A language that calls me

Write Code Faster

Super developer: The Art of the Start

领英推荐

Matt Provost的更多文章

Continuously Changing Systems

社区洞察

其他会员也浏览了

How to Download All PDF Files from a Website: A Comprehensive Guide

Implementing FastAPI for High-Performance Applications

Best Practices For Writing Clean Code

My First Experience of Open Source Software - Porting a Prolog Interpreter (or, so you think ChatGPT is new?!)

Writing WebAssembly (Wasm) in Rust: An Introduction with Code Example.

Content Roundup 11/18/24

Devs' Bread & Butter - 1. Naming

Lisp: A language that calls me

Write Code Faster

Super developer: The Art of the Start