Getting started with Dafny: Your first formal proof

Getting started with Dafny: Your first formal proof

This post is a beginner's walk through of how to use Dafny to build formally verified software. It is written for at a junior or beginner developer level and should be approachable by anyone with a little experience in C#, Java, Javascript or other similar languages.

If you are not familiar with Formal Methods or the Dafny Verification Language see my previous article for an introduction.


You might remember from last time that I promised to walk through my formal proof of a basic communication protocol. That's not gonna happen. Instead we are going to do something simpler than that. Today we are going to walk through how to formally prove a basic Mutex, more commonly called a thread lock. If you are not experienced with multithreaded programming, don't worry, we are not going to get deep into that, it's just used as a toy example.

Getting Started

You can follow along by downloading VSCode and installing the Dafny extension . Then create a empty file called Lock.dfy and you are ready to go! Dafny is a huge language, but it is well documented.

Source code for this post is available here.

What is a Lock?

Locks are critical to writing multi-threaded code correctly. They are also called Mutexes, which stands for "Mutual Exclusion". The idea is that you have a important piece of code called a critical section what if multiple threads (or users) were to try to access at the same time, "Bad Things?" would happen. (When we say "Bad Things?", we are talking about crashes, infinite loops, corrupt or incorrect data and other things that would negatively impact our software's operation.

So we use a lock to check if someone else is currently accessing the critical section before we enter it. Now that we have introduced the concept, lets give it a definition.

Specifying our Lock

The first, and perhaps most important step of writing designing formal systems is specifying how it should and should not behave. From there, we can design the abstraction that we need to fulfill that contract of behavior. For our lock, this will be pretty brief.

  1. A lock may exist in either of two possible states: (Acquired, Unacquired)
  2. A lock when first created, is in the unacquired state by default
  3. A thread may transition a lock from an unacquired state into an acquired state or from acquired into unacquired.
  4. If a thread attempts to acquire a lock that is already in the acquired state, then the operation fails

From here we can build a implementation that we expect to work.

Designing our Lock class

A lock, for our purposes, is a structure that is equipped with two methods:

  • TryAcquireLock(threadId)->bool
  • ReleaseLock().

TryAcquireLock accepts the id of the thread that is attempting to acquire the lock and returns a bool that indicates if the lock was successfully acquired. A thread may acquire a lock if and only if no other thread has acquired it.

ReleaseLock, predictably, returns the lock to the default state, permitting another thread to subsequently acquire it. In almost all circumstances, this method is only called by the thread that successfully acquired the lock.

We are going to implement the lock with something called a compare-and-swap operation, also known as a compare-and-exchange. This is typically a special operation provided by the CPU that checks a variable and then either chooses to keep it or swap it out, if the value matches some target value or not and the CPU does this as a single atomic operation. Our compare-and-swap isn't the actual CPU instruction, but it will behave the same for purposes our proof.

Those familiar with locks might ask about advanced scenarios such as reentrancy, which means that a thread can re-acquire the lock, while it has already acquired it. We are not going to support it for now. If a thread tries to enter a lock twice, we will reject it.

Technically this implementation is slightly different than the spec because it stores the owner of the thread in it's state. But we can treat it the same from a verification standpoint and it sets us up nicely for extending it in the future.

Understanding the Basics of Dafny

Rather than spending time walking through writing the Lock class, I am going to provide the implementation up front. The focus of this post is on proving that it is correct. Take a look:

The starting point of our proof

"ThreadId" and how numbers work in Dafny

Ok lets unpack this a bit.

newtype ThreadId = i: nat | i <= 0xffff_ffff        

First we see the newtype declaration; this declares a new type that is convertible to and works the same as an underlying type. This is useful for keeping various values from getting mixed up. In this case the underlying type is "nat" with a restriction that only permits values up to 2^32, which is the domain of a uint32. So basically a ThreadId is an uint32 in our program.

In most programming languages we specific sized variables like uint32 and uint64 to store numbers. This is because in a real program, we need to know how much space to use to store the variable in order to process data efficiently.

But Dafny operates in an abstract space where we are not constrained to finite datatypes. Specifically, nat refers to an abstract "natrual number". Natrual numbers are defined as either 0 or any of the positive integers up to but not including positive infinity, so 0, 1, 2, 3,...?. We then restrict it down to a uint32, not because we "have to", but because we want this spec to model the real way that a C# or Java program would operate. This matters because weird bugs can happen at the lower and upper values of an integer's range (0 and 0xFFFFFFFF) and if we allow unlimited sized numbers in Dafny, we will not catch bugs at the upper edge of the value range that might exist in C# because there is no upper edge of to "infinite".

The Lock Class

Next up, we have a class declaration. Classes in Dafny behave much the same way that they do in most programming languages like C# and Java. They are objects that contain fields and may define methods. One big change from most OOP languages is that there are no accessibility keywords such as "private" or "protected". In Dafny, everything is public. This has to do with the concept of "transparency" that is important to how Dafny reasons about whole programs (but more on that later). In you don't intend for a method to be called from outside of the class, then don't do that ??.

Moving on to the class definition, we define:

  • A static const "NULL_THREAD_ID", which defines what "ownerThread" should be when the Lock object is not currently acquired by any thread.
  • A mutable instance field "ownerThread", which stores the current thread that has acquired the lock.
  • A method TryAcquireLock, which attempts to acquire the lock
  • A method ReleaseLock which releases the lock.

The syntax of the method signatures is a little different then other languages, but it is still pretty clear. One important detail is that methods in Dafny can return multiple values. In C# we would do this with out parameters, but the concept is the same. (Actually return values in Dafny are always "out parameters"!)

The next oddity is the "modifies this" statement. I won't dwell on this, but it essentially declares that a method is permitted to modify the state of the parent class. This is important to Dafny so that it can reason about when state is changing.

    method TryAcquireLock(threadId : ThreadId) returns (success : bool)
        modifies this //<- kinda weird ??        

The Beginnings of the Proof

A big mental jump that we need to make with Dafny is how we think about testing our code. In C#, when we write a test, we write a small program that walks through specific steps and add some asserts along the way and then run that program and see if it explodes.

Dafny can do this too, but the power of Dafny is by specifying the program in such a way that the program is "always" correct without us having to actual execute tests. But first, lets write a traditional looking test.

Our basic test, currently failing badly

Ok, here we have a basic test, much like a unit test. This creates a lock, then defines two different threads and then tries to do a few things with the lock. What might be a bit unusual is that we see some compiler errors before we have ever run the program yet. Just by writing the test we already get feedback from Dafny about it's correctness.

The message for the first Error

Here, Dafny has determined that our program "might" not work. Put another way, Dafny can't prove for certain that it will work, but this is not a damning assessment of our code....yet. In most cases, we need to begin helping Dafny (and ourselves!) by being more specific about the intent of our program.

An important thing to mention is that methods are "opaque" to the verifier. That means that Dafny does not look directly inside of the body of methods that are called to determine their behavior and side effects. Instead, it uses the preconditions and postconditions (hereafter called the method's specification or spec) declared on each method to understand the intent and behavior of the method.

This works in two directions:

  • First Dafny will compare the method's spec/signature to the internal body of the method to prove that the method works as designed.
  • Second when other code calls that method, the spec tells Dafny how the class or other variables will change after that method call has completed.

If we look at the first failing assert, Dafny cannot prove that the method will always return success=true. This is because the method spec doesn't say anything about what the method actually returns. In fact the method spec doesn't say anything yet!

Lets add a rule to say when the method returns success:

The function should return success if the lock is not currently held by another thread at the start of the method call.

That is a plain English rule that we are going to transform into a spec rule. Since this rule involves the method return value (which only matters after the method has finished), we are going to specify a postcondition using the ensures keyword.

Our first method spec!

This new rule is basically equivalent to our English rule. The old(ownerThread) means that we are referring to the value of ownerThread just BEFORE the method executes. We are saying that if that value is equal to our null thread id value, then that "implies" that the return value (called "success") is true.

The "fat arrow" ==> is the "implies" operator. It means that is the left side of the operator is true, then the right side is true. BUT, the reverse is not necessarily true; that is to say that if the right side is true, the left side may or may not be true. So to recap, success is true when ownerThread is "null" (actually NULL_THREAD_ID, which is zero) when the method starts. Neat!

Oh wait...

A new challenger appears!

We have a new error on the return statement of the method now.

The message from the new error

Hmm, "a postcondition could not be proved"? So Dafny now says "I see that you want this method to return true when this condition is met, but how are you gonna do that?". Ok, so we need to prove this somehow.... Lets work backward.

The condition depends on the return value "success", which this method defines as casResult == threadId. So that means we have to somehow prove that this expression is true when we start with ownerThread is null. The variable threadId comes from the method argument, so we can't predict that, but casResult we can interpret. So perhaps if we tell Dafny what values CompareAndSwap returns and when, it can make this leap of faith with us. Time for a new rule:

If valueToCompare is equal to ownerThread when CompareAndSwap begins, then CompareAndSwap will return the value of valueToSwap
Our rule for CompareAndSwap

Detour: Is Dafny really this dumb?

I'm going to pause here to answer a question that might be bubbling up already: "Why do I have to spell everything out for Dafny? Can't it just 'figure it out'?". Great question! There are two main reasons:

First, it helps the underlying engine by giving it guardrails on how much of the universe it needs to search and understand. By treating method as "opaque" and only paying attention to the explicit rules that we have written, the number of things the solving engine has to keep track of is reduced by a massive amount. Without this, every possible observable behavior of your program would have to be considered by the engine regardless of whether it matters or not. The complexity explodes very quickly into infinity and it is computationally infeasible to analyze even small programs .

The next reason is more subtle. A distant cousin of Dafny is F#. F# is a language that has a powerful feature called "type inference" that allows us to just write code without specifying the type of each variable or method parameters. This is really cool because it gives us the ease of languages like JavaScript and Python, but with the type safety of languages like C# and Java. The compiler just "notices" your changes and you get correct compiler errors that tell you the other changes you need to make to bring your program back into a consistent state.

But it turns out that is too much of a good thing. Don Syme, the creator of the F# language still recommends explicitly specifying the types of variables in most situations. This is a choice done to give "structure" to a program and make the types used explicit parts of the program contract. Because without this structure, making a tiny change somewhere might cause a cascade of changes in behaviors and errors that can become overwhelming to understand.

The situation is the same in Dafny. We annotate our methods with static specification not only to help the engine figure things out, but also to make sure that the implementation code we are writing is actually doing what we want it to according to our method specifications. Many times, we make subtle mistakes in the implementation of our code and a key tactic is to add more preconditions and postconditions to our program until we notice the inconsistency. And that is exactly what we are going to do next!

Working with Counterexamples

So now Dafny can understand that after CompareAndSwap executes, it's return value will end up fulfilling the contracts as we intended. And at this point the CompareAndSwap and TryAcquireLock methods both turn green! But... our first test assertion is still failing :(. Dafny still doesn't believe that TryAcquireLock will always return true after the first time it is called. I'm actually kind a stumped. When would this not be true? Well it turns out Dafny can tell us by producing a counterexample! A counterexample is when Dafny shows us a concrete example that violates our specifications. We can study them to understand what Dafny is thinking.

You can enable counter examples in VSCode by using the command pallet.

Dafny Counterexample commands in VSCode

Once we do, Dafny throws a lot of stuff at us

Dafny displaying counterexamples for the error

This is overwhelming at first, but lets unpack it. Each blue line is telling us the state of the program just before that line executes. We can also tell Dafny to copy these to the clipboard using the command pallet. lets look at the first one.

At "{" ...
    defass:bool = false
    lock:? = ?
    thread1Id:int = 0
    thread2Id:int = thread1Id        

Ok, everything is uninitialized here because it is the opening curly brace of Main. The '?' means that the lock variable is uninitialized and the threadIds are both zero at this point. The variable thread2Id is "equal" to thread1Id because the underlying engine consolidates all variables with identical values. This looks a little confusing can help a lot with with keeping track of complicated states of our program and determining when they are the same.

The "defass" variable is special and refers to definite assignment . This indicates if Dafny can prove if all of our variables have been assigned values yet. Don't worry too much about this one for now.

Lets skip forward to the next one:

At "var lock := new Lock();" ...
    defass:bool = true
    lock:_module.Lock = (NULL_THREAD_ID := 0, ownerThread := 39, lock#0 := true)
    thread1Id:int = 0
    thread2Id:int = thread1Id        

Hmm, that's weird. it says that the ownerThread in our Lock object is equal to 39!? How the heck did that happen!? As a proving tactic, Dafny will sometimes try to plug in a random number as the value of a field or variable and see if it works.

So what Dafny is saying, is that after the constructor has finished, ownerThread could be anything because we forgot to specify that it will always be zero in the postcondition!

Recall Dafny does not look inside of methods (and constructors) to see how they work unless we add a specification to that method and we haven't done that for our constructor, so Dafny assumes that anything could be happening in there. This is a little annoying, but easy to fix.

Adding a postcondition to our constructor

And now, like magic, our first assertion passes! But the second one is still failing ??.

The first assertion passes

Back to the counterexamples!

At "var success := lock.TryAcquireLock(thread1Id);" ...
    defass:bool = true
    lock:_module.Lock = (NULL_THREAD_ID := 0, ownerThread := 7720, lock#0 := true)
    thread1Id:int = 1
    thread2Id:int = 2
    success:bool = defass        

Hmm, ownerThread ended up with a crazy value again after we acquired the lock. I guess we need to constrain the behavior of TryAcquireLock some more. Lets add a new rule:

If TryAcquireLock succeeds, ownerThread will be equal to the threadId argument
More postconditions...

*Sigh*...that leads to a new error. I guess because CompareAndSwap doesn't specify it's return value. Wait! Doesn't our postcondition on CompareAndSwap already says that it's return value is specified correctly? So what gives? Shouldn't Dafny be able to understand that CompareAndSwap will make TryAcquireLock work correctly? Well, lets look at the counter example:

At "var casResult := CompareAndSwap(NULL_THREAD_ID, threadId);"
    this:_module.Lock = (NULL_THREAD_ID := 0, ownerThread := 38)
    threadId:int = 39
    casResult:int = threadId        

Ok, so in this example we are in TryAcquireLock and Dafny has started us out with the threadId argument set to 39 and ownerThread equal to 38. So thread 39 is trying to acquire the lock, but it is already held by thread 38. This should fail. But wait:

At "return casResult == threadId;" :
    this:_module.Lock = (NULL_THREAD_ID := 0, ownerThread := 38)
    threadId:int = 39
    casResult:int = threadId
    defass:bool = true
    success:bool = defass        

Dafny says that CompareAndSwap returned the value of threadId into casResult. But that should not have happened because of the post condition on CompareAndSwap that controls the return value....right? Nope. Lets look at our English post condition for CAS again:

If valueToCompare is equal to ownerThread when CompareAndSwap begins, then CompareAndSwap will return the value of valueToSwap

This only defines what happens when CAS succeeds! (valueToCompare == ownerThread). We haven't covered what happens when it doesn't! I'm going to skip some steps also and mention that we didn't constrain that ownerThread must not change if CAS fails. And several other things... are you getting tired yet?

The Completed Proof

My final version looks like this:

The completed proof

Dafny has features that allow us to "pick" a random possible value for a variable. This is superior to manually specifying values such as "1" and "2" for our thread ids. So I have generalized those variables to be random, which forces Dafny to prove that our tests work for all possible values instead of just the few that we can manually think of. This uncovered another bug where a bad user could pass 0 as the threadId to TryAcquireLock and think that they held a lock when it could still be taken from them ??.

As a note, this is different than randomized testing or property-based testing , which run tests again and again with multiple values. But they have to stop running combinations eventually and cannot guarantee that they have checked everything. But Dafny can!

As long as we have defined the desired behavior of the class in the test, then this code will work. Period. No corner cases, no surprises.

The Takeaway

Programming correct code requires designing behavior and then implementing that behavior without deviating from the design. Dafny can help with both sides of this.

That being said. Writing formal proofs is a humbling experience. The code you thought was simple turns out to be very subtle and nuanced. And modeling more complicated systems like, loop processing, client-server protocols and timer based functionality can be very tricky to even know how to start.

Even with "simple" code, the size of the proof specification can dwarf the actual code. Our Lock class took 1 precondition and 8 post conditions to prove correct. You will also notice that there are 9 lines of specification to cover just 5 lines of implementation code. That is an unsurprising statistic in the world of formal methods.

This is frustrating, time consuming and expensive work to apply to a project and should be generally treated as an extreme step after significant investments in other testing methods have been made. This example overall took me around 4 hours. Partly because I was rusty at Dafny but mostly because it is hard. Damn hard.

But at the end of the day, Formal Verification can go places beyond any software testing technique and Dafny is the most approachable way to achieve this that I have yet seen. Give it a shot, you just might like it. Or maybe one day you will use it to prevent a bug that could mean life or death.



要查看或添加评论,请登录

社区洞察

其他会员也浏览了