Trees in Modern C++

Trees in Modern C++

Introduction

Hello peeps,

In my last article I wrote about how I used "deducing this" for my ongoing compiler project in my lexer to reduce unnecessary copies. As I was researching on deducing this while writing the article I came across a fabulous technique in Sy Brand 's c++ blog about deducing this( Sy Brand if you are reading this let's connect :D) which can be used to represent trees in modern c++ and with the help of deducing this and std::visit can be used to even traverse and operate on these trees.

How I used this in my compiler project? Well for starters, I used it to design my AST and also traverse it and create different kind of visitors for it, one for printing the AST(with std::format its insane how easy formatting also turns out to be), one for evaluating expressions like add, subtract etc. and others, like another one for emitting the byte-code for the virtual machine of my programming language. I will be writing about them in my next article(s), this article will be focused mostly on the basics, how to create the tree, how to traverse it, operate on it etc. and trust me there's a lot of content to digest, at least it was for me when I was developing and improving on this method so you will not be left disappointed even if you end up never using this.

so tighten your seat-belts cuz here we go...

Tree Representation

For every tree the basic building blocks are:

  1. The non-leaf nodes
  2. The leaves

Let's first lay these out

We have a generic Node and Leaf type, to hold data of any generic type but we haven't yet provided a definition for the structs themselves. We will do this later and also see why we delay the definition.

The Tree type that we are creating contains these nodes and leaves so suppose we are standing at a particular node, what can we say about this node? It can either be a non-leaf Node or it could be a Leaf. We can extend this definition to its children as well where each child can be a Tree, i.e., either a Leaf or a Node, thus,

Tree -> Leaf or Node -> Leaf U Node

So essentially a Tree is a union type of Leaf and Node, either it can be a Node or Leaf, how do we represent this in c++? That's right we have unions in C++ or rather their super powerful alternative std::variant introduced in C++17

We can now provide the definitions of the Node and Leaf types, as for why we couldn't do so previously will become evident after seeing the code

As you can observe that we needed the definition of Tree to represent our Node's left and right child which being an alias of std::variant itself needs the types being added in the variant, i.e., Node and Leaf so we needed to create a forward declaration of both before using them in the Tree type alias, also note we are using pointer to the forward declared types Node and Leaf because both are incomplete types as they don't have their definition while a union needs complete type information at the time of creation(union's size is the size of the largest type) we can avoid this by using pointer to the types as pointer types have fixed size of 8 bytes. Well we could actually fully define Leaf type since it doesn't depend on the Tree alias type but, well for consistency I just do both in this way.

Although this is almost correct, there's something we are missing. Suppose our tree has structure like this, a root node and a left node but no right node. What should the right node be in such a case? There is definitely no Node there and it is also not a Leaf. So we need to represent this empty state somehow in our tree definition also, for which C++ also provides a very convenient std::monostate type, hence our Tree definition now becomes

Before moving on to the tree operations here's the entire code for the tree representation, take your time and let it sink in

Tree Operations

We have successfully represented the tree now how do we operate on it? variants, templates, pointers, surely there can't be a straight forward way to deal with this nightmare, we are talking about C++ after all, isn't it? Well, not quite. It turns out C++23 deducing this and C++17 std::visit actually have made this very easy to do in just a few lines of code.

Although before we get into it, we do need some prerequisite knowledge so skip these if you already know these.

Overload Lambda Set

The overload lambda set trick is how we will create a custom Visitor functor object which will contain multiple lambdas or functor objects and inherit each of their call operator so that when we do... we will see what we will do for now lets see the code

What we are doing is essentially creating an "overload"ed version of multiple functor objects so that the final derived object can be used as a functor object that has the behavior of all its base functors.

If you carefully observe this means we can use this Visitor functor type to contain individual functor object that handles each type of the variant independently and put them collectively in the Visitor object and use this Visitor object to operate on the variant object so that the Visitor object can suitably choose from its overload resolution which base functor to call based on the currently held type in the variant.

However there is no straight-forward way to just call our visitor object with the variant object. This is where we need std::visit which does this overload resolution for us based on the currently held type of the variant and calls the appropriate base functor.

Inorder Traversal

The first function that we will create to operate on the tree is an inorder traversal function - we will print the inorder traversal of the tree.

First lets create the tree

We hard-code the tree for now, later we will create an insertion function but our tree will look something like the one above and its inorder traversal should print: 10 5 1 11

Now for the next part first I will first show the code and then we will see the explanation on what it does, that way it will be much clearer

  1. We can see our Visitor functor has 3 base functors one for dealing with Node<int> type, one for dealing with Leaf<int> and other for dealing with std::monostate
  2. If we visit a Node<int> we first visit the left child sub-tree recursively, then deal with our current node - print it, then visit the right child recursively, as you can observe our code is inherently inorder
  3. If we visit a Leaf node we simply print it. We can also define custom function for what to do if we reach a leaf node.
  4. For empty states where its neither a leaf or a node like right child of node 5 we simply return
  5. Every functor takes a const ref to the current object present in the variant we could take anything we want here copy, ref etc. For the unique_ptr however, we take const ref because we do not want the ownership of the elements

In order to visit the left child of the tree we need our inorder_print functor object to pass it into std::visit, before C++23 deducing this there was no straight forward way to access the lambda objects themselves in a recursive context like this and that too of the final object type of inorder_print(Visitor<Vs...>) and not the individual base lambda types. With C++23 deducing this we can simply reference inorder_print using self as the "this" object for the final derived type. As for std::visit it helps in choosing the correct overloaded lambda functor based on the current node's type being held in the variant.

Finally we call inorder_print on the root object which calls the Node<int> overload, and finally it prints

10 5 1 11        

Here's the godbolt link to the full code .

P.S. Clang19(and below) had a weird bug where it was unable to compile Visitor {} object as it was unable to infer the return type of the Node<int> functor I had to explicitly add `-> void` return type after which it compiled, GCC had no problem with it.

Height of Tree

Improvements before moving on

All previous functions were made using only Tree<int> type its time to improve upon this and make it more generic here's the inorder and height functions with generic paramters

We want each of these functions to be callable on a Tree<T> type something like

Tree<int> root {};

// build the tree

apply_fn(root, func);        

Essentially our apply_fn will take a Tree (root) and apply the function `func` on it. Here's how it looks

We take 3 arguments

  1. The tree itself
  2. The callable object
  3. The arguments to the callable

Then we see whether the callable returns a non void type (return_t) if so we return this to the caller else we simply ignore it. This is done at compile time using if constexpr

Inside the std::visit we visit each type contained in Tree<T> variant, this is what U&& node achieves by expanding our lambda at compile time and replacing U with all the types of Tree<T> variant we could create a Visitor object like before and provide all the lambdas separately and call our func on each one but this does it in one go.

Now we can simply do this

Inserting values in the Tree

For our insert function we need to consider 3 cases:

  1. When our tree is empty root node holds std::monostate
  2. When our tree has a single Leaf<T> node
  3. When our tree contains a non-leaf Node<T> which contains children that are again a new Tree<T> -> Leaf<T>, Node<T> or monstate

Now before moving on we need to make a slight change in the Tree definition because std::monostate doesn't store the T type of the current Tree<T> so in an empty Tree<T> we won't know what type of node to create, so we will create our own None<T> type that is just like std::monostate but is templated which allows us to capture the type in the template paramter, this will be useful in our insert function which we will see next

For each of the 3 cases described above:

  1. When our tree is empty we create a new Leaf<T> node and return this
  2. When our tree has a single Leaf<T> node we create a new Node<T> from Leaf<T>'s data and then move this Leaf<T> to left or right child of the Node<T> and assign our value to Leaf<T>, then return our new Node<T>
  3. When our tree has Node<T> we simply traverse the tree until we reach a Leaf node or None node and then we do the above 2. However as we traverse, since we mutate the tree, so we assign the new mutated tree each time we traverse and return

Again the `auto& child` lambda inside std::visit is a short-hand way to write our Visitor {} for all 3 tree types which calls self(insert) on each type in the Tree<T>: Node<T>, Leaf<T> and None<T>

This is how we use insert and other functions on the root node, you could further extend this to create search and erase functors as well but I will leave that as an exercise (I am feeling lazy as well so pardon me). If you are wondering what the S type is its just a helper type that I created to test my functions and see the copies and moves being done

You can check out the entire source code and the output here , fiddle around and try to see what else you can do from it but that's it for this one... until next time ??

要查看或添加评论,请登录

社区洞察

其他会员也浏览了