Ordered sets in GO with treaps

Binary search trees are one of the main options for handling sets that require fast insertion, search and deletion. Besides, binary search trees allow traversing the whole set in order in O(n).

As many know, binary search trees are straightforward to implement, and their performance can be spectacular if the insertion key order is random. Unfortunately, insertion randomness cannot always be guaranteed. For this reason, algorithm researchers have searched for, and found, schemes to balance binary search trees for more than 50 years. We can classify these schemes into three types:

  1. Randomized balance: we do things randomly in such a way that any bias in the operations is “filtered”. The foremost exponents of this type of tree are randomized trees and treaps. 
  2. Guaranteed balance: we impose on the tree some condition that bounds the imbalance between the branches of the nodes. Examples of these trees are AVL and red-black; both guarantee a maximum tree height of O(log n), where n is the number of keys. However, despite this guarantee, the algorithms are complicated, and the constant cost of guaranteeing the balance only pays off when you have many, many keys.
  3. Amortized balance: we carry out some operations that, in the long run, yield an average cost of O(log n). The most famous exponent of this type of structure is the splay tree. It is relatively simple and fast when there is locality of reference. However, the constant cost of the amortized operations is significant, and it only pays off in applications that exploit locality of reference. 

In this post, I will present the treap which, after weighing factors such as coding time, ease of understanding, performance, and constant operating costs at low scales, is the best first compromise as a binary search tree. If someone in doubt asked me which of them to choose, my answer, without hesitation, would be the treap.

This post is structured as follows: an introduction in which I present binary search trees, for those who do not remember them well, and the heap; both structures are needed to understand the treap. Then I present the treap: the idea, the algorithms, their analysis and other considerations. So that no one accuses me of a lack of rigour, at the end I include references to the analysis of binary search trees.

Introduction

To understand the treap it is necessary to understand the two data structures that support it: the binary search tree and the heap.

Binary search tree

By convention, the empty tree T = ∅ is a binary search tree (BST). A non-empty binary tree T = <LLINK(T), n, RLINK(T)>, with root n, is called a “binary search tree” if and only if:

  • ∀ n< ∈ LLINK(T): KEY(n<) < KEY(n)
  • ∀ n> ∈ RLINK(T): KEY(n>) > KEY(n)

Said colloquially, for every node n with key KEY(n), all the keys contained in its left branch are less than KEY(n), while those contained in its right branch are greater. This example of a BST with randomly generated keys helps to verify it:

[Figure: a BST built from randomly generated keys]

It is worth noting a few things:

  1. The infix (inorder) traversal of the tree is completely ordered. To corroborate this, project the nodes of the tree above perpendicularly onto a horizontal line below it; you will see that the keys appear in order. (A traversal sketch appears after the node structure below.)
  2. The smallest element is the leftmost node.
  3. The largest element is the node furthest to the right. 

Every node has the following structure:

type Node struct {
	key      interface{} // generic key
	priority uint64
	count    int // number of nodes in the subtree rooted at this node
	llink    *Node // left child
	rlink    *Node // right child
}

Let us focus for now on the fields key, llink and rlink. We will explain the count field later.
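As a quick check of observation 1 above, here is a minimal sketch of an inorder traversal using the fields just described (NullNode is the sentinel for the empty tree used throughout this post); it emits the keys in ascending order:

func inOrder(root *Node, keys *[]interface{}) {
	if root == NullNode {
		return
	}
	inOrder(root.llink, keys) // all smaller keys first
	*keys = append(*keys, root.key)
	inOrder(root.rlink, keys) // then all larger keys
}

Applying it to the example tree above reproduces the sorted key sequence.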

Search

A BST aims to emulate the binary search. The root divides the set of keys: the smaller ones go to the left, the bigger ones to the right. One way to understand the resemblance to binary search is to look at the search procedure on a BST:

func (tree *Treap) search(key interface{}) interface{} {
	root := *tree.rootPtr
	for root != NullNode {
		if tree.less(key, root.key) {
			root = root.llink
		} else if tree.less(root.key, key) {
			root = root.rlink
		} else {
			break // key found!
		}
	}

	if root == NullNode {
		return nil // key not found
	}

	return root.key
}

This method operates on an abstract data type called Treap, which internally contains everything needed to handle a treap. One of its interesting fields is the less comparison function: the whole procedure is designed to use only this function.

If key < root.key, then we look for the key in the left branch. If not, we test root.key < key, which is equivalent to key > root.key. Otherwise, we conclude key == root.key and return the key. If the loop terminates without finding the key, we return nil to indicate the failure.
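In other words, the three-way comparison is emulated with the single less function: two keys compare as equal when neither is less than the other. A trivial helper illustrating the idea (not part of the search loop itself):

// equal reports whether a and b compare as equal under less.
func equal(a, b interface{}, less func(i1, i2 interface{}) bool) bool {
	return !less(a, b) && !less(b, a)
}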

Visually, we can get an immediate idea of how well a BST emulates binary search: we simply look at the nodes and how well their branches are balanced. The tree above is not perfect, but overall it is more or less balanced.

If a binary tree is evenly balanced, then searching is O(log n), where n is the number of nodes in the tree.

Insertion

Insertion is slightly more complicated, but still very simple:

func insertNode(root, p *Node, less func(i1, i2 interface{}) bool) *Node {
	
	if root == NullNode {
		return p // empty spot found: p takes its place
	}
	
	if less(p.key, root.key) {
		result := insertNode(root.llink, p, less)
		if result == NullNode {
			return NullNode
		}
		root.llink = result // link the updated left subtree
		return root
	}
	
	if less(root.key, p.key) {
		result := insertNode(root.rlink, p, less)
		if result == NullNode {
			return NullNode
		}
		root.rlink = result // link the updated right subtree
		return root
	}
	
	return NullNode // key is already in tree ==> insertion fails
}

insertNode() closely resembles search() because, in essence, it is a failed search: it looks for the empty subtree in which the new node belongs. For instance, for the following BST:

[Figure: an example BST]

The next drawing shows the bracketed ranges in which new keys would be placed:

[Figure: the same BST annotated with the bracketed key ranges of its empty subtrees]

If we insert, for example, 33, then it belongs in the range [31..49], which is to the right of node 30. So we search for 33, which leads us to the empty tree corresponding to the range [31..49], where we place 33.
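For completeness, here is a sketch of how insertNode might be driven at the tree level, reusing the Treap fields seen in search(); the insert method and node construction are illustrative, not necessarily the exact API of the final package (the priority field is ignored here; the treap version will set it):

func (tree *Treap) insert(key interface{}) interface{} {
	p := &Node{key: key, count: 1, llink: NullNode, rlink: NullNode}
	result := insertNode(*tree.rootPtr, p, tree.less)
	if result == NullNode {
		return nil // duplicated key ==> insertion fails
	}
	*tree.rootPtr = result
	return p.key
}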

Exclusive union

Deletion is more complicated than insertion, but still simple enough if we proceed in parts. The first part consists in studying how two trees with disjoint ranges are joined.

Suppose we have two trees T< and T>. When we say that their ranges are disjoint, we mean that the largest key of T< is less than the smallest key of T>. Having clarified this, and calling T< = ts and T> = tg, the union T< ∪ T> can be implemented as follows:

func __joinExclusive(tsRootPtr, tgRootPtr **Node) *Node {
	
	if *tsRootPtr == NullNode {
		return *tgRootPtr
	}
	
	if *tgRootPtr == NullNode {
		return *tsRootPtr
	}
	
	// join the old right branch of ts with the left branch of tg,
	// then hang tg as the new right branch of ts
	(*tgRootPtr).llink = __joinExclusive(&(*tsRootPtr).rlink, &(*tgRootPtr).llink)
	(*tsRootPtr).rlink = *tgRootPtr
	
	retVal := *tsRootPtr
	
	*tsRootPtr = NullNode
	*tgRootPtr = NullNode
	
	return retVal
}

The routine is called __joinExclusive() to express, with the suffix “exclusive”, that the union (join) is over disjoint ranges. __joinExclusive() returns the root of a new BST containing the join; the ts and tg trees become empty.

The meaning is much better explained pictorially. Suppose the operation is outlined as follows:

[Figure: two trees ts (T<) and tg (T>) with disjoint key ranges]

So the exclusive union is expressed like this:

[Figure: the result of the exclusive join]

That is, we choose T< (ts) as the root of the result, and T> (tg) becomes its right branch.

We could have chosen T> instead of T< as the root; the figure would be symmetrical.

Deletion

Now let us go to the second part of deletion. If we have a BST and we want to remove a key, we can do it as follows:

func remove(rootPtr **Node, key interface{}, less func(i1, i2 interface{}) bool) *Node {
	
	if *rootPtr == NullNode {
		return NullNode
	}
	
	var retVal *Node
	if less(key, (*rootPtr).key) {
		retVal = remove(&(*rootPtr).llink, key, less)
	} else if less((*rootPtr).key, key) {
		retVal = remove(&(*rootPtr).rlink, key, less)
	} else { // key found
		retVal = *rootPtr // this node will be deleted
		*rootPtr = __joinExclusive(&(*rootPtr).llink, &(*rootPtr).rlink)
		return retVal
	}
	
	if retVal == NullNode {
		return NullNode // key not found
	}
	
	retVal.llink = NullNode
	retVal.rlink = NullNode
	
	return retVal
}

Again we note the similarity with the search, explained by the fact that to delete the key we must first look for it (and find it). If the key is not present in the tree, we do nothing and return NullNode. Otherwise, the recursion eventually stops at a node whose key is key. At this moment, everything is straightforward: the node is replaced by the exclusive union of its branches.

BST performance

The algorithms on BSTs are extraordinarily simple, but their analysis is not. Although it is not excessively complicated, it falls outside the purpose of this post, so I leave some references at the end. For now, the essential things to know are:

  1. If the insertion sequence is random, then the average number of nodes visited in a random failed search is O(log n).
  2. Based on the previous result, we know that a search, successful or not, has an expected cost of O(log n).
  3. An insertion pays for a failed search; the rest is constant. Consequently, an insertion also costs O(log n) on average. If subsequent insertions are random, the expectation stays around O(log n).
  4. Deletions break this expectation. The intuition is that the exclusive union executed by the deletion is asymmetric: in our case, the deleted node's left tree T< is arbitrarily chosen as the root.    

In short, if the insertions are random and we perform no (or few) deletions, then the BST is a great structure for operating very efficiently on sets of keys.

Heaps

A few years ago the CACM published a beautiful column called “Programming Pearls” by Jon Bentley. All of them were excellent, but for this post it is worth recalling the one called “Thanks, Heaps”, which, of course, was about the heap: a kind of complete binary tree with, dixit Bentley, the following two fundamental properties on a heap T_H:

Order

∀ nᵢ ∈ T_H: KEY(LLINK(nᵢ)) > KEY(nᵢ) ∧ KEY(RLINK(nᵢ)) > KEY(nᵢ), with KEY(∅) = ∞. That is, for every node nᵢ, its two children have keys greater than KEY(nᵢ). For simplicity, we assume KEY(∅) = ∞.

Shape

A heap is always triangular in shape; that is:

[Figure: the triangular shape of a heap]

In other words, a heap is always perfectly balanced.

The following tree is an example of a heap:

[Figure: an example heap]

As can be seen, the two properties are satisfied.

Since this post is about the treap, not the heap, we will not get involved in analyzing the heap. Suffice it to say that the heap retrieves the smallest of all elements in a set in O(1) and inserts and removes in O(log n); all three bounds are deterministic, that is, guaranteed. A spectacular performance!
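For reference, Go's standard library already exposes a binary heap through container/heap. A minimal min-heap of ints, following the pattern documented in that package, just to fix ideas (the treap code in this post does not depend on it):

import (
	"container/heap"
	"fmt"
)

// IntHeap implements heap.Interface for a min-heap of ints.
type IntHeap []int

func (h IntHeap) Len() int            { return len(h) }
func (h IntHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *IntHeap) Push(x interface{}) { *h = append(*h, x.(int)) }
func (h *IntHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

func main() {
	h := &IntHeap{5, 2, 8}
	heap.Init(h)         // O(n) heapify
	heap.Push(h, 1)      // O(log n) insertion
	fmt.Println((*h)[0]) // the minimum, retrieved in O(1): prints 1
}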

In case you do not know about heaps, nothing is better than Bentley's own column: “Programming pearls: thanks, heaps”. Jon Bentley. Communications of the ACM, Volume 28, Issue 3, March 1985, pages 245-250.

It can also be found in the following compendium: “Programming Pearls”. Jon Bentley. Addison-Wesley. ISBN 0201657880.

Treap

The treap was discovered in 1980 by Jean Vuillemin and was later rigorously studied by Cecilia Aragon and Raimund Seidel. The name comes from combining tree and heap; that is, it is a binary search tree with something of a heap, specifically the order property on priorities.

A treap is a binary tree T in which each node nᵢ has two fields, namely:

  1. KEY(nᵢ) ∈ K is the search key, where K is any sortable set.
  2. PRIO(nᵢ) ∈ P is the priority of the node, where P is any sortable set.    

A binary tree whose nodes have this structure is a treap if and only if:

  • Order condition: T is a binary search tree on the keys K. Given a node nᵢ, KEY(nᵢ) refers to its key. The keys are arranged exactly as in a binary search tree.    
  • Priority condition: ∀ nᵢ ∈ T: PRIO(nᵢ) ≤ PRIO(LLINK(nᵢ)) and PRIO(nᵢ) ≤ PRIO(RLINK(nᵢ)). Given a node nᵢ, PRIO(nᵢ) refers to its “priority”. In this sense, T is a heap: for each node, the priorities of its children are higher.    

Nothing better than an example to finish apprehending the concept:

[Figure: an example treap; the upper field of each node is the key, the lower one the priority]

The upper fields of the nodes are the keys of the binary search tree, while the lower fields are the priorities of the pseudoheap.
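A small validator makes the two conditions concrete. This is a sketch under two assumptions: the Node layout shown earlier, and a sentinel NullNode carrying the maximum possible priority (a trick discussed later), so the priority test needs no special cases:

// isTreap reports whether the subtree rooted at root satisfies both the
// order condition (checked with exclusive key bounds; nil means unbounded)
// and the priority condition.
func isTreap(root *Node, lo, hi interface{}, less func(i1, i2 interface{}) bool) bool {
	if root == NullNode {
		return true
	}
	if lo != nil && !less(lo, root.key) {
		return false // order condition violated on the left bound
	}
	if hi != nil && !less(root.key, hi) {
		return false // order condition violated on the right bound
	}
	if root.llink.priority < root.priority || root.rlink.priority < root.priority {
		return false // a child has lower priority than its parent
	}
	return isTreap(root.llink, lo, root.key, less) &&
		isTreap(root.rlink, root.key, hi, less)
}

A whole tree is checked with isTreap(root, nil, nil, less).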

As we will informally demonstrate later, the enormous goodness of a treap is that if the priorities are chosen at random, then no matter the order in which the keys arrive, and no matter whether deletions are interspersed, a treap is always, always! equivalent to a binary search tree built from a random insertion sequence.

Recall that a randomly built binary search tree tends to be fairly balanced; consequently, search and insertion are expected O(log n). In a treap, this expectation extends to deletion and other operations.

Operations on treaps

Rotations

For a treap to keep the balance that a random binary search tree would have, it is necessary to make some adjustments from time to time. Several types of search trees make adjustments for balancing purposes; virtually all of them use “rotations” for it.

A rotation is quickly understood graphically. This BST: 

[Figure: a BST rooted at p, with subtrees α, β and χ]

is equivalent to this one:

[Figure: the equivalent BST rooted at q]

The tree with root p can be rotated clockwise, and the tree with root q is obtained. Similarly, if the tree with root q is rotated counterclockwise, then the original root p is obtained.

The infix (inorder) traversal of the tree rooted at p is (α A β) B χ, while that of the tree rooted at q is α A (β B χ); that is, they are the same, which shows that a rotation does not affect the order property of a binary search tree.

When the tree rooted at p is rotated to the right, the resulting tree rooted at q loses one unit of height in its left branch, which translates into a gain of one unit of height in its right branch. The left rotation behaves symmetrically. This characteristic makes the rotation the fundamental operation for balancing trees.

Right rotation is implemented as follows:

func rotateRight(p *Node) *Node {
	q := p.llink
	p.llink = q.rlink
	q.rlink = p
	return q
}

The rotation to the left is symmetric: 

func rotateLeft(p *Node) *Node {
	q := p.rlink
	p.rlink = q.llink
	q.llink = p
	return q
}

It is essential to note that a rotation costs O(1).
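One caveat: if the count field (used later for rank operations) is being maintained, each rotation must also repair the counters of the two nodes involved; the same applies to the insertion and deletion paths, bookkeeping the simplified listings in this post omit. A sketch for the right rotation, assuming the sentinel's count is zero:

func rotateRightWithCount(p *Node) *Node {
	q := p.llink
	p.llink = q.rlink
	q.rlink = p
	p.count = p.llink.count + p.rlink.count + 1 // p now roots only β and χ
	q.count = q.llink.count + q.rlink.count + 1 // q roots the whole subtree
	return q
}

The rotation remains O(1).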

Given a treap, the general algorithm for inserting a node p can be summarized as:

  1. p.priority = random number (priority is chosen randomly)
  2. Insert p precisely as in a binary search tree.
  3. Rotate p upwards, level by level, until its priority p.priority is no longer lower than its parent's.    

Note that the only difference between this general algorithm and the insertion algorithm of a binary search tree is the third step.

Let us take the following treap, into which a node (38,26) has just been inserted as in a binary search tree.

[Figure: the treap right after the BST insertion of (38,26)]

Since the node was inserted as in a binary search tree, it does not violate the order condition, but it does violate the priority condition, since 26 < 54. So we rotate the node (28,54) to the left, which places (28,54) below (38,26):

[Figure: the treap after rotating (28,54) to the left]

But (38,26) still violates the priority of its parent (39,30), which requires rotating (39,30) to the right:

[Figure: the treap after rotating (39,30) to the right]

Again, (38,26) violates the priority condition of its parent (12,28), which causes a rotation to the left and yields the definitive result:

[Figure: the final treap after rotating (12,28) to the left]

The result is already a treap.

And how is this procedure implemented? Basically, in the same way as the generic method: insert as into a binary search tree and then rotate until the priority condition is restored:

func insertNode(root, p *Node, less func(i1, i2 interface{}) bool) *Node {
	
	if root == NullNode {
		return p
	}
	
	resultNode := NullNode
	if less(p.key, root.key) {
		resultNode = insertNode(root.llink, p, less)
		if resultNode == NullNode {
			return NullNode
		}
		
		root.llink = resultNode
		if resultNode.priority < root.priority {
			root = rotateRight(root)
		}
		return root
	}
	
	if less(root.key, p.key) {
		resultNode = insertNode(root.rlink, p, less)
		if resultNode == NullNode {
			return NullNode
		}
		
		root.rlink = resultNode
		if resultNode.priority < root.priority {
			root = rotateLeft(root)
		}
		return root
	}
	
	return NullNode // key is already in tree ==> insertion fails
}

For the analysis, it is essential to note the similarity between the insertion algorithm for a binary search tree and the one we have just presented.

Deletion

Deletion can be approached in two ways. The first is structurally identical to the deletion we presented for the binary search tree, based on the join operation. The second, which we will develop, is based on the following general algorithm:

  1. Search and find the key to delete. Let p be the node found containing the key to remove (if the key is not found, the algorithm ends).
  2. Rotate p downwards, level by level, until it becomes a leaf. In this step, it does not matter that p violates the priority condition, since it will be removed. However, for the purists, it is enough to assign p priority ∞ so that the rotations re-establish the priority condition. 
  3. Once it has become a leaf, removal is simply to "cut" it from the tree.    

Surprisingly, the path along which a node is finally removed is exactly the path along which it would be inserted. To see this, let us remove the node (38,26) from the following treap:

[Figure: the treap before removing (38,26)]

Here you should notice that the direction of rotation is not arbitrary. If we rotated (38,26) to the left, then node (39,30) would violate the priority condition with respect to node (12,28). Therefore, we must rotate it to the right, so that the child of p with the lowest priority is the one that rises. The state of the treap after this rotation is:

[Figure: the treap after the first rotation]

The general rule for deciding the direction of rotation is: rotate towards the opposite side of the child with the lowest priority. Thus, the next rotation is to the left, that is, opposite to its right child (39,30), which gives the following result:

[Figure: the treap after the second rotation]

At this point, (38,26) has only one child; so if we rotate it towards the side where it has no child, then (38,26) becomes a leaf:

[Figure: (38,26) is now a leaf]

Here (38,26) is already a leaf, so we only have to cut it off to complete the deletion.

A fascinating trick

One way to simplify the implementation of the deletion algorithm, which saves us an if to verify whether or not a child is NullNode (the empty tree), is to use a sentinel node to represent the empty tree. The constant NullNode represents the null node and carries the highest possible priority value. This way, the deletion performs its rotations in the correct directions without having to check whether a child is the empty tree.
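A sketch of how such a sentinel might be declared; the initialization details are illustrative (what matters is the maximum priority, a zero count, and self-references so traversals never meet a nil pointer):

import "math"

// NullNode represents the empty tree. Its priority is the highest
// possible, so it never wins a rotation during deletion.
var NullNode = &Node{priority: math.MaxUint64, count: 0}

func init() {
	NullNode.llink = NullNode
	NullNode.rlink = NullNode
}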

With this trick in hand, we are ready to present the implementation of the deletion algorithm:

func remove(rootPtr **Node, key interface{}, less func(i1, i2 interface{}) bool) *Node {
	
	if *rootPtr == NullNode {
		return NullNode
	}
	
	var retVal *Node
	if less(key, (*rootPtr).key) {
		retVal = remove(&(*rootPtr).llink, key, less)
	} else if less((*rootPtr).key, key) {
		retVal = remove(&(*rootPtr).rlink, key, less)
	} else { // key found
		retVal = *rootPtr // this node will be deleted
		*rootPtr = __joinExclusive(&(*rootPtr).llink, &(*rootPtr).rlink)
		return retVal
	}
	
	if retVal == NullNode {
		return NullNode // key not found
	}
	
	retVal.llink = NullNode
	retVal.rlink = NullNode
	
	return retVal
}

The routine returns the node pruned from the treap if the key is in the tree, or NullNode otherwise.

The routine can be separated into two parts:

  1. Search and find the node to remove (the one containing the key)
  2. Replace the node with the join exclusive of its subtrees.

Note that before pruning the node to be removed, the state of the treap is precisely the same as it would have been if the node to be removed had just been inserted.

Now, the previously studied exclusive join does not work here because it does not respect the priorities. Also, it introduces an asymmetry that tends to unbalance the tree. We need a version of the exclusive join that takes the priorities into account, breaking the bias and keeping the tree random.

Join on treaps

It is straightforward, and its principle is: choose as root the node with the lowest priority. In this way, we keep the heap order condition:

func __joinExclusive(tsRootPtr, tgRootPtr **Node) *Node {
	
	if *tsRootPtr == NullNode {
		return *tgRootPtr
	}
	
	if *tgRootPtr == NullNode {
		return *tsRootPtr
	}
	
	if (*tsRootPtr).priority < (*tgRootPtr).priority {
		(*tsRootPtr).count += (*tgRootPtr).count
		(*tsRootPtr).rlink = __joinExclusive(&(*tsRootPtr).rlink, tgRootPtr)
		return *tsRootPtr
	}
	
	(*tgRootPtr).count += (*tsRootPtr).count
	(*tgRootPtr).llink = __joinExclusive(tsRootPtr, &(*tgRootPtr).llink)
	return *tgRootPtr
}

Handling ranks (extended trees)

We can deal with a treap, and in general with any BST, as if it were a kind of array. That is, we can quickly find the inorder position of any key, or access a key given its position. For that, we store in every node a counter with the number of nodes its subtree contains. Let us look at the situation in a general way:

[Figure: a general subtree showing how llink.count determines the position of the root]

Given any subtree root, the value root.llink.count indicates the number of nodes that precede root in the inorder traversal. So this number gives us the position of root within the ordered sequence. We can use this knowledge to locate the node occupying any desired position:

func __choose(root *Node, pos int) *Node {
	for i := pos; i != root.llink.count; {
		if i < root.llink.count {
			root = root.llink
		} else {
			i -= root.llink.count + 1
			root = root.rlink
		}
	}
	return root
}

The routine stops when the position of root is the desired pos; that is, when root.llink.count == pos, which is tracked with the variable i. If i is less, then we go into the left branch. Otherwise, we go to the right, taking into account that we leave root.llink.count + 1 nodes behind.

An essential observation is that we set the counter of the sentinel node nullNodePtr to zero. In this way, the first node gets position zero, as expected.
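The inverse operation, obtaining the position of a given key, follows the same counting logic; a sketch under the same sentinel convention:

// rank returns the inorder position of key, or -1 if key is absent.
func rank(root *Node, key interface{}, less func(i1, i2 interface{}) bool) int {
	pos := 0
	for root != nullNodePtr {
		if less(key, root.key) {
			root = root.llink
		} else if less(root.key, key) {
			pos += root.llink.count + 1 // skip the left subtree and the root
			root = root.rlink
		} else {
			return pos + root.llink.count // key found
		}
	}
	return -1
}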

Many more position-based operations are possible, some even simpler, especially split and removal by position. Let us see the split:

func __splitPos(root *Node, i int) (l, r *Node) {
	if i == root.llink.count {
		l = root
		r = root.rlink
		l.rlink = nullNodePtr
		l.count -= r.count
		return
	}

	if i < root.llink.count {
		lp, rp := __splitPos(root.llink, i)
		l = lp
		r = root
		r.llink = rp
		r.count -= l.count
	} else {
		lp, rp := __splitPos(root.rlink, i-(root.llink.count+1))
		r = rp
		l = root
		l.rlink = lp
		l.count -= r.count
	}
	return
}

The routine splits the tree in two. Tree l contains the keys up to position i, while tree r contains the remaining ones. Note that l and r are range-exclusive, so they could be joined to produce exactly the original tree.
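As an example of what these positional primitives enable, deletion by position can be written as a short composition of __choose and remove; a sketch, assuming the counters are also maintained along insertion and deletion paths (bookkeeping the simplified listings above omit):

// removeByPos deletes and returns the node occupying position pos,
// with 0 <= pos < number of keys.
func removeByPos(rootPtr **Node, pos int, less func(i1, i2 interface{}) bool) *Node {
	victim := __choose(*rootPtr, pos) // locate the node by rank
	return remove(rootPtr, victim.key, less)
}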

Golang package

Full implementation of treaps with ranks can be found here

The truth about treaps (analysis)

What makes a treap extraordinarily interesting is the following theorem.

Equivalence theorem between a treap and a binary search tree

If the priorities of a treap T are chosen at random, then T is always wholly equivalent to a randomly constructed binary search tree.

Before proceeding to the proof, let us reflect on the consequences of this result.

We know that if the insertion order is random, then a binary search tree offers O(log n) expected performance for search and insertion, as long as there are no (or few) deletions. But if the theorem is true, then in a treap the insertion order does not matter, since the result is equivalent to a random binary search tree. Better yet, in the treap we can perform as many deletions as we want without losing the O(log n) expectation.

But all this is not true until it is proven. So let's get down to it.

Treap uniqueness lemma

Let K = {k1, k2, …, kn} be a set of unique keys and P = {p1, p2, …, pn} a set of unique priorities. Then the set {(k1, p1), (k2, p2), …, (kn, pn)} ⊂ K × P has exactly one treap.

The proof is by induction on the number of nodes n:

  • n = 1: In this case, there is a single treap made up of the single pair. The lemma is true for n = 1.    
  • n > 1: Now we assume that the lemma is true for every set of at most n pairs and check it for n+1. Since the priorities are unique, a treap of n+1 nodes can only have one root: the pair whose priority is the lowest of all. Let (ki, pi) be that root. Once the root of the treap is determined, the order property of a BST determines the node sets of the left and right branches: K< = {(k1, p1), …, (ki-1, pi-1)} and K> = {(ki+1, pi+1), …, (kn+1, pn+1)}, respectively. By the inductive hypothesis, K< and K>, whose cardinalities are at most n, have unique treaps. The lemma is thus true for all n ∎    

The uniqueness lemma explains why the path along which a new key is inserted is identical to the path along which the same key is removed: the treap is always unique.

Proof of the equivalence theorem

Let us take a set KP = {(k1, p1), (k2, p2), …, (kn, pn)} of (key, priority) pairs, in which the priorities are assigned randomly.

Note that, according to the uniqueness lemma, the treap resulting from the set KP is unique, regardless of the order in which the pairs are inserted. Given this, we can assume a particular insertion sequence KP′ = {(k1, p1)′, (k2, p2)′, …, (kn, pn)′} ordered by priority; that is, the priority p1 in (k1, p1)′ is the minimum and pn in (kn, pn)′ is the maximum. This assumption is legitimate because, as established by the uniqueness lemma, the resulting treap is the same as for any other insertion order. The sequence KP′ is of great interest because of the following observations:

  1. KP′ is a random sequence, even though it has been ordered by priority. Let us informally argue this through an imagination exercise. Consider the order in which a group of n people sit in a room whose seats are numbered from 1 to n, first by row, then by column. If we do nothing, the order in which people sit is biased by factors such as arrival order and personal habit. Now suppose instead that, when a person arrives, their seat number is drawn uniformly at random. In this case the placement is decided strictly by chance and by no other factor; reading the seats from 1 to n, the permutation of the seated people is a random permutation. In short, the seats are numbered and ordered, but the people are placed at random, so the sequence of people is random. The same holds for us: the seats play the role of the priorities and the people that of the keys, so KP′, ordered by priority but with keys assigned at random, can always be viewed as a random sequence. It remains to show that its treap is identical to what would be obtained with a binary search tree.  
  2. The sequence KP′ causes absolutely no rotation in the treap. Indeed, since KP′ is ordered by priority, each inserted pair (ki, pi) finds a parent with a lower priority, so there is never a need to rotate it. AHA! But in the absence of rotations, inserting into a treap is computationally identical to inserting into a binary search tree! That is, the third step of the insertion algorithm presented earlier ("Rotate p upwards until its priority is no longer lower than its parent's") is never executed. Without this step, what remains is exactly the insertion into a binary search tree.    

Since KP′ is a random insertion sequence that causes no rotations in the treap, the treap is equivalent to a randomly constructed binary search tree; with the significant addition that neither the order of the insertions nor the occurrence of deletions matters. 

Performance of randomly built BSTs

As we have mentioned, it can be proven that a randomly built BST has an expected height of O(log n). Consequently, insertion and search have an expected performance of O(log n). The proof is somewhat complex and tricky, and it is outside the scope of this post. 

The simplest approach that I know is shown in the book "Data Structures and Their Algorithms" by Harry R. Lewis and Larry Denenberg. The famous "The Art of Computer Programming" by Knuth, volume 3, shows essentially the same, although more rigorously.

Space consumption

Perhaps the main criticism against the treap is the space consumed by the priority that must be stored in each node. The only kinds of BST that beat the treap in space are the pure, traditional BSTs, which carry no balance guarantee when the insertions are not random or when there are deletions, and the top-down splay trees. Splay trees are fantastic if the application exhibits locality of reference, but this is not always the situation. Besides, the exploitable locality window is not too big: at most about four tree levels, which gives 16 keys, which is not really much.

A red-black tree or an AVL tree requires additional space for storing the node colour or the height difference. Nominally, in both cases, a byte is enough. However, modern architectures require the node layout to be aligned to the hardware word size. Some authors exploit the unused bits of the child pointers for storing this information. I have not found this approach convenient because (1) the implementation complexity increases, along with the constant cost, and (2) the node layout could end up equivalent anyway, since it depends on the key size.

There is, however, a way to avoid storing the priority and thus spend the same space as an AVL or red-black tree.

Priorities as a hash function

The trick is straightforward: instead of randomly selecting a priority value and storing it, the priority is computed by a hash function h: K → ℕ that maps the key to a priority value. Each time the priority is needed, the function is recomputed.

Of course, the quality of the treap depends on how good the hash function is.
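A minimal sketch with the standard library's FNV hash, assuming string keys; any well-distributed hash over the key type works:

import "hash/fnv"

// priority derives a pseudo-random priority from the key itself,
// so no priority field needs to be stored in the node.
func priority(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}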

Conclusion

Treaps are my first choice when choosing a binary search tree. What follows are my reasons.

In general, binary-tree-based structures for manipulating key sets take O(log n) time per operation. Besides, binary trees have the great virtue of preserving the order between the keys, allowing an iterator to visit the keys in order without sacrificing O(log n) performance.

Traditional trees are elementary, but their O(log n) expectation holds if and only if the insertion sequence is random and no deletions occur, which is not always the case. For that reason, balance conditions are added to deal with biases. In the case of the treap, it is the priority condition, coupled with the random choice of priorities, that breaks the bias and provides the O(log n) expectation in all situations, as long as the priority selection is random or, if it is the case, the hash function emulates a random distribution. The following tree of 512 keys illustrates the balance of a treap:

[Figure: a treap of 512 keys]

In contrast, deterministic balancing approaches produce better trees. For example, the tree above, built as an AVL tree, looks like this:

[Figure: the same 512 keys as an AVL tree]

As you can see, it is about half the height, which results in better searches. However, while better as a tree, the AVL carries a constant cost for insertions and deletions that is by far more expensive than the treap's. Insertions and deletions only become faster on an AVL or red-black tree when we handle millions and millions of nodes (around 2^32).

With these caveats made, I can say that my preference for treaps boils down to their approach to maintaining balance being the simplest I know. Other trees (the randomized ones, splay, AVL and red-black, for example) have balance conditions that are more difficult to implement and, above all, carry higher constant costs.

If space is determinative, then I would use treaps that compute their priority with a hash function.

Only if the number of keys is huge or memory consumption is decisive would I choose a deterministic tree; an AVL or red-black, for example.
