The complexity of Sorting a Binary Tree

Introduction

A binary tree is a data structure, that is, a way of storing information in an organized form. Knuth defines them as follows:

A binary tree is a finite set of nodes that either is empty or consists of a root and two disjoint binary trees called the left and the right subtrees of the root.

The generic structure of a binary tree is:

  A
 / \
B   C
   / \
  D   E
     / \
    F   G
   / \ / \
  H  L M  N

There are of course non-binary trees. However, it is important to note that a binary tree is not a special case of a tree, but is a different concept. For example, trees

  A   A
 /     \
B       B

are identical when considered as ordinary trees but different when analyzed as binary trees.

In a binary search tree, each node is identified by a key, which is stored respecting the following property:

Let \(x\) be a node of a binary tree. If \(y\) is a node in the left subtree of \(x\) then \(key[y]\leq key[x]\). If \(y\) is a node in the right subtree of \(x\), then \(key[y]\geq key[x]\).

Basic operations in binary search trees

Suppose a set of data, for example, a database \(D\) which contains information in ASCII format. Each row or record in the database is made up of a series of distinct fields or information identified by a key. Let \(n\) be the number of records of the database, each consisting of \(N\) fields. Then, we have a key field and \(N-1\) fields containing the associated information. Suppose that the key is unique for each record. It is possible to store \(D\) organized as a binary search tree based on the property mentioned above.

Basic or primitive operations in the binary search trees are SEARCH, MINIMUM, MAXIMUM, PREDECESSOR, SUCCESSOR, INSERT, and DELETE. Computational complexity depends on the concept of tree height \(h\), which we can informally define as the number of levels of which the tree is composed. We will illustrate this concept using as an example the SEARCH operation.

Suppose we have a key \(k\) and we want to retrieve the associated fields of \(D\) for \(k\). The problem consists in the identification of the node \(x\) such that \(key [x] = k\). So, we move into the tree, starting from the root node, comparing our key with the keys of the nodes we visit. Note that each move involves the descent of a level in the tree. If the key is unique, it should be clear to the reader that the computational complexity of the search is at most equal to \(h\) and the search can be done in time \(O (h)\). This behavior is also satisfied by the other primitive operations, so we have the following important and intuitive result:

Theorem 1. SEARCH, MINIMUM, MAXIMUM, PREDECESSOR, SUCCESSOR, INSERT and DELETE operations can be performed in a search binary tree of height \(h\) in time \(O (h)\).

Sorting and construction of binary trees. Balancing

Not all binary search trees are equally efficient when performing a primitive operation. The key to improving efficiency is given by the fact that computational complexity depends on \(h\) and not on \(n\). The way the elements are arranged in the binary tree affects its height. In general, we can enunciate the problem of the optimal construction such as the search for the arrangement of the nodes that leads to the tree with the minimum height.

The worst-case scenario is a database \(D\) already sorted by key. If we build a binary tree through insertions of the \(n\) records in the original order we will obtain a tree that contains only right or left subtrees, depending on whether the order of the keys is respectively ascending or descending. In this case \(k = n\) and according to Theorem 1 a primitive operation occurs in time \(O (n)\).

If \(D\) is disordered, building a binary tree based on INSERT operations produces a structure with \(h <n\). In this case, the tree is said to be balanced, and the following result can be demonstrated:

Theorem 2. The average height of a randomly constructed binary search tree with \(n\) distinct keys is \(O (\log_ {2} n)\).

From the Theorems 1 and 2, we conclude that any primitive operation performed on a binary search tree takes time \(O (n)\) in the worst case and \(O (\log_ {2} n)\) in the average case. The construction of a tree based on the insertion of the \(n\) records of \(D\) therefore requires time \(O (n ^ {2})\) in the worst case and \(O (n \log_ {2} n)\) in the average case.

One of the typical uses of binary trees is that of ordering \(D\), build a procedure that generates the data sorted by key. Such an algorithm is called a tree sort, and the orderly traversing of the tree generates the sorting of the original data. The problem can be solved with a simple recursive algorithm called visit in symmetric order, based on the property we have previously enunciated, which implies that the key of the root of a node is included between the values of the keys of its left and right children. The following pseudocode lists the keys of a binary tree in ascending order:

BINARY-TREE-ORDER (x)
    if x! = NIL
    then BINARY-TREE-ORDER (leftt [x])
        print key [x]
        BINARY-TREE-ORDER (right [x])

Given that a database of \(n\) records is stored in a binary tree of \(n\) nodes, and since after the initial call the procedure is called recursively twice for each node of the tree, one for the left child and another for the right child, the number of elementary operations of the BINARY-TREE-ORDER function is \(2n\) and the computational complexity of the procedure is \(O (n)\).

Conclusions

Binary search trees are used in many IT procedures. However, the basic theory illustrated in this article is not without problems. For example, when the height of the tree becomes very large, the performances become comparable to those obtained with a linked list. Some variants solve these drawbacks. Examples are self-balancing binary search trees and RB-trees (Red-Black).

RB-trees are used within many database engines. Compared to standard binary trees, they also contain a binary field called color. Through precise rules of coloring the nodes, it can be obtained that no path is more than twice as long as any other.

All these variants of the binary trees are designed to pursue the same goal: the optimal construction that allows obtaining an optimal balance and results in a tree of minimum height.

References

  1. Donald Ervin Knuth. Fundamental Algorithms. The Art of Computer Programming, Vol. 1, 2. ed. Addison-Wesley, 1990. ISBN: 0-201-03821-8.
  2. Thomas H. Cormen. Introduction to Algorithms. 2nd ed. MIT Press, 2001.