A high performance alternative to Dictionary and Collection: Treaps in Visual Basic (VB6)

VB6 Collection Performance

 

The Collection is the single data structure provided by VB5 and VB6, and was designed with the “one size fits all” philosophy. It allows you to access the elements in it by a key and also sequentially, and as most multi-use tools, doesn’t perform very well in either scenario.

 

Much worse than the performance issues, are the serious limitations in Collection: as soon as you add a key, there’s no way to change it, and even worse, there’s no way to know which keys have been entered.

 

Dictionary

 

 

 The Dictionary object came with the MS Scripting runtime dll and provides a much better implementation than the Collection. Even though much better than the humble Collection, Dictionary’s iteration methods are still a bit primitive and wasteful (from a performance point of view) when it contains many items.

 

While delivering applications that used Dictionary (EasyJob Resume Builder http://www.easyjob.net), I encountered several serious issues that forced me to ditch it:

 

·         The MS Scripting Runtime dll (sccrun.dll) wasn’t correctly registering in some systems, making the app crash.

·         On some systems, the creation of a Dictionary object was being blocked by over enthusiastic security software that tried to prevent any script from running.

 

Enter the Treap

 

(hardcore nerd-talk ahead: you may skip this section if you want)

 

I have no idea how the Dictionary or Collection are implemented, but they are probably based on a Balanced Binary Tree or a Hash Table.

 

The Treap is also a Binary Balanced Tree, but with a twist that makes it very simple (if compared to other BBT’s, such as AVL Trees or Red-Black Trees) and very efficient in terms of both speed and memory.

 

AVL and Red-Black trees are cool, because they guarantee O(log n) insert, lookup and deletion time, but are complicated to implement, as their rebalancing routines are complex.  However, if we are willing to accept a probabilistic assurance of good performance, Treaps provide a much simpler solution.

The stroke of genius behind the Treap is to use randomness to keep a binary tree balanced. Binary search trees are great as long as they remain balanced. But if the elements are inserted in order, then the tree will turn into a linked list with very poor O(log n) performance. On the other hand if the elements are inserted in random order, then the tree will remain balanced.

 

So here’s the magic cookie: no matter how you add the elements to the Treap, it will have the same structure it would have if the elements had been inserted in random order into a binary search tree.

 

The Treap achieves this by using a random priority field in every node. The Treap behaves as a binary search tree with respect to the keys in the nodes, and also as a Heap with respect to the priorities. Hence the name: Treap = Tree + Heap.

 

The algorithm to insert a new node is similar to that of red-black trees:

1.      Find the unique leaf where the node can be inserted while preserving the Binary Search Tree property.

2.      Perform a series of tree rotations to satisfy the Heap ordering property.

 

Treap for Visual Basic 6

 

The main class is CTreap.

 

Values can be anything you can put into a Variant (strings, longs, singles, objects, other Treaps, whatever).

Keys can be almost anything, except Arrays: dates, strings, numbers, objects, etc….

If you try to add an Array as a key, you will get a Wrong Key Type error.

 

Methods

 

add(ByVal aKey As Variant, ByRef aValue As Variant) As Boolean

 

Adds a value to the Treap. Unlike a Collection, the key is mandatory. Returns true if the key wasn’t already present and returns false if the key was already present (it won’t add the value in this case).

 

clear()

 

Zaps the whole Treap, leaving it empty.

 

getValue(ByVal aKey As Variant, ByRef aValue As Variant) As Boolean

 

Retrieve a value by its key.

Since VB's assignment syntax is different for objects and non-objects, you can't just return the value, as the user won't know, a priori, which syntax (set x = y or x = y) to use. So we take a ByRef parameter and assign it to the value. This way the utter ugliness remains relatively hidden.

 

Whoever designed this should be sent to Abu Graib.

 

Returns true if the value was found, and false if it was not found.

setValue(ByVal aKey As Variant, ByVal aNewValue As Variant) As Boolean

 

Sets the new value for an existing node. Returns True if the node exists, and False if it doesn't.

 

changeKey(ByVal oldKey As Variant, ByVal newKey As Variant) As Boolean

 

Change the key of an existing node. If the node exists, return true, otherwise, return false.

 

exists(ByVal aKey As Variant) As Boolean

 

Detects the presence of a key.

 

isEmpty() As Boolean

 

Returns True if the Treap is empty.

 

remove(ByVal aKey As Variant)

 

Remove an element.

 

Keys(Optional ascendingOrder As Boolean = True) As IIterator

 

This is quite different from Collection and Dictionary. It returns an instance of CKeyIterator ( a class that implements the IIterator interface).

 

The only methods of CKeyIterator that you have to worry about are hasMore() (returns True until you reach the last key in the Treap) and getItem(item As Variant).

 

Example:

 

‘Iterate through the keys forwards and backwards

‘at the same time

 

Dim vKey As Variant

Dim rForwards As IIterator

Dim rBackwards As IIterator

 

 

Set rForwards = m_rtreap.keys(True)

Set rBackwards = m_rtreap.keys(False)

 

Do While rForwards.hasMore

    Call rForwards.getItem(vKey)

    Debug.Print CStr(vKey)

    Call rBackwards.getItem(vKey)

    Debug.Print CStr(vKey)

Loop

 

Values(Optional ascendingOrder As Boolean = True) As IIterator

 

Similar to Keys(). Returns an instance of CValueIterator. Allows you to iterate the values in a Treap.

Properties

 

Count

 

Read only. Returns the number of elements in the Treap.

 

Events

 

If you declare it with withevents, you can observe the changes in the treap with these events:

 

Public Event onAdd(ByVal Sender As CTreap,

         ByVal newKey As Variant,

         ByVal newValue As Variant)

 

Public Event onKeyChange(ByVal Sender As CTreap,

ByVal oldKey As Variant,

ByVal newKey As Variant)

 

Public Event onValueChange(ByVal Sender As CTreap,

ByVal Key As Variant,

ByVal newValue As Variant)

 

Public Event onRemove(ByVal Sender As CTreap,

  ByVal oldKey As Variant)

 

Public Event onEmpty(ByVal Sender As CTreap)

 

 

 

Contact

 

If you find any bugs ore have any suggestions, please drop me a line.