Recently I have to implement prefix searching for a project. The searching need to be scalable and the result are to be returned as quickly as possible. The front end for this search was a Ajax suggestion like search. The screen shot below shows the prefix searching in action.
Ok this pretty simple Ajax stuff, nothing fancy and nothing new. What I am going to discuss is the backend not the frontend for such a prefix searching component.
One of the most common implementation used for prefix searching is:
select * from some_table where some_field like ‘prefix%’
In other word put all the words to be searched in a table and use the like clause. This works and has reasonable performance too for small number of queries. But if you have continuous deletion and insertion of new search words then this solution does not scale well for large number of users.
So let put these words in memory for faster response time. How do we search prefixes if we have all the words in the memory?
Put them in a list and do a sequential search? Ok you guess that too slow….. Put them in a hash table? What key we are going to search in the hash table we have only part of the key (prefix)!!
You must agree prefix searching is a specialized problem which requires a special data structure that is geared toward prefix searching. You have read the title so you know the name of that data structure. Yes Radix tree is one of such data structure that is suitable for prefix searching. A radix tree, Patricia trie/tree, or crit bit tree is a specialized set data structure based on the trie that is used to store a set of strings.
Trie is a multi-way tree structure useful for storing strings over an alphabet. The idea is that all strings sharing a common stem or prefix hang off a common node. The elements in a string can be recovered in a scan from the root to the leaf that ends a string. All strings in the trie can be recovered by a depth-first scan of the tree.
Below is a trie for keys “to”, “tea”, “ten”, “i”, “in”, and “inn”.
You can consider a Radix tree a condense form of Trie, in which intermediate nodes are merged together to save space. Below is shown the insertion process in a Radix tree.
I wrote a simple Java implementation for radix tree for the prefix search application mentioned above. You can find the code at http://code.google.com/p/radixtree/
I like tree structures because of their recursive nature, looks really natural to me
Most of the operation on Radix tree like insert, delete, find, etc can be easily implemented a recursive manner. Apart from the insert method another interesting method in implementation is visit() method it is there to help with other operation like delete, find and contains
/**
* recursively visit the tree based on the supplied "key". calls the Visitor
* for the node those key metches the given prefix
*
* @param prefix
* The key o prefix to search in the tree
* @param visitor
* The Visitor that will be called if a node with "key" as its
* key is found
* @param node
* The Node from where onward to search
*/
private void visit(String prefix, Visitor visitor,
RadixTreeNode parent, RadixTreeNode node) {
int i = 0;
int keylen = prefix.length();
int nodelen = node.getKey().length();
// match the prefix with node key
while (i < keylen && i < nodelen) {
if (prefix.charAt(i) != node.getKey().charAt(i)) {
break;
}
i++;
}
// if the node key and prefix match, we found a match!
if (i == keylen && i == nodelen) {
visitor.visit(prefix, parent, node);
} else if (node.getKey().equals("") == true // either we are at the
// root
|| (i = nodelen)) { // OR we need to
// traverse the childern
String newText = prefix.substring(i, keylen);
for (RadixTreeNode child : node.getChildern()) {
// recursively search the child nodes
if (child.getKey().startsWith(newText.charAt(0) + "")) {
visit(newText, visitor, node, child);
break;
}
}
}
}
Off course you can have better look at the implementation by visiting the online code repository at http://radixtree.googlecode.com/svn/trunk/RadixTree/
Update: I have remove the link to the old download distribution, Please download the latest code from http://code.google.com/p/radixtree/ As there are some bug fixes and improvements.