Homework 3: Exact String Matching

Assigned: 12/1/2014

Due: 12/9/2014

In this assignment, you will implement the Knuth-Morris-Pratt and Boyer-Moore exact string matching algorithms, and perform some simple experiments to learn about their performance characteristics. You will also choose a third string matching algorithm, implement it, and prepare a short presentation (≈5 minutes long) describing how it works.

Part 1: Implementing Knuth-Morris-Pratt and Boyer-Moore

Implement the Knuth-Morris-Pratt and Boyer-Moore algorithms, as described in Chapter 2 of the Gusfield text.

I have provided a Python file (matching.py) that implements the necessary pre-procesing routines, and also handles file I/O and timing for Part 2 of this assignment. In this file, you will find a completed function that implements the simple naïve matching algorithm, and also has two empty methods for Knuth-Morris-Pratt and Boyer-Moore. You should complete the empty methods.

Note that the file contains a class called PatternPreprocessor, which when given a string will pre-process the string and compute the various values (Z, sp', KMP failure function, etc.) needed for the KMP and Boyer-Moore algorithms.

What to turn in:

Part 2: Evaluation

While both Knuth-Morris-Pratt and Boyer-Moore are technically linear-ish-time algorithms, their performance characteristics are different depending on the kinds of text and patterns. test_data.tgz contains several sets of test data. Each set consists of a text file as well as a file containing several patterns to search for. The data sets are as follows:

In its current form, matching.py times how long each matching algorithm takes to match each pattern. Run it against each of these three data sets, and note the differences in how long the different algorithms take on the various sets. You should see some noticeable differences between Knuth-Morris-Pratt and Boyer-Moore!

What to turn in:

Part 3: Another Algorithm

Visit the Exact String Matching Algorithms site by Christian Charras and Thierry Lecroq, of the Laboratoire d'Informatique de Rouen at the Université de Rouen. Pick one of the other algorithms listed there, implement it inside the matching.py framework (as an additional search function), and measure its performance alongside Knuth-Morris-Pratt and Boyer-Moore.

What to turn in:

Note: because you will be presenting your work to the class, please let me know which algorithm you will be implementing as soon as possible. If another student has already claimed your choice, you will need to pick a different one.