COMPX301-24A -Assignment Three
Boyer-Moore string search
Due: Thursday, 16 May 2024-11:30pm
Overview: You are to implement the BM (Boyer-Moore) string search algorithm for plain-text (i.e.l searching for plain character strings inside plain text documents, like searching for the stringl "whale" in our MobyDick.txt) using the method described in lectures.
Your solution will have two parts: a program called MakeBMTable. java and another calledlBMSearch. java.
Existing partnerships can divide the labour however they like, but those without partners will bel assigned both a partner and one of two the programs, which are roughly equal in difficulty. In thisl way, assigned partners can, if they want, more or less work independently (not the best plan, though), and only need to make sure the output format of the skip array from the first program isl correct and consistent as expected input for the second prgram. Note that a test skip array (alsol called a skip table) for a simple target string can be composed by hand at the outset so that whoever is working on BMSearch can proceed immediately and alone, while their assigned partner worksl on the other program to produce such a skip array. Thus, each program can be tested independently, so if the partnership fails to function well then each student can still be marked for their work.
Part One: MakeBMTable
The first program accepts a string as its first command-line argument; then computes a correspondingl BM skip array for that string and stores it in a file whose name is given as a second command-linel argument. The format of the output is a plain-text comma-separated-version of a skip table similarl to that deseribed in lecture, with one row for the target string, then one row for each uniquel character in the target string, and finally one more row for all characters not found in the target string. Each cell, indexed by the row and column, gives an integer specifying how far to move thel search forward when looking for the target character at the position corresponding to this columnl and finding the character corresponding to the matching row.
The rows for characters found in the string should be in alphabetical order (according to the resultl of a java character comparison), but the first row (corresponding to the pattern) and last rowl (corresponding to all characters not in the pattern) are prefixed with an asterisk in the firstl column just so that every row has the same number of non-empty fields, separated by commas. An example of usage for the first program is as follows:
% java MakeBMTable "kokako" table. txt
which computes the skip array for "kokako" and puts it in the file called "table, txt" The contentsl of the file, in this instance, should be:
*, k, o, k, a, k, o
a, 4, 4, 4, 0, 6, 2
k, 0, 4, 0, 4, 0,1
o, 4, 0, 4, 4, 6,0
*, 4, 4, 4, 4, 6, 6
Once again, the asterisks at the start of the first and last rows are just to have a uniform. inputlformat for each line. The first row is the pattern, and the last row has the skip distances forl characters not in the pattern anywhere.
Note that putting double-quote characters around the target string on the command-line is not alwaysl necessary, but ensures the entire contents of that string makes its way into the String args of thel java static main method. They are stripped off by the shell, but do allow you to include in yourl target pattern characters that may otherwise have special meaning to the shell ... such as a plus sign or a bracket.
Part Two: BMSearch
This program performs the string search, and accepts two filenames as command-line arguments; the first being the name of the file with the skip array, and the second being the plain text file tol search through. The output should be every line of the text that contains the string; but just once,l regardless of how many times the target string occurs in that line (e.g. same as default behaviourl of Unix grep command).
An example of usage for this program is as follows:
%java BMSearch table.txt MobyDick. txt
which would produce no output if table. txt was constructed for "kokako" because that word does notl appear in Moby Dick, but should produce four lines from Moby Dick if the table had been constructedl for "gull".
Submission: Submission is done electronically via Moodle in the same was a previous assignments. l Student names and IDs should be in the header comments of all source code as appropriate (i. e. ifl you worked on it, your name should be listed as an author; and if you didn't, then it shouldn't) andl any ideas borrowed from online sources must be appropriately cited (i.e. what was obtained froml where and when), although it is expected that the entire solution can be developed from scratchl based solely upon our course materials/lectures.
Be sure all your code compiles and runs as expected on R Block linux lab machines!!
Place copies of only your source code (i.e. .java files) and an optional README. txt in an otherwise empty directory whose name is your two student ID numbers separated by an underscore. No compiledl code, no test files.. just the source code and whatever you want to say in the README. Only onel partner needs to upload the folder by the due date, and the other partner should check that thel upload was successful. This may take a little communication.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。