Write a Java program called WordMatch.java. This program takes four command-line arguments. For example: java WordMatch in1.txt out1.txt in2.txt out2.txt
1. The first is the name of a text file that contains the names of AT LEAST TWO text files (one per line) from which the words are to be read to build the lexicon. This argument specifies the input files.
2. The second is the name of a text file to which the words in the lexicon are to be written. This argument specifies the file containing the words and their neighbors in the lexicon.
3. The third is the name of a text file that contains ONLY ONE matching pattern. This argument specifies the file containing the matching pattern.
4. The fourth is the name of the text file to which the result of the matching for the given pattern is written. This argument specifies the file containing the output.
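The handling of the four arguments above can be sketched as follows. This is only a minimal illustration, not a required design: the class name `WordMatchArgsSketch` and the helper `parseFileNames` are hypothetical, and the sketch assumes that blank lines in the list file are ignored and that the pattern file is read as a whole and trimmed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name and structure are illustrative only.
class WordMatchArgsSketch {

    // Keeps the non-blank lines of the list file, trimmed, as input file names.
    // Assumption: blank lines are skipped rather than treated as errors.
    static List<String> parseFileNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            String name = line.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        if (names.size() < 2) {
            throw new IllegalArgumentException(
                "The list file must name at least two input files");
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println(
                "Usage: java WordMatch in1.txt out1.txt in2.txt out2.txt");
            System.exit(1);
        }
        // Argument 1: file listing the input text files, one name per line.
        List<String> inputFiles =
            parseFileNames(Files.readAllLines(Paths.get(args[0])));
        // Argument 2: file to which the lexicon is to be written.
        String lexiconOut = args[1];
        // Argument 3: file holding the single matching pattern.
        String pattern =
            new String(Files.readAllBytes(Paths.get(args[2]))).trim();
        // Argument 4: file to which the matching result is to be written.
        String resultOut = args[3];

        System.out.println("Input files: " + inputFiles);
        System.out.println("Lexicon output: " + lexiconOut);
        System.out.println("Pattern: " + pattern);
        System.out.println("Result output: " + resultOut);
    }
}
```

Separating the parsing of the list file from the file I/O keeps the validation (at least two input files) easy to test on its own.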
For this version, the efficiency with which the program performs its operations is a major concern: the faster the program runs (correctly), the better. For example, the files read in can be quite long and the lexicon of words can grow to be quite large. The time taken to insert words will be critical here, and you will need to consider carefully which algorithms and data structures you use.

You can use any text files as input to this program. A good source of long text files is the Gutenberg project (www.gutenberg.com), which aims to put into electronic form older literary works that are in the public domain. The extract from Jane Austen's book Pride and Prejudice used as the sample text file above was sourced from this web site. You should choose files of lengths suitable for providing good information about the efficiency of your program. A selection of test files has been posted on LMS for your efficiency testing. You can consider additional test files if you wish.

As expected, the definition of a word, the content of a query's result, and the display of this result are exactly the same as described in Assignment Part 1.

All the Java files must be submitted. The program will be marked on correctness and efficiency. Bad coding style and poor documentation may have up to 5 marks deducted.
Task 2 (CSE5ALG students only)
Consider B-trees of order M. Assume that we have the following result, which we will refer to as Lemma 1.
Lemma 1: The barest B-tree of height H contains N = 2K^H − 1 elements, where K = ⌈M/2⌉.
Determine the upper bound on the height of a B-tree of order 23 which has 10,000,000 = 10^7 elements. You must give an integer value as the upper bound on the height of the B-tree. You are not allowed to use the result given in the lecture regarding the upper bound for a B-tree's height. Instead, you must work out the answer using Lemma 1 above.
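To illustrate how Lemma 1 leads to a height bound, here is a small sketch using different, purely illustrative values (order 5 and 1,000 elements, which are not part of the task). It relies on the observation that a B-tree of height H cannot hold fewer elements than the barest tree of that height, so the height is bounded by the largest H with 2K^H − 1 ≤ N. The class and method names are hypothetical.

```java
// Hypothetical class; illustrates the use of Lemma 1 on example values only.
class BTreeHeightBoundSketch {

    // Returns the largest H such that 2 * K^H - 1 <= elements,
    // where K = ceil(order / 2). Assumes order >= 3 and elements >= 1.
    static int maxHeight(int order, long elements) {
        if (order < 3 || elements < 1) {
            throw new IllegalArgumentException(
                "order must be >= 3 and elements >= 1");
        }
        long k = (order + 1) / 2; // ceil(M / 2) for a positive integer M
        int h = 0;
        long power = 1;           // holds K^h
        // Grow h while the barest tree of height h + 1 still fits.
        while (2 * power * k - 1 <= elements) {
            power *= k;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        // Illustrative values only: order 5 (K = 3) and 1,000 elements.
        System.out.println(maxHeight(5, 1000));
    }
}
```

For the illustrative values, K = 3: the barest tree of height 5 has 2·3^5 − 1 = 485 elements, while the barest tree of height 6 would need 2·3^6 − 1 = 1457, so a tree with 1,000 elements has height at most 5. A written solution should present this reasoning algebraically rather than by running code.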
Note: The total mark for Part 2 will be 100 for CSE2ALG students and 125 (100 for Task 1 and 25 for Task 2) for CSE5ALG students. The percentage contribution to the final mark will be the same, i.e. 20%.
In your solution to Task 2, as well as in every Java class, you must include your student ID and name, and the subject code.
How to submit your solution to Task 2: Your solution should be a PDF file named Task2.pdf, and be submitted using the same command submit ALG, i.e. submit ALG Task2.pdf