Get Help With a similar task to - Programming Assignment in Java

Additional Instructions:

EECS 233 Programming Assignment #3: Hash Tables
Due April 14, 2020 before 11:59pm
100 points

Web search engines use a variety of information to determine the most relevant documents for a query. One important factor (especially in early search engines) is how often the query words occur in a document. More generally, one can ask how similar or dissimilar two documents are based on the similarity of their word frequency counts (relative to the document size). A necessary step in answering these types of questions is to compute the frequency of every word in a document.

In this assignment, you will write a method wordCount(String input_file, String output_file) that reads a file (document) and prints (into another file) every word encountered in the document along with its number of occurrences. Please use an output format such as “(father 30) (fishing 12) (aspirin 45) …”.

For simplicity, treat derivative words as distinct: “book” and “books”, “eat” and “eating” are all different words. Words are defined simply as strings of characters between two delimiting characters, where the delimiters include the space and punctuation characters. Treating something like “Father’s” as two words (“Father” and “s”, because they are separated by a delimiter) is OK for our purposes. To save yourself some programming, you can extract words from an input string with the Java class StringTokenizer (sometimes viewed as deprecated, but it is not; it is considered a “legacy” class) or with String.split(). Do not distinguish words that differ only in the upper or lower case of their characters: “Father” and “father” are one word. Methods of the String class handle this easily (e.g., toLowerCase).
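As an illustration of the word-extraction rules above, here is a minimal sketch using String.split(). The class name WordSplitter and the method words are hypothetical, and the sketch assumes that every character other than the letters a to z (after lowercasing) counts as a delimiter, so digits and non-ASCII letters also split words. The assignment only names spaces and punctuation, so adjust the pattern if your input needs something else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the assignment handout.
class WordSplitter {
    // Lowercases the line, then splits it on runs of non-letter characters.
    // Assumption: anything outside a-z is a delimiter.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        String lower = line.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^a-z]+")) {
            // split() produces a leading empty string when the line
            // starts with a delimiter (or is empty), so skip those.
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Father's" becomes two words, and case is ignored.
        System.out.println(words("Father's book, and BOOKS!"));
        // prints [father, s, book, and, books]
    }
}
```

Locale.ROOT is passed to toLowerCase so that the result does not depend on the default locale of the machine running the program.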
In implementing wordCount, please implement (yourself; do not use Java’s hash-related classes) a hash table with separate chaining to keep the current counts of the words you have already encountered while scanning the input file. Your general procedure would include the following steps:
1. Scan in the next word.
2. Search for this word in the hash table.
3. If it is not found, insert a new entry with this word and an initial count of 1. Otherwise, increment the count.
4. If you inserted a new word, check whether the hash table needs to be expanded.

After you process the entire file, loop through the entire hash table and print out, sequentially in any order you like, the list of words and their counts. Also, report the average length of the collision lists in the final state of your hash table (across all hash slots, so empty slots also contribute). You also need a main method that accepts the names of the two files above and passes them to the wordCount method.

Please run your program on the same input file you used for Programming Assignment 2 (still truncated to 50 Kbyte size). If you skipped that assignment, please refer to it for instructions on how to obtain a realistic input file.

Additional instructions:
1. In implementing your hash table, you can use Java’s hashCode function on strings, so that your hash function will be h = Math.abs(word.hashCode()) % tableSize. But you obviously cannot use built-in hash tables like HashMap in Java. Note that we take the absolute value of the hash because hashCode returns an int, which can be negative.
2. Please use separate chaining to resolve collisions in your hash table. With separate chaining, tableSize does not need to be a prime number. Any number will work as long as it is not a multiple of 31 (see lecture for the reason why). For example, starting with tableSize as a power of 2 and then doubling whenever you need to expand ensures that it is never a multiple of 31.

Deliverables:
1. Source code, including the comments necessary to understand it;
2. Input file;
3. Output result: word counts and average length of the collision lists;
4. A “toy” test file and the output produced on the toy file (see below).

Grading:
• Implementation of the hash table class: 50 pts, including:
  o Correct hashing (with proper comments): 30
  o Resizing/rehashing as needed (with proper explanation in comments): 20
• Application program utilizing the hash table, along with a “toy” test file and the output produced on the toy file (Important: DO THIS FIRST, before working with a real file!): 25 pts
• Programming style: 10 pts
• Producing output on a real file: 15 pts
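The hash table procedure described in the assignment (search, insert or increment, expand, then report the average collision-list length) could be sketched roughly as follows. This is only one possible shape, not a reference solution. The class name WordCountTable, its method names, the initial size of 16, and the rule "expand when there are more words than slots" are all assumptions made for illustration; the handout leaves the expansion policy to you.

```java
// Hypothetical sketch of a separate-chaining word counter.
class WordCountTable {
    // One node of a collision list: a word, its count, and the next node.
    private static final class Node {
        final String word;
        int count = 1;
        Node next;
        Node(String word, Node next) { this.word = word; this.next = next; }
    }

    private Node[] slots = new Node[16]; // power of 2, so never a multiple of 31
    private int size;                    // number of distinct words stored

    // Hash function from the handout. Math.abs(Integer.MIN_VALUE) is still
    // negative, but every power-of-2 tableSize divides it evenly, so the
    // index is 0 in that case. Other table sizes would need extra care.
    private static int indexFor(String word, int tableSize) {
        return Math.abs(word.hashCode()) % tableSize;
    }

    // Steps 2 to 4: search, then increment or insert, then maybe expand.
    void add(String word) {
        int i = indexFor(word, slots.length);
        for (Node n = slots[i]; n != null; n = n.next) {
            if (n.word.equals(word)) { n.count++; return; }
        }
        slots[i] = new Node(word, slots[i]); // insert at the head of the list
        size++;
        if (size > slots.length) {           // assumed threshold: load factor > 1
            resize();
        }
    }

    // Returns the stored count, or 0 if the word was never added.
    int count(String word) {
        for (Node n = slots[indexFor(word, slots.length)]; n != null; n = n.next) {
            if (n.word.equals(word)) return n.count;
        }
        return 0;
    }

    // Doubles the table and rehashes every node, because each index
    // depends on the table size.
    private void resize() {
        Node[] old = slots;
        slots = new Node[old.length * 2];
        for (Node head : old) {
            Node n = head;
            while (n != null) {
                Node next = n.next;
                int i = indexFor(n.word, slots.length);
                n.next = slots[i];
                slots[i] = n;
                n = next;
            }
        }
    }

    int tableSize() { return slots.length; }

    // Average over all slots, empty ones included: entries / slots.
    double averageChainLength() { return (double) size / slots.length; }

    // Formats the contents as "(word count) (word count) ..." in slot order.
    String format() {
        StringBuilder sb = new StringBuilder();
        for (Node head : slots) {
            for (Node n = head; n != null; n = n.next) {
                if (sb.length() > 0) sb.append(' ');
                sb.append('(').append(n.word).append(' ').append(n.count).append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WordCountTable table = new WordCountTable();
        for (String w : "father fishing father aspirin father".split(" ")) {
            table.add(w);
        }
        System.out.println(table.format());
        System.out.println("average list length: " + table.averageChainLength());
    }
}
```

Because empty slots count toward the average, the reported figure is simply the number of distinct words divided by the table size, so no second pass over the lists is needed. The order of the printed pairs depends on the hash values, which the assignment allows.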

