DESCRIPTION
I need this assignment done ASAP. It involves writing two functions for a data mining course using bigrams:
1. Take some starting tokens and produce the most likely token that follows under a bi-gram model
2. Train an n-gram language model as specified by the argument "n"
Attachments
assignment1_part2
September 30, 2021
[ ]: version = "REPLACE_PACKAGE_VERSION"
1 Assignment 1 Part 2: N-gram Language Models (Cont.) (30 pts)
In this assignment, we’re going to train an n-gram language model that is able to “imitate” William
Shakespeare’s writing.
[ ]: # Configure nltk
     import nltk
     nltk_data_path = "assets/nltk_data"
     if nltk_data_path not in nltk.data.path:
         nltk.data.path.append(nltk_data_path)
[1]: # Copy and paste the functions you wrote in Part 1 here and import any libraries necessary
     # We have tried a more elegant solution by using
     # from ipynb.fs.defs.assignment1_part1 import load_data, build_vocab, build_ngrams
     # but it doesn't work with the autograder...
     import re
     from nltk.util import ngrams
     from nltk.lm.preprocessing import pad_both_ends

     def load_data():
         """Read the sonnets and return one list of tokens per non-empty line."""
         sentences = []
         with open('assets/gutenberg/THE_SONNETS.txt') as fin:
             # Strip digits so that sonnet-number lines become blank and are skipped
             lines = [re.sub('[0-9]', '', raw) for raw in fin]
         for line in lines:
             line = line.lower().strip()
             if line:
                 tokens = line.split(" ")
                 # Split the final character (assumed to be punctuation) off the last word
                 last_word = tokens[-1]
                 tokens[-1] = last_word[:-1]
                 tokens.append(last_word[-1])
                 sentences.append(tokens)
         return sentences

     def build_vocab(sentences):
         """Return the unique tokens in first-seen order, plus the padding symbols."""
         vocab = []
         for sentence in sentences:
             for token in sentence:
                 if token not in vocab:
                     vocab.append(token)
         vocab.append('<s>')
         vocab.append('</s>')
         return vocab

     def build_ngrams(n, sentences):
         """Pad each sentence with <s> and </s>, then list its n-grams."""
         padded_sentences = [pad_both_ends(sentence, n=n) for sentence in sentences]
         all_ngrams = [list(ngrams(padded, n=n)) for padded in padded_sentences]
         return all_ngrams
1.1 Question 4: Guess the next token (20 pts)
With the help of the three functions you wrote in Part 1, let’s first answer the following question
as a review of n-grams.
Assume we are now working with bi-grams. What is the most likely token that comes after the
sequence <s> <s> <s>, and how likely is it? Remember that a bi-gram language model is essentially a
first-order Markov Chain. So, what determines the next state in a first-order Markov Chain?
Complete the function below to return a tuple, where tuple[0] is a str representing the
most likely token and tuple[1] is a float representing its (conditional) probability
of being the next token.
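As a reminder of what "first-order" means here, the standard maximum-likelihood estimate for a bi-gram model can be written as follows. The notation is an assumption made for this note and is not taken from the assignment: C(u, w) stands for the number of times token w directly follows token u in the padded training sentences.

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-1})
  = \frac{C(w_{i-1}, w_i)}{\sum_{w} C(w_{i-1}, w)}
```

Under this approximation only the last token of the starting sequence enters the estimate, so the context <s> <s> <s> is treated exactly like the single-token context <s>.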
[ ]: def bigram_next_token(start_tokens=("<s>", ) * 3):
         """
         Take some starting tokens and produce the most likely token that follows under a bi-gram model
         """
         next_token, prob = None, None
         # YOUR CODE HERE
         raise NotImplementedError()
         return next_token, prob
[ ]: # Autograder tests
stu_ans = bigram_next_token(start_tokens=("<s>", ) * 3)
assert isinstance(stu_ans, tuple), "Q4: Your function should return a tuple. "
assert len(stu_ans) == 2, "Q4: Your tuple should have two elements. "
assert isinstance(stu_ans[0], str), "Q4: tuple[0] should be a str. "
assert isinstance(stu_ans[1], float), "Q4: tuple[1] should be a float. "
# Some hidden tests
del stu_ans
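The idea behind Question 4 can be illustrated with a minimal sketch in plain Python. It is not the assignment's intended solution: the function name `most_likely_next` and the three-line `toy_sentences` corpus are hypothetical, and the counting is done by hand with `collections.Counter` rather than with the Part 1 helpers.

```python
from collections import Counter, defaultdict


def most_likely_next(sentences, start_tokens):
    """Return (token, probability) for the likeliest token after start_tokens.

    Hypothetical helper: a hand-rolled bigram count, not the assignment's API.
    """
    # For every context token, count which tokens directly follow it
    followers = defaultdict(Counter)
    for sentence in sentences:
        padded = ["<s>"] + list(sentence) + ["</s>"]
        for prev, curr in zip(padded, padded[1:]):
            followers[prev][curr] += 1

    # First-order Markov assumption: only the last start token matters
    counts = followers.get(start_tokens[-1], Counter())
    if not counts:
        return None, 0.0
    token, count = counts.most_common(1)[0]
    return token, count / sum(counts.values())


# Tiny hand-made corpus, used only for illustration (not the sonnets)
toy_sentences = [
    ["shall", "i", "compare"],
    ["shall", "we", "go"],
    ["when", "i", "do"],
]
print(most_likely_next(toy_sentences, ("<s>",) * 3))
```

Two of the three toy sentences start with "shall", so the sketch reports "shall" with probability 2/3. Passing one `<s>` or three gives the same answer, because only the last start token is ever consulted.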
1.2 Question 5: Train an n-gram language model (10 pts)
Now we are well positioned to start training an n-gram language model. We can fit a language
model using the MLE class from nltk.lm. It requires two inputs: a list of all n-grams for each
sentence and a vocabulary. You have already written functions to build both, so now it’s
time to put them to work together.
Complete the function below to return an nltk.lm.MLE object representing a trained
n-gram language model.
[ ]: from nltk.lm import MLE
     def train_ngram_lm(n):
         """
         Train an n-gram language model as specified by the argument "n"
         """
         lm = MLE(n)
         # YOUR CODE HERE
         raise NotImplementedError()
         return lm
[ ]: # Autograder tests
stu_n = 4
stu_lm = train_ngram_lm(stu_n)
stu_vocab = build_vocab(load_data())
assert isinstance(stu_lm, nltk.lm.MLE), "Q3b: Your function should return an nltk.lm.MLE object. "
assert hasattr(stu_lm, "vocab") and len(stu_lm.vocab) == len(stu_vocab) + 1, "Q3b: Your language model wasn't trained properly. "
del stu_n, stu_lm, stu_vocab
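How the two inputs fit together can be sketched on a toy corpus. This is an illustration under stated assumptions, not the graded solution: `train_toy_lm` and `toy_sentences` are hypothetical names, and the function takes the sentences as an argument instead of calling `load_data()`.

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import pad_both_ends
from nltk.util import ngrams


def train_toy_lm(n, sentences):
    """Fit an nltk MLE model of order n on pre-tokenized sentences.

    Hypothetical helper for illustration only.
    """
    # One list of n-gram tuples per sentence, padded with <s> and </s>
    ngram_lists = [list(ngrams(pad_both_ends(s, n=n), n)) for s in sentences]
    # Vocabulary: every distinct token plus the two padding symbols
    vocab = sorted({tok for s in sentences for tok in s}) + ["<s>", "</s>"]
    lm = MLE(n)
    lm.fit(ngram_lists, vocabulary_text=vocab)
    return lm


# Tiny made-up corpus, not the sonnets
toy_sentences = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
toy_lm = train_toy_lm(2, toy_sentences)
print(toy_lm.score("b", ["a"]))  # estimate of P(b | a)
print(len(toy_lm.vocab))         # distinct tokens, padding symbols, and <UNK>
```

In the toy corpus "a" is followed by "b" twice and by "c" once, so the score for "b" given "a" is 2/3. The vocabulary length is one more than the list passed in because nltk adds an `<UNK>` entry, which is the same relationship the autograder's `len(stu_vocab) + 1` check relies on. A model fitted this way only holds counts for contexts of length n - 1; nltk also provides `padded_everygram_pipeline` for fitting on all orders up to n, which this sketch does not use.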
FINALLY, are you ready to compose sonnets like the real Shakespeare?! We provide some starter
code below, but absolutely feel free to modify any parts of it on your own. It’d be interesting to
see how the “authenticity” of the sonnets is related to the parameter n. Do the sonnets feel more
Shakespearean when you increase n?
[ ]: # Every time it runs, depending on how drunk it is, a different sonnet is written.
     n = 3
     num_lines = 14
     num_words_per_line = 8
     text_seed = ["<s>"] * (n - 1)
     lm = train_ngram_lm(n)
     sonnet = []
     while len(sonnet) < num_lines:
         while True: # keep generating a line until success
             try:
                 line = lm.generate(num_words_per_line, text_seed=text_seed)
             except ValueError: # the generation is not always successful. need to capture exceptions
                 continue
             else:
                 line = [x for x in line if x not in ["<s>", "</s>"]]
                 sonnet.append(" ".join(line))
                 break
     # pretty-print your sonnet
     print("\n".join(sonnet))