Estimation Algorithms assignment
Algorithms homework attached
Get Help With a similar task to - Estimation Algorithms assignment
Additional Instructions:
cs4102 Fall 2020 2 Randomness and Sorting Exercise 1 A Matching Pair For Problems 1 and 2 below we will be exploring algorithms to solve the following: Professor Brunelle has n socks in his drawer, but he never pairs up his socks after washing them. Let’s help him find the best way find a matching pair in the morning. Fortunately, Professor Brunelle knows that exactly 3n4 + 1 of his socks are identical, and so picking any two from that group would make a matched pair. Unfortunately, none of his other socks will match at all! (Where on earth could those other socks have gone?!). We’ll call M the set of all the identical socks of Professor Brunelle’s. We’ll call U the set of all unique socks. Note that |M |> 3 · |U | and |M |+|U |= n. You may assume n is a multiple of 4. Creative Commons BY-NC 4.0 Nathan Brunelle and David Wu Hasan Mukati Hasan Mukati cs4102 Fall 2020 3 Randomness and Sorting Exercise 1 Problem 1: Linear Time Expected Currently, this is the algorithm Professor Brunelle is using to select socks: repeat until there’s a match: s1 = random sock repeat n4 + 1 times: s2 = a random sock if s1 matches s2 then wear them, break return all socks to the drawer In other words, he picks a first sock at random, checks n4 + 1 different socks with it to see if any match. If there is no match, he replaces all of the socks, picks a new first sock, and repeats. We will show that this algorithm will require an expected Θ(n) comparisons by considering separately the expected number of iterations of the inner loop and the outer loop (i.e. the expected number of socks chosen as s2 and s1 respectively). Inner Loop: We’ll begin by focusing on how many times the inner loop might iterate for each choice of s1 (i.e., the expected number of times we select a different sock for s2 for a single choice of s1). Note that we expect different behaviors for the inner loop depending on if s1 ∈ U vs. if s1 ∈M . Consider these cases separately: Case 1: Assuming s1 ∈ U , show that the expected number of times the inner loop will iterate is Θ(n). Case 2: Assuming s1 ∈ M , show that the expected number of times the inner loop will iterate is Θ(1). Hint: Bound the probability that the algorithm continues after iteration i (i.e., the first time that s2 ∈M occurs after iteration i). Outer Loop: Next we’ll take a look at the outer loop. Observe that this outer loop will continue iterating until s1 ∈M . Once this event occurs, that is guaranteed to be the last iteration through the outer loop. For this reason, to determine the expected number of iterations of the outer loop, we will focus on finding the expected number of times s1 will come from U . Show that the expected number of times the outer loop will iterate is Θ(1). Hint: Bound the probability that the algorithm continues after iteration i (i.e., the first time that s1 ∈M occurs is after iteration i). Putting it together: Use all three of your answers above to find the overall expected running time of this algorithm in terms of the number of comparisons. Creative Commons BY-NC 4.0 Nathan Brunelle and David Wu Hasan Mukati Hasan Mukati cs4102 Fall 2020 4 Randomness and Sorting Exercise 1 Problem 2: A Better Algorithm Suggest a new algorithm that will allow Professor Brunelle to find a matching pair in a constant number of comparisons (in expectation). Algorithm: Describe your algorithm here. Make it clear where randomness is being used. Pseudo-code is acceptable if that’s what you prefer. Correctness: Justify how you know your algorithm is going to find a matched pair (eventually). Running Time: Show that the expected number of comparisons that will be done by your algorithm is Θ(1). Creative Commons BY-NC 4.0 Nathan Brunelle and David Wu Hasan Mukati Hasan Mukati cs4102 Fall 2020 5 Randomness and Sorting Exercise 1 Problem 3: The Count-Min Sketch and Frequency Estimation Google is interested in collecting some frequency statistics on user search queries and identify trending queries over time.1 One way that Google can do so is to maintain a hash table mapping search queries to counts. However, if there are n unique queries, this will require maintaining a table of size Ω(n), which at Google-scale, is rather infeasible. In this problem, we will explore the concept of a count-min sketch data structure which allows us to compute approximate frequencies (i.e., counts) with much less space. Let U be the universe of possible search queries (i.e., the set of all strings up to length 1000). We begin by allocating an array of counters (c1, . . . , cM ) where each counter is initialized to 0. We also sample a hash function h from a family of universal hash functions from U → {1, 2, . . . , M}. Then, whenever a user makes a search query q ∈ U , we compute i = h(q) and increment ci = ci + 1. To estimate the frequency of a query q ∈ U , we again compute i = h(q) and return the count ci. While this method is very simple, we show in this problem that it can be used to provide good frequency estimates. Let q1, . . . , qn be the queries that have been made so far, and for each i, let fi > 0 denote the number of times query qi has been made. Let N = f1 + · · ·+ fn be the total number of queries that have been made. Let Xi be a random variable for the count of qi reported by the above algorithm (i.e., Xi = ch(qi)). Expected value of estimate. Show that fi ≤ E[Xi] ≤ fi + N/M . Hint: Try defining an indicator random variable Xij = 1 if h(xi) = h(xj) and 0 otherwise. This shows that the expected error of this approach is roughly N/M . Quality of estimates. Show that Pr[Xi ≥ fi + 2N/M ] ≤ 1/2. Hint: You may use without proof that for a random variable X taking on non-negative values, Pr[X ≥ t] ≤ E[X]/t. This shows that with probability 1/2, the ’‘error” of this approach is within an additive factor of 2N/M . Reducing the error. We can reduce the error by using multiple independent copies of the above data structure. Namely, we initialize multiple arrays of counters C1 = (c1,1, . . . , c1,M ), . . . , Ck = (ck,1, . . . , ck,M ) and multiple independent hash functions h1, . . . , hk (each sampled independently from the family of universal hash functions). When we obtain a query q, we increment all of the counters c1,h1(q), . . . , ck,hk(q). To obtain an estimate for a query q, we take the minimum count across all of the arrays of counters: Xi = mini=1,...,k ci,hi(q). Show that Pr[Xi ≥ fi + 2N/M ] ≤ 1/2k. This shows that we can decrease the error probability exponentially using only linearly more space! The count-min sketch data structure has many applications for estimating frequencies in settings where it is infeasible to store all of the data (e.g., streaming environments). 1Google has implemented such a service: https://trends.google.com/trends/trendingsearches/daily. Creative Commons BY-NC 4.0 Nathan Brunelle and David Wu Hasan Mukati Hasan Mukati
Related Questions
Tutlance Experts offer help in a wide range of topics. Here are some of our top services:
- Math homework help
- Nursing homework help
- Statistics homework help
- Nursing coursework help
- Capstone project writing services
- Essay writers for hire
- Case study writing help
- Buy college papers online
- Buy college research papers
- College homework help
- Professional resume writing services
- Programming homework help
- Coursework writing help
- Term paper writing help
- Biology homework help
- Do my physics homework
- Dissertation data analysis help
- PhD Dissertation writing services
- Chemistry homework help
Post your project now for free and watch professional experts outbid each other in just a few minutes.