POLI 328: Quantitative Analysis (Winter 2022)
Prof. Ethan Busby
Problem Set #6
207 points total
This problem set is due on Learning Suite before 11:59 pm on Thursday, February 24th. Remember that no late assignments will be accepted. The problem set is divided into four parts. You must submit your answers to each part separately, as a different TA will grade each part. Make sure that your name, section number, and the problem set and part number (e.g., Problem Set 1, Part 1) are clearly listed on each part; students who fail to do so may be penalized on the assignment. If necessary, re-read the section of the syllabus on group work to make sure you are giving proper credit to those you work with and/or the text(s) you use for each problem.
Part 1: Basic regression problems (12 points, 4 points each part)
Suppose that a researcher, using data on apartment building size (ABS) and average health outcomes for residents from 100 apartment buildings, estimates the OLS regression. Apartment building size is measured as the number of apartments in the building, and the health outcome is measured as the average health score of all the residents of that building, in points, where 0 points indicates extremely poor health and 100 points indicates extremely good health.
Note: Please reference the correct units for the variables in your responses.
a. An apartment building has 20 apartments in it. What is the regression’s prediction for that building’s residents’ average health score?
b. Last year a building had 13 apartments; after some renovations, it now has 19 apartments. What is the regression’s prediction for the change in the building’s average health score?
c. Explain, in words, the $R^2$ value for this regression.
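The estimated equation is linear. A prediction is therefore the intercept plus the slope times the number of apartments. A predicted change depends only on the slope times the change in the number of apartments, because the intercept cancels.

The Python sketch below shows that arithmetic. The coefficients `B0` and `B1` are made-up placeholders, not the researcher's estimates. Replace them with the estimated values when you answer.

```python
# HYPOTHETICAL coefficients for illustration only; they are NOT the
# researcher's estimates, so substitute the real ones when answering.
B0 = 70.0   # intercept, in health-score points
B1 = -0.5   # slope, in points per additional apartment


def predict_health(apartments, b0=B0, b1=B1):
    """Predicted average health score (points) for a building of this size."""
    return b0 + b1 * apartments


def predicted_change(apartments_before, apartments_after, b1=B1):
    """Predicted change in the average health score (points).

    The intercept cancels, so only slope * (change in apartments) matters.
    """
    return b1 * (apartments_after - apartments_before)


print(predict_health(20))        # 70.0 + (-0.5)(20) = 60.0 with these made-up values
print(predicted_change(13, 19))  # (-0.5)(19 - 13) = -3.0 with these made-up values
```

Note that `predicted_change(13, 19)` equals `predict_health(19) - predict_health(13)`. This is why part b needs only the slope.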
Part 2: Regression by hand (65 points, 5 points each)
2.1 It's time to calculate a regression “by hand.” For this assignment, you need to show your work (including relevant formulas). Take a look at the table, which gives data from a sample on the relationship between people’s opinions of democracy and their level of education.[1] The purpose of this exercise is to illustrate the mechanics of ordinary least squares (OLS) regression. For this regression, make views of democracy the dependent variable, Y (positive numbers indicate greater positivity toward democracy), and the respondent’s level of education the independent variable, X (measured in years completed). You are welcome to simply enter these data into a computer and calculate the following values (I would strongly encourage this), but you should then write out each formula and solution by hand.
[1] These data were once real but have been modified so that they change from semester to semester.
When one question uses the solution from a previous question, you may simply plug in the value from the previous question rather than showing all of your work over again. For example, if part b uses the answer from part a, you can just use the answer from part a without repeating all the detailed work from part a.
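For reference, the standard textbook sample formulas for these quantities are collected below. They use the following notation:

- $\hat{\beta}_0$ and $\hat{\beta}_1$ for the intercept and slope,
- $\hat{Y}_i$ for predicted values,
- $\hat{u}_i$ for residuals.

This notation, and the $n-1$ denominators in the standard deviations and covariance, are assumptions. Follow the conventions from lecture if they differ.

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \\
s_X &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2}, \qquad
s_Y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})^2} \\
s_{XY} &= \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}), \qquad
r = \frac{s_{XY}}{s_X s_Y} \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{s_{XY}}{s_X^2}, \qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} \\
\hat{Y}_i &= \hat{\beta}_0 + \hat{\beta}_1 X_i, \qquad \hat{u}_i = Y_i - \hat{Y}_i \\
\text{SSR} &= \sum_{i=1}^{n}\hat{u}_i^2, \qquad
\text{TSS} = \sum_{i=1}^{n}(Y_i-\bar{Y})^2, \qquad
\text{ESS} = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 \\
R^2 &= \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{SSR}}{\text{TSS}}
\end{align*}
```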
a) The sample means for X and Y
b) The standard deviations of X and Y
c) The covariance for X and Y
d) The correlation coefficient, r, for X and Y
e) The OLS estimated slope coefficient, $\hat{\beta}_1$
f) Interpret your answer from part e for $\hat{\beta}_1$
g) The OLS estimated intercept coefficient, $\hat{\beta}_0$
h) $\hat{Y}_i$, the predicted value for each observation
i) $\hat{u}_i$, the residual for each observation
j) The sum of squared residuals, SSR.
k) The total sum of squares, TSS.
l) The explained sum of squares, ESS.
m) The $R^2$ of the regression.
Years of Education (X) | Views of Democracy (Y)
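If you enter the data into a computer as suggested, a minimal Python sketch like the one below computes every quantity in parts a through m. You can use it to check your hand calculations.

A few assumptions apply:

- It divides by $n-1$ for the standard deviations and covariance. Use the convention from lecture if it differs.
- The function name `ols_by_hand` is hypothetical.
- The numbers in the usage lines are made-up toy values, not the data in the table.

```python
import math


def ols_by_hand(x, y):
    """Return the Part 2 quantities for an OLS regression of y on x.

    Standard deviations and the covariance use n - 1 in the denominator.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length (at least 2 points)")
    x_bar = sum(x) / n                              # a) sample means
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]                   # deviations from the means
    dy = [yi - y_bar for yi in y]
    sxx = sum(d * d for d in dx)
    sxy = sum(a * b for a, b in zip(dx, dy))
    tss = sum(d * d for d in dy)                    # k) total sum of squares
    s_x = math.sqrt(sxx / (n - 1))                  # b) standard deviations
    s_y = math.sqrt(tss / (n - 1))
    cov_xy = sxy / (n - 1)                          # c) covariance
    r = cov_xy / (s_x * s_y)                        # d) correlation coefficient
    b1 = sxy / sxx                                  # e) slope
    b0 = y_bar - b1 * x_bar                         # g) intercept
    y_hat = [b0 + b1 * xi for xi in x]              # h) predicted values
    u_hat = [yi - yh for yi, yh in zip(y, y_hat)]   # i) residuals
    ssr = sum(u * u for u in u_hat)                 # j) sum of squared residuals
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)    # l) explained sum of squares
    r2 = ess / tss                                  # m) R^2 (also 1 - SSR/TSS)
    return {
        "x_bar": x_bar, "y_bar": y_bar, "s_x": s_x, "s_y": s_y,
        "cov_xy": cov_xy, "r": r, "b1": b1, "b0": b0,
        "y_hat": y_hat, "u_hat": u_hat,
        "ssr": ssr, "tss": tss, "ess": ess, "r2": r2,
    }


# Toy data for illustration only (NOT the education/democracy table)
toy_x = [10, 12, 14, 16]
toy_y = [1, 2, 2, 3]
results = ols_by_hand(toy_x, toy_y)
for name in ("b0", "b1", "r", "ssr", "tss", "ess", "r2"):
    print(f"{name}: {results[name]:.4f}")
```

With these toy values the results satisfy two identities you can also check in your own answers:

- TSS = ESS + SSR.
- $R^2$ equals $r^2$, as it must in a regression with a single regressor.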
Part 3: Decoding Stata Output (50 points, 10 points each)
Refer to the “states.dta” dataset found on Learning Suite. Examine the variables demstate (which is the percentage of state legislators who are Democrats) and dempct_m (which is the percentage of self-described Democrats in the mass public in that state).[2]
[2] These data were once real but have been modified.
a) Produce a beautiful scatter plot of demstate (y-axis) and dempct_m (x-axis). Place a line of best fit on this graph. In a sentence or two, describe what the scatter plot shows.
b) Using Stata, calculate the correlation between demstate and dempct_m.
c) Run a regression with demstate as the dependent variable and dempct_m as the independent variable. Using your Stata results, what is the estimated coefficient on dempct_m? In one or two sentences, interpret the meaning of this coefficient.
d) Using your Stata results, what is the estimated constant of the regression? In one or two sentences, interpret the meaning of this coefficient.
e) Identify and interpret two measures of fit from the Stata results.
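Stata's `regress` output reports several measures of fit, including R-squared, Adj R-squared, and Root MSE (the standard error of the regression). The hypothetical Python function below is a rough sketch of where those numbers come from in a one-regressor model. The toy data are invented and are not drawn from states.dta.

```python
import math


def fit_measures(x, y):
    """R-squared, adjusted R-squared, and root MSE for OLS of y on one regressor x."""
    n, k = len(x), 1            # k = number of regressors, excluding the constant
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (ssr / (n - k - 1)) / (tss / (n - 1))
    root_mse = math.sqrt(ssr / (n - k - 1))   # standard error of the regression
    return r2, adj_r2, root_mse


# Invented toy values: x = % Democrats in the public, y = % Democratic legislators
toy_x = [30.0, 40.0, 50.0, 60.0]
toy_y = [35.0, 40.0, 50.0, 55.0]
r2, adj_r2, root_mse = fit_measures(toy_x, toy_y)
print(f"R-squared {r2:.3f}, Adj R-squared {adj_r2:.3f}, Root MSE {root_mse:.3f}")
```

Root MSE is in the units of the dependent variable (here, percentage points). $R^2$, by contrast, is the unitless share of the variance in the dependent variable explained by the regression.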
Part 4: Regression and RCTs (80 points, 10 points each)
Professors Preece, Monson, and Karpowitz at BYU conducted an experiment looking at how to get more women into elected office. The experiment took place at Republican neighborhood caucus meetings in Utah in 2014. In these meetings, Republican neighbors gather and elect delegates who go to a statewide convention, where they select the candidates who go on to run in the general election in November. In the experiment, a randomly selected group of caucuses was assigned the “treatment” of having the precinct chair read a letter before the meeting that emphasized the importance of gender diversity among the delegates who attend the state convention. Caucuses in the “control” condition had no letter read before the meeting. You will use the dataset women_student_altered2022.dta for this question.[3]
[3] The dataset you are using is slightly different from the data used in the actual study. It has been altered to make it harder for students to share answers from semester to semester, and to better illustrate the concepts we are learning in this class. However, the general results from this problem set align with the actual study. If you want more information about this study, you should talk to Professors Preece, Monson, and Karpowitz in our department. They are really smart!
a) Verify that the treatment was randomly assigned by checking for balance between the treatment and control conditions on the following variables. Do you see evidence of imbalance across the conditions? Make a table showing your results. This table should resemble the balance table you made in the previous problem set.
I. The average age of the caucus attendees
II. The number of people attending the caucus meeting
III. The proportion of women attending the meeting
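One way to build such a balance table is to compare the treatment and control means of each covariate and scale the difference by its standard error. The sketch below does this in Python with a Welch (unequal-variance) t statistic.

Keep these caveats in mind:

- The variable names and values are hypothetical stand-ins, not the contents of women_student_altered2022.dta.
- The previous problem set's table may have used a different test. For example, Stata's `ttest` assumes equal variances unless the `unequal` option is specified.

```python
import math
from statistics import mean, variance


def balance_row(treated, control):
    """Treated mean, control mean, difference, and Welch t statistic for one covariate."""
    m_t, m_c = mean(treated), mean(control)
    # statistics.variance is the sample variance (n - 1 denominator)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    diff = m_t - m_c
    return m_t, m_c, diff, diff / se


# Hypothetical caucus-level values, NOT taken from women_student_altered2022.dta
covariates = {
    "avg_age": ([52.0, 48.0, 55.0, 50.0], [51.0, 49.0, 53.0, 52.0]),
    "attendance": ([40.0, 35.0, 60.0, 45.0], [42.0, 38.0, 55.0, 47.0]),
    "prop_women": ([0.40, 0.35, 0.50, 0.45], [0.42, 0.38, 0.44, 0.47]),
}

print(f"{'variable':<12}{'treated':>10}{'control':>10}{'diff':>10}{'t':>8}")
for name, (t_vals, c_vals) in covariates.items():
    m_t, m_c, diff, t_stat = balance_row(t_vals, c_vals)
    print(f"{name:<12}{m_t:>10.3f}{m_c:>10.3f}{diff:>10.3f}{t_stat:>8.2f}")
```

Differences that are small relative to their standard errors are consistent with successful random assignment. In your own table you would also report the p-values from Stata.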
b) Using a t-test, determine whether the treatment had any effect on electing more women as state delegates. Explain your answer in a few sentences. Be sure to state your null and alternative hypotheses, the t-statistic, and the p-value from your results. Note that the dependent variable here is a continuous proportion (it takes various values from 0 to 1) and not a binary variable, so a proportions test is not appropriate for this variable.
c) Using your t-test results, what proportion of elected delegates were women in the treatment and control conditions? What is the difference in means between the treatment and control conditions?
d) Now, run a regression in which the proportion of elected delegates who were women is the dependent variable and the treatment condition is the independent variable. What is the estimated intercept of the regression?
e) How is the estimated intercept related to the result you calculated in part (c) above?
f) What is the coefficient on the treatment variable in the regression?
g) How is the coefficient on the treatment variable related to the result you calculated above in part (c)?
h) What point does this illustrate about how t-tests and regression compare to one another? When are they the same? When are they different?
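One way to explore these questions numerically is sketched below in Python, using toy numbers rather than the study data. When an outcome is regressed on a 0/1 treatment indicator:

- the intercept equals the control-group mean,
- the slope equals the difference in means,
- the slope's conventional (homoskedasticity-only) t statistic matches the equal-variance, pooled two-sample t statistic.

With robust standard errors or an unequal-variance t-test, the standard errors, and hence the t statistics, can differ even though the estimated difference is the same. The function names are hypothetical.

```python
import math
from statistics import mean


def pooled_t(treated, control):
    """Difference in means and equal-variance (pooled) two-sample t statistic."""
    n1, n0 = len(treated), len(control)
    m1, m0 = mean(treated), mean(control)
    ss = sum((v - m1) ** 2 for v in treated) + sum((v - m0) ** 2 for v in control)
    s2_pooled = ss / (n1 + n0 - 2)
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n0))
    return m1 - m0, (m1 - m0) / se


def ols_on_dummy(y, d):
    """OLS of y on a 0/1 indicator d: intercept, slope, and conventional t on the slope."""
    n = len(y)
    d_bar, y_bar = mean(d), mean(y)
    sdd = sum((di - d_bar) ** 2 for di in d)
    b1 = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sdd
    b0 = y_bar - b1 * d_bar
    ssr = sum((yi - b0 - b1 * di) ** 2 for di, yi in zip(d, y))
    se_b1 = math.sqrt(ssr / (n - 2) / sdd)   # homoskedasticity-only standard error
    return b0, b1, b1 / se_b1


# Toy proportions of women among elected delegates, one value per caucus
treated = [0.50, 0.40, 0.60, 0.30]
control = [0.20, 0.30, 0.25, 0.35]
y = treated + control
d = [1] * len(treated) + [0] * len(control)

diff, t_test = pooled_t(treated, control)
b0, b1, t_reg = ols_on_dummy(y, d)
print(f"control mean {mean(control):.3f} vs intercept {b0:.3f}")
print(f"difference {diff:.3f} vs slope {b1:.3f}")
print(f"t-test t {t_test:.3f} vs regression t {t_reg:.3f}")
```

The match follows from the fitted values being the two group means. The regression's residuals are therefore the within-group deviations the pooled t-test uses.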