Bio Stats: Applied Regression

Need help with this question or any other R homework assignment task or project?

Ask A Question

Bio Stats: Applied Regression

Asked
Modified
Viewed 14
• Write your solutions under each of the questions. • Some problems only apply to PHP 2511 students, so read the HW carefully. • The assignemt was written to mimic what you would have been asked to do in-person. It is intended to take <1 class period to complete. • All interpretations must be in context to the original problem including units. • All R output is in this exam.

This order does not have tags, yet.

Additional Instructions

PHP 1511/2511 HW 2 Due: May 15, 2022 by midnight Name: Total Score (1511): /29 Total Score(2511): /37 Instructions · Write your solutions under each of the questions. · Some problems only apply to PHP 2511 students, so read the exam carefully. · The exam was written to mimic what you would have been asked to do in-person. It is intended to take <1 class period to complete. · All interpretations must be in context to the original problem including units. · All R output is in this exam. The first 3 questions refer to the following scenario. A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution (rank 1-4, with higher=less prestigious), effect admission into graduate school. The response variable, admit/don’t admit, is a binary variable. The following logistic regression model was fit: Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -3.98998 1.13995 -3.50 0.00047 *** ## gre 0.00226 0.00109 2.07 0.03847 * ## gpa 0.80404 0.33182 2.42 0.01539 * ## rank2 -0.67544 0.31649 -2.13 0.03283 * ## rank3 -1.34020 0.34531 -3.88 0.00010 *** ## rank4 -1.55146 0.41783 -3.71 0.00020 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 499.98 on 399 degrees of freedom ## Residual deviance: 458.52 on 394 degrees of freedom 1. (6 pts) The table above shows the regression parameters from a logistic regression model. Note, they are not exponentiated. Interpret the significant effect in the context of the problem. 2. (1 pt.) Which of the following statements is false about the model results: a. There is a significant effect of school rank on the odds of graduate school admission. b. Rank is treated like a trend in the model. c. The reference category is the highest rank schools. d. GRE and GPA are independently associated with graduate school admission. 3. (6 pts, 2511 only) Identify the following parts of the GLM corresponding to the model above: systematic component, random component and link function. 4. Researchers were interested in identifying risk factors associated with giving birth to a low-birth-weight baby (weighing less than 2500 grams). Data was collected on 189 women, 59 of which had low birth weight babies and 130 of which had normal birth weight babies. Two variables which were thought to be of importance were self-reported race (white, black, other), and the smoking habits of the mother (1=smoker, 0=non-smoker). Note: smoke1=1 if smoker, 0 if non-smoker; raceblack=1 if self-reported race=black, 0 otherwise; raceother=1 if self-reported race=other, 0 otherwise. Consider the following logistic regression model output of the exponentiated coefficients from the regression model: ## term estimate p.value conf.low conf.high ## 1 (Intercept) 0.1000000 1.124386e-05 0.03001716 0.2479494 ## 2 smoke1 5.7575752 3.429174e-03 1.93948909 21.3664428 ## 3 raceblack 4.5454541 4.412016e-02 1.03972939 21.3006300 ## 4 raceother 5.7142851 3.371066e-03 1.94185108 21.0892350 (2 pts) Interpret the effect of smoking on the outcome in the context of the problem. 5. (2 pts) Referring to question 4, what variables are significant risk factors of giving birth to a low birthweight baby? 6. (2 pts) An exercise physiologist wishes to explore the association between group cohesion and exercise performance and whether this differs by biological sex. The following model is fit: How would you test whether Sex was a moderator of the association between Group Cohesion and Exercise Performance? 7. (2 pts) Referring to question 6: An exercise physiologist wishes to explore the association between group cohesion (X) and exercise performance (continuous variable Y) and whether this differs by biological sex. Suppose a=13.25, b=2.14, c=-1.65, d=3.29. Sex=1 for women and 0 for men. What is the mean difference in exercise performance for a one-unit difference in group cohesion for women? 8. A university conducts a study to investigate whether the relationship between student attendance at lectures (X) and their subsequent test scores (Y) is moderated by whether or not the lecturer has a teaching qualification (M). a. (1 pt.) Which is the moderator variable? b. (2 pts, 2511 only) Write out the model you would be fitting in order to test the moderation hypothesis. You can assume Y is a continuous variable. 9. (3 pts) In a multivariable linear regression model, list and briefly explain the possible methods that could be used to assess whether a predictor X has a significant effect on an outcome Y controlling for confounders? 10. (2 pts) An international company is worried that employees in a certain job at its headquarters in Country A are not being given raises at the same rate as employees in the same job at its headquarters in Country B. Using a random sample of employees from each Country, a regression model is fit with: Y = employee salary X1 = length of time employee has worked for the company X2 = 1 if employee is in Country A, and 0 if employee is in Country B. What type of test would compare whether it is necessary to include X2 in a model predicting Y from X1? 11. Suppose you have the following scatterplot of Y vs X that looks like a negatively sloped line (with points clustered tightly around the line).  . a. (1 pt.) Which choice is most likely to be the approximate value of R2 ? a. −99.5% b. 2.0% c. 50.0% d. 99.5% b.(2 pts) What is the interpretation of R2 from part a? 12. (1 pt.) Consider a GLM predicting the outcome Y (binary indicator of smoking cessation status) from X1=concerns about gaining weight, X2=gender and X3=confidence in quitting. What is the systematic component of the model? a. Y b. b1X1 +b2X2 + b3X3 c. logit(p/1-p) 13. (1 pt.) Which choice is not an appropriate description of in a regression equation? a. Estimated response b. Predicted response c. Fitted Response d. Observed response 14. (2 pts) In a linear regression model, a 95% confidence interval for β1 was given as: (-5.65, 2.61). What would a test for H0 (Null hypothesis): β1=0 vs Ha (alternative hypothesis) : β10 conclude? 15. (1 pt.) The economic structure of Major League Baseball allows some teams to make substantially more money than others, which in turn allows some teams to spend much more on player salaries. These teams might therefore be expected to have better players and win more games on the field as a result. Suppose that after collecting data on team payroll (in millions of dollars) and season win total for 2021, we find a regression equation of Wins = 71.87 + 0.101Payroll - 0.060League where League is an indicator variable that equals 0 if the team plays in the National League or 1 if the team plays in the American League If Teams A and B both play in the same league, and Team A’s payroll is $1 million higher than Team B’s, then we would expect Team A to win, on average, a. 0.101 games more than Team B. b. 71.87 games more than Team B. c. 0.060 games more than Team B. d. 0.060 games fewer than Team B.

Answers

0

No answers posted

Post your Answer - free or at a fee

Login to your account tutor account to post an answer

Posting a free answer earns you +20 points.

Login

To get help with a similar task to Bio Stats: Applied Regression, Ask for R Assignment Help online.