Statistical Methods and Programming in R

Posted Under: Statistical Methods

DESCRIPTION
Solving an assignment in statistical methods and programming in R. There are two tasks, each with several problems to solve. See the attachments for the assignment and the downloaded data for each task.
Attachments
Mandatory assignment 3 – STA2004

Task 1: Inference in simple linear regression

a) Download the data from here [1] and load them in R. Then set up a linear model with response $Y$ ("converted sugar") as a function of the regressor $x$ ("temperature"). Find the estimated regression line from the data.
b) Find $s^2$ and use it to compute a 95% confidence interval (CI) for $\sigma^2$.
c) Perform a test with $\alpha = 0.05$ to determine whether $\beta_1 = 0$. Use the p-value. Based on your findings, can we say that "converted sugar" is linearly related to "temperature"?
d) Compute a 95% CI for both $\beta_0$ and $\beta_1$.
e) Compute a CI that with 95% probability contains the true value of the expected "converted sugar" (mean response) when $x = 1.8$.
f) Compute a prediction interval (PI) that with 95% probability contains the value of "converted sugar" (single response) for a new observation when $x = 1.8$.
g) Compute SST, SSR, SSE and $R^2$. Explain how these values are interpreted.
h) Explain the logic behind an ANOVA test for $\beta_1 = 0$. Then apply the ANOVA method and draw your conclusions in terms of the computed p-value.
i) Compute the fitted values $\hat{y}_i$ and the residuals $e_i$. Plot the fitted values against the residuals and, based on what you observe, discuss the credibility of the assumptions in the linear model.

[1] https://www.dropbox.com/s/kkwax6h4ooc8b5q/Ex11.05.txt?dl=0

Task 2

We will study the relationship between cholesterol content in blood samples (response variable) and various explanatory variables. The following variables are measured for $n = 51$ men:

$y$: Total blood cholesterol (mg)
$x_1$: Body mass index, BMI (kg/m^2)
$x_2$: Alcohol (g)
$x_3$: Fat (g)
$x_4$: Fiber (g)

The dietary variables $x_2$, $x_3$ and $x_4$ typically indicate daily intake. Download the data from here [2]. We are interested in finding out which of the explanatory variables can explain differences in cholesterol $Y$ (response variable). We must first look at the model with all the explanatory variables.

a) Set up the full multiple linear regression model in matrix form.
What are the prerequisites/assumptions for using such a linear model? Write down the estimated regression equation and explain what such an equation says about the relationship between blood cholesterol and the explanatory variables.

We will now investigate whether it is possible to create a better model for blood cholesterol. We will look at three possible models:

I) $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \varepsilon_i, \quad i = 1, \dots, 51$
II) $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i, \quad i = 1, \dots, 51$
III) $Y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i, \quad i = 1, \dots, 51$

b) Set up and perform a test to check whether adding the explanatory variables $x_3$ and $x_4$ to model II gives a better prediction of blood cholesterol. Then set up and perform a test to see whether adding $x_2$ to model III gives a better prediction of blood cholesterol. Based on these tests, would you choose model I, II or III?

We will also investigate whether there is an interaction between $x_1$ (BMI) and each of the three dietary variables $x_2$, $x_3$ and $x_4$. A backward elimination procedure is performed where the starting point is the complete model with $x_1$, $x_2$, $x_3$ and $x_4$.

c) Briefly describe the backward elimination procedure in general and then explain the steps in this particular case.

We will from now on use the following model:

IV) $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_{1,3} x_{1i} x_{3i} + \varepsilon_i, \quad i = 1, \dots, 51$

d) Write down the estimated model (the regression equation with the estimated values for the coefficients). The model contains an interaction term between $x_1$ and $x_3$: explain qualitatively or quantitatively what effect this has for people with low/high BMI.
e) Find the estimated standard deviation of the slope coefficient for the interaction between $x_1$ and $x_3$ in the model. From the obtained result, find a 95% confidence interval for the slope of the interaction between $x_1$ and $x_3$ in the model. Based on this interval, can you determine whether such an interaction is significant in this model?

[2] https://www.dropbox.com/s/9fgdj9e0h7yg2lj/cholesterol.txt?dl=0
f) Explain the difference in the interpretation of $\beta_1$ in models I, III and IV.
g) Explain the difference between (simple) residuals, $e_i$, and studentized residuals, $r_i$. Make plots of both kinds of residuals and use them to discuss whether the model assumptions are met.
h) Are there any indications that any of the observations should be counted as outliers?
i) Some observations (data points) might have a great influence on the results of a regression analysis. One approach to finding such observations is to compute Cook's distance for each of them. The Cook's distance for the $i$th data point is calculated by first removing the $i$th data point from the model and then recalculating the regression. A large Cook's distance indicates that the values in the regression model change a lot when the $i$th observation is removed. Use the R function cooks.distance and identify as influential those points with a Cook's distance larger than $4/n$, where $n$ is the number of data points.
l) Multicollinearity is the presence of strong linear dependencies between the variables (some variables can be accurately predicted by a linear combination of the others). To detect multicollinearity, one can look at correlation, but this will only reveal whether two variables are particularly strongly linearly dependent. It is better to look at the variance inflation factor (VIF), which estimates how much the variance of a regression coefficient is inflated due to multicollinearity in the model. The VIF for the coefficient $\beta_j$ in the model is computed as

$\mathrm{VIF}(j) = \frac{1}{1 - R_j^2}$

Compute the VIF values, compare the results with those given by the R function vif (available through the R library "car"), and comment on the results.
m) What is the predicted blood cholesterol for a new person with $x_1 = 25$, $x_2 = 30$ and $x_3 = 30$? Find an interval that with 90% probability will contain the person's blood cholesterol level.
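The quantities named in parts a), b), g), i) and l) of Task 2 (Oppgave 2) can be written out in standard textbook notation as below. This is a reference sketch, not the assignment's own solution. The symbols $\mathbf{X}$ (design matrix), $h_{ii}$ (leverage), $p$ (number of estimated coefficients), $s^2$ (residual variance estimate) and the subscripts $R$ and $F$ (reduced and fuller model) are assumptions introduced here; $R_j^2$ is the coefficient of determination obtained when the $j$th explanatory variable is regressed on the remaining ones.

```latex
% a) Full model in matrix form: n = 51 rows, columns (1, x1, x2, x3, x4)
\[
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}_n),
\qquad \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{Y}
\]

% b) Partial F test of a reduced model R against a fuller model F
%    (df = residual degrees of freedom of each model)
\[
F = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F}
\]

% g) Studentized residual built from the simple residual e_i
\[
r_i = \frac{e_i}{s\sqrt{1 - h_{ii}}}
\]

% i) Cook's distance and the 4/n rule of thumb from the assignment
\[
D_i = \frac{e_i^2}{p\, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2},
\qquad \text{flag observation } i \text{ if } D_i > \frac{4}{n}
\]

% l) Variance inflation factor for coefficient beta_j
\[
\mathrm{VIF}(j) = \frac{1}{1 - R_j^2}
\]
```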
Task 1

Temperature  Converted.sugar
1.0          8.1
1.1          7.8
1.2          8.5
1.3          9.8
1.4          9.5
1.5          8.9
1.6          8.6
1.7          10.2
1.8          9.3
1.9          9.2
2.0          10.5

Task 2

y    x1    x2     x3    x4
223  26.7  23.8   23.5  22
179  23.8  0      29    15.2
197  21.8  26.3   20.4  12.5
187  23.1  27.5   26    30.7
325  28.3  63.6   37    23.2
281  26.4  16.9   32    19.5
250  23.6  2.3    27.7  14.2
183  30    4.5    36.8  22.9
211  27.7  63.6   22.3  19.9
248  20.8  63.6   21.8  18
198  22.7  0      28.8  21.3
250  21.9  91.5   17.4  24.3
178  22.1  60.2   26.4  15.9
222  26.6  16.7   29    22.1
205  22.2  34.1   16    29.6
159  21.9  84.8   34.7  24.6
215  29.8  10.6   30.7  19.5
196  21.6  0      33.6  32.8
275  27.3  5.3    32.4  30.1
269  26.9  52.5   33.1  29.8
300  28.7  39.8   38    40.9
220  26    43.7   44    43.3
180  24.9  0      37.6  26.3
226  23.1  31.8   54.4  38.5
202  24.4  9.1    72.8  19.1
185  18.8  9.6    20.7  16.2
172  21    31.8   31.5  17.1
285  28.4  156.7  33.6  21.9
194  23.5  31.8   51.5  46.9
257  24.1  59.8   36.3  35.4
198  23.1  31.8   52.3  69.2
180  24.6  21.2   33.1  44.8
177  27.3  63.5   24.3  20.9
183  20.9  20.5   30.7  30.2
248  26    0      21.9  20.9
167  24.9  0      11    43.3
166  24.2  6      20.5  11.9
197  24.2  32.5   30.4  27.2
191  23.8  36.1   38.5  24.8
183  25.3  98.8   19.3  21
200  29    65     30.5  40
206  20.5  27.7   50.7  39.1
229  25.3  0.7    38.9  27
195  23.2  41.7   42.1  45.1
202  27.8  0      36.6  34.2
273  21.8  57.7   24.2  19.9
220  24.6  43.9   23.3  23.3
155  23.4  53     21.9  20.8
295  25.4  88.6   43.7  16.1
211  28.4  34.6   31.2  34
214  23.8  37     31.6  21.7
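With the Task 2 (Oppgave 2) cholesterol data in hand, the model comparisons and diagnostics asked for in parts b), e), g), i), l) and m) might be sketched in R as follows. This is a minimal sketch under stated assumptions, not the posted solution: the data are typed in from the table rather than read from the downloaded file, the names `blood`, `m1` to `m4` and `vif_by_hand` are hypothetical, the VIFs are computed for model IV (the assignment does not say which model part l) refers to), and `rstandard()` is taken as the studentized residual $r_i$.

```r
# Task 2 data, typed in from the table above (three observations per line)
vals <- scan(quiet = TRUE, text = "
223 26.7 23.8 23.5 22    179 23.8 0 29 15.2      197 21.8 26.3 20.4 12.5
187 23.1 27.5 26 30.7    325 28.3 63.6 37 23.2   281 26.4 16.9 32 19.5
250 23.6 2.3 27.7 14.2   183 30 4.5 36.8 22.9    211 27.7 63.6 22.3 19.9
248 20.8 63.6 21.8 18    198 22.7 0 28.8 21.3    250 21.9 91.5 17.4 24.3
178 22.1 60.2 26.4 15.9  222 26.6 16.7 29 22.1   205 22.2 34.1 16 29.6
159 21.9 84.8 34.7 24.6  215 29.8 10.6 30.7 19.5 196 21.6 0 33.6 32.8
275 27.3 5.3 32.4 30.1   269 26.9 52.5 33.1 29.8 300 28.7 39.8 38 40.9
220 26 43.7 44 43.3      180 24.9 0 37.6 26.3    226 23.1 31.8 54.4 38.5
202 24.4 9.1 72.8 19.1   185 18.8 9.6 20.7 16.2  172 21 31.8 31.5 17.1
285 28.4 156.7 33.6 21.9 194 23.5 31.8 51.5 46.9 257 24.1 59.8 36.3 35.4
198 23.1 31.8 52.3 69.2  180 24.6 21.2 33.1 44.8 177 27.3 63.5 24.3 20.9
183 20.9 20.5 30.7 30.2  248 26 0 21.9 20.9      167 24.9 0 11 43.3
166 24.2 6 20.5 11.9     197 24.2 32.5 30.4 27.2 191 23.8 36.1 38.5 24.8
183 25.3 98.8 19.3 21    200 29 65 30.5 40       206 20.5 27.7 50.7 39.1
229 25.3 0.7 38.9 27     195 23.2 41.7 42.1 45.1 202 27.8 0 36.6 34.2
273 21.8 57.7 24.2 19.9  220 24.6 43.9 23.3 23.3 155 23.4 53 21.9 20.8
295 25.4 88.6 43.7 16.1  211 28.4 34.6 31.2 34   214 23.8 37 31.6 21.7
")
blood <- as.data.frame(matrix(vals, ncol = 5, byrow = TRUE))
names(blood) <- c("y", "x1", "x2", "x3", "x4")
n <- nrow(blood)

# Models I, II, III and IV from the assignment
m1 <- lm(y ~ x1 + x2 + x3 + x4, data = blood)
m2 <- lm(y ~ x1 + x2, data = blood)
m3 <- lm(y ~ x1, data = blood)
m4 <- lm(y ~ x1 + x2 + x3 + x1:x3, data = blood)

# b) Partial F tests: II against I (adds x3, x4), III against II (adds x2)
test_2_vs_1 <- anova(m2, m1)
test_3_vs_2 <- anova(m3, m2)

# e) Standard error and 95% CI for the x1:x3 interaction slope in model IV
se_inter <- summary(m4)$coefficients["x1:x3", "Std. Error"]
ci_inter <- confint(m4, "x1:x3", level = 0.95)

# g) Simple and (internally) studentized residuals against fitted values
e_i <- resid(m4)
r_i <- rstandard(m4)
plot(fitted(m4), e_i, xlab = "Fitted values", ylab = "Residuals")
plot(fitted(m4), r_i, xlab = "Fitted values", ylab = "Studentized residuals")

# i) Cook's distance with the 4/n rule from the assignment
cd <- cooks.distance(m4)
influential <- which(cd > 4 / n)

# l) VIF by hand: regress each model column on the others, 1 / (1 - R_j^2)
vif_by_hand <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  out <- sapply(seq_len(ncol(X)), function(j) {
    1 / (1 - summary(lm(X[, j] ~ X[, -j]))$r.squared)
  })
  names(out) <- colnames(X)
  out
}
vif_m4 <- vif_by_hand(m4)
if (requireNamespace("car", quietly = TRUE)) print(car::vif(m4))

# m) Point prediction and 90% prediction interval for a new person
new_person <- data.frame(x1 = 25, x2 = 30, x3 = 30)
pred_new <- predict(m4, new_person, interval = "prediction", level = 0.90)
```

Printing `test_2_vs_1`, `test_3_vs_2`, `ci_inter`, `influential`, `vif_m4` and `pred_new` gives the numbers needed for the written discussion. The hand-computed VIFs for `x1`, `x3` and `x1:x3` can be expected to be large simply because the interaction term is built from the other two columns, which is worth bearing in mind when commenting on part l).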
Explanations and Answers 1
Question: Task 1: Inference in simple linear regression

a) Download the data from here and load them in R. Then set up a linear model with response $Y$ ("converted sugar") as a function of the regressor $x$ ("temperature"). Find the estimated regression line from the data.
b) Find $s^2$ and use it to compute a 95% confidence interval (CI) for $\sigma^2$.
c) Perform a test with $\alpha = 0.05$ to determine whether $\beta_1 = 0$. Use the p-value. Based on your findings, can we say that "converted sugar" is linearly related to "temperature"?
d) Compute a 95% CI for both $\beta_0$ and $\beta_1$.
e) Compute a CI that with 95% probability contains the true value of the expected "converted sugar" (mean response) when $x = 1.8$.
f) Compute a prediction interval (PI) that with 95% probability contains the value of "converted sugar" (single response) for a new observation when $x = 1.8$.
g) Compute SST, SSR, SSE and $R^2$. Explain how these values are interpreted.
h) Explain the logic behind an ANOVA test for $\beta_1 = 0$. Then apply the ANOVA method and draw your conclusions in terms of the computed p-value.
i) Compute the fitted values $\hat{y}_i$ and the residuals $e_i$. Plot the fitted values against the residuals and, based on what you observe, discuss the credibility of the assumptions in the linear model.

Answer: Please find attached the completed work in a zip folder. Let me know if you need anything else. Best regards
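Parts a) to i) of Task 1 (Oppgave 1) can be sketched in R roughly as follows. This is a minimal sketch, not the contents of the attached answer: the data are typed in from the attachment's table instead of being read from the downloaded file, and the object names (`sugar`, `fit`, `ci_sigma2` and so on) are hypothetical. The interval for $\sigma^2$ uses the usual chi-square result with $n - 2$ degrees of freedom.

```r
# Task 1 data, typed in from the table in the attachment
sugar <- data.frame(
  Temperature     = seq(1.0, 2.0, by = 0.1),
  Converted.sugar = c(8.1, 7.8, 8.5, 9.8, 9.5, 8.9, 8.6, 10.2, 9.3, 9.2, 10.5)
)
n <- nrow(sugar)

# a) Least-squares regression line
fit <- lm(Converted.sugar ~ Temperature, data = sugar)
b   <- coef(fit)

# b) s^2 and a 95% CI for sigma^2: (n - 2) s^2 / chi-square quantiles
sse       <- sum(resid(fit)^2)
s2        <- sse / (n - 2)
ci_sigma2 <- (n - 2) * s2 / qchisq(c(0.975, 0.025), df = n - 2)

# c) p-value of the t test for beta_1 = 0
p_beta1 <- summary(fit)$coefficients["Temperature", "Pr(>|t|)"]

# d) 95% CIs for beta_0 and beta_1
ci_beta <- confint(fit, level = 0.95)

# e), f) CI for the mean response and PI for a new observation at x = 1.8
new_x   <- data.frame(Temperature = 1.8)
ci_mean <- predict(fit, new_x, interval = "confidence", level = 0.95)
pi_new  <- predict(fit, new_x, interval = "prediction", level = 0.95)

# g) Sums of squares and R^2
sst <- sum((sugar$Converted.sugar - mean(sugar$Converted.sugar))^2)
ssr <- sst - sse
r2  <- ssr / sst

# h) ANOVA table (F test of beta_1 = 0)
anova_tab <- anova(fit)

# i) Residuals against fitted values
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

In simple linear regression the F test in `anova_tab` and the t test behind `p_beta1` give the same p-value, since $F = t^2$; this can serve as a consistency check between parts c) and h).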

homeworkdoer

answered
