Statistics 2014, Fall 2001

February 12, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Statistics


The simple linear regression model makes the following assumptions: i) The relationship between the predictor variable and the response variable is linear, apart from random error; ii) The random error terms in the model are independent and identically distributed, having a normal distribution with mean 0 and variance $\sigma^2$. In any situation in which we want to use simple linear regression, these assumptions need to be checked so that we can be confident that the model is appropriate.

We check the first assumption using the scatterplot of the response variable against the predictor variable. We check the second assumption using the residuals from the model:

If the data are collected in a time sequence, we may check the assumption of independence using a time series plot of the residuals. If we see any pattern, then we will not accept the assumption of independence.

We will do a normal q-q plot of the residuals to check the assumption of normality.

We will plot the residuals against the predictor variable to check the assumption of constant variance. The values of the residuals should be randomly distributed about the 0 line for all x. If we see any pattern, then we will not accept the assumption of constant variance.

Example: (stainless steel stress fracture example, continued) We have already done a scatterplot and seen that the relationship between applied tensile stress and time to fracture seems to be linear. The residuals for the model are given in the table below:

RESIDUAL OUTPUT

Observation   Predicted Y    Residuals
1             64.16548673    -1.165486726
2             61.91327434    -3.913274336
3             57.40884956    -2.408849558
4             52.90442478    8.095575221
5             50.65221239    11.34778761
6             48.4           -11.4
7             43.89557522    -5.895575221
8             39.39115044    5.608849558
9             34.88672566    11.11327434
10            30.38230088    -11.38230088
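As a quick numerical check on the table, the residuals from a least-squares line with an intercept should sum to zero (up to rounding), and the mean squared error is the residual sum of squares divided by n - 2. The following is a minimal Python sketch, not part of the Excel workflow used in these notes; the variable names are assumptions chosen for illustration, and the numbers are copied from the table.

```python
# Predicted values and residuals copied from the RESIDUAL OUTPUT table.
predicted = [64.16548673, 61.91327434, 57.40884956, 52.90442478, 50.65221239,
             48.4, 43.89557522, 39.39115044, 34.88672566, 30.38230088]
residuals = [-1.165486726, -3.913274336, -2.408849558, 8.095575221, 11.34778761,
             -11.4, -5.895575221, 5.608849558, 11.11327434, -11.38230088]

n = len(residuals)

# Observed response = predicted value + residual.
observed = [p + e for p, e in zip(predicted, residuals)]

# Least-squares residuals (model with intercept) sum to about zero.
residual_sum = sum(residuals)

# Residual sum of squares and mean squared error (n - 2 degrees of freedom).
sse = sum(e ** 2 for e in residuals)
mse = sse / (n - 2)

print(f"sum of residuals = {residual_sum:.6f}")
print(f"SSE = {sse:.4f}, MSE = {mse:.4f}")
```

The MSE computed this way should agree with the value 83.25298673 used in the confidence interval examples in these notes.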

Excel gives residual plots and normal q-q plots as options for regression. The normal q-q plot from this option is actually not very informative. Hence we will do a normal q-q plot of the residuals using the handout. The plots are shown below:

[Figure: X Variable 1 Residual Plot. Vertical axis: Residuals (-15 to 15); horizontal axis: X Variable 1 (0 to 50).]

The plot of the residuals against tensile stress shows no obvious pattern. Hence we will accept the assumption of homoscedasticity (constant variance).

From the normal q-q plot, we see no reason to doubt the assumption of normality.

[Figure: Normal Q-Q Plot of Residuals. Vertical axis: Standardized Order Statistics (-2 to 2); horizontal axis: Standardized Normal Scores (-2 to 2).]
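The coordinates of a normal q-q plot can also be computed directly. The sketch below is one common construction and may differ from the procedure in the handout, which these notes do not reproduce: the plotting positions (i - 0.5)/n and the scaling by the sample standard deviation are assumptions, as are the variable names. A correlation close to 1 between the two coordinates indicates that the points lie close to a straight line.

```python
from statistics import NormalDist, mean, stdev

# Residuals copied from the RESIDUAL OUTPUT table.
residuals = [-1.165486726, -3.913274336, -2.408849558, 8.095575221, 11.34778761,
             -11.4, -5.895575221, 5.608849558, 11.11327434, -11.38230088]
n = len(residuals)

# Standardized order statistics: sort, then center and scale the residuals.
r_bar = mean(residuals)
s = stdev(residuals)
order_stats = [(e - r_bar) / s for e in sorted(residuals)]

# Standard normal scores at the assumed plotting positions (i - 0.5)/n.
normal_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


qq_corr = correlation(normal_scores, order_stats)

for z, r in zip(normal_scores, order_stats):
    print(f"normal score {z:7.3f}   standardized residual {r:7.3f}")
print(f"correlation of q-q points = {qq_corr:.3f}")
```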

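Confidence Intervals in Simple Linear Regression

If the error terms in the model satisfy the assumptions of being i.i.d. normal, then we have

$$\hat{\beta}_1 \sim \text{Normal}\left(\beta_1,\ \frac{\sigma^2}{SS_{xx}}\right), \quad \text{and} \quad \hat{\beta}_0 \sim \text{Normal}\left(\beta_0,\ \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)\right).$$

Hence,

$$\frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \sim t(n-2), \quad \text{where } se(\hat{\beta}_1) = \sqrt{\frac{MSE}{SS_{xx}}};$$

and

$$\frac{\hat{\beta}_0 - \beta_0}{se(\hat{\beta}_0)} \sim t(n-2), \quad \text{where } se(\hat{\beta}_0) = \sqrt{MSE\left(\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}\right)}.$$

Then a $100(1-\alpha)\%$ confidence interval estimate for $\beta_1$ is $\hat{\beta}_1 \pm t_{\alpha/2;\,n-2}\, se(\hat{\beta}_1)$, and a $100(1-\alpha)\%$ confidence interval estimate for $\beta_0$ is $\hat{\beta}_0 \pm t_{\alpha/2;\,n-2}\, se(\hat{\beta}_0)$.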
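The standard errors and intervals above can be sketched in a few lines of Python. This is an illustration only, not the Excel procedure used in these notes: the function names are assumptions, and the numeric inputs are the summary values quoted in the stainless steel example in these notes (n = 10, mean stress 20, SSxx = 1412.5, MSE = 83.25298673, and the critical value 2.306).

```python
from math import sqrt

# Summary values quoted in the stainless steel example of these notes.
n = 10
x_bar = 20.0            # mean applied stress
ss_xx = 1412.5          # SSxx
mse = 83.25298673       # mean squared error
b0_hat = 66.41769912    # estimated intercept
b1_hat = -0.900884956   # estimated slope (negative, as the interval in the notes shows)
t_crit = 2.306          # t critical value for alpha/2 = 0.025 and n - 2 = 8 d.f.


def se_slope(mse, ss_xx):
    """Standard error of the slope estimate: sqrt(MSE / SSxx)."""
    return sqrt(mse / ss_xx)


def se_intercept(mse, n, x_bar, ss_xx):
    """Standard error of the intercept: sqrt(MSE * (1/n + x_bar^2 / SSxx))."""
    return sqrt(mse * (1.0 / n + x_bar ** 2 / ss_xx))


def conf_interval(estimate, se, t_crit):
    """Two-sided interval: estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


slope_ci = conf_interval(b1_hat, se_slope(mse, ss_xx), t_crit)
intercept_ci = conf_interval(b0_hat, se_intercept(mse, n, x_bar, ss_xx), t_crit)

print(f"slope CI:     ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
print(f"intercept CI: ({intercept_ci[0]:.4f}, {intercept_ci[1]:.4f})")
```

The critical value is entered as a constant because the Python standard library has no t quantile function; a statistics package could supply it instead.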

Example: (stainless steel stress fracture example, continued) A 95% confidence interval estimate for the slope of the line of best fit is

$$\hat{\beta}_1 \pm t_{\alpha/2;\,n-2}\, se(\hat{\beta}_1) = -0.900884956 \pm (2.306)(0.242775962) = \left(-1.4607\ \frac{\text{hrs.}}{\text{kg/mm}^2},\ -0.3410\ \frac{\text{hrs.}}{\text{kg/mm}^2}\right),$$

and a 95% confidence interval estimate for the intercept of the line of best fit is

$$\hat{\beta}_0 \pm t_{\alpha/2;\,n-2}\, se(\hat{\beta}_0) = 66.41769912 \pm (2.306)(5.648129399) = (53.3931\ \text{hrs.},\ 79.4423\ \text{hrs.}).$$

Sometimes we want an estimate of the conditional mean of the response variable at a particular value of the predictor variable. An unbiased estimator of the conditional mean at $x = x_0$ is

$$\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0,$$

which has variance

$$V\left(\hat{\mu}_{Y|x_0}\right) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}\right).$$

Then a $100(1-\alpha)\%$ confidence interval estimate of the conditional mean at $x = x_0$ is $\hat{\mu}_{Y|x_0} \pm t_{\alpha/2;\,n-2}\, se(\hat{\mu}_{Y|x_0})$, where

$$se\left(\hat{\mu}_{Y|x_0}\right) = \sqrt{MSE\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}\right)}.$$

Example: (stainless steel stress fracture example, continued) A point estimate of the mean time to fracture when the stress is at 45 kg/mm² is

$$\hat{\mu}_{Y|45} = \hat{\beta}_0 + \hat{\beta}_1 x_0 = 66.41769912 - 0.900884956(45) = 25.8779\ \text{hrs.}$$

The mean stress for the sample is $\bar{x} = 20$ kg/mm², and $SS_{xx} = (n-1)s_x^2 = 1412.5$. Also MSE = 83.25298673. Then a 95% confidence interval estimate for the mean time to fracture when the stress is at 45 kg/mm² is

$$\hat{\mu}_{Y|x_0} \pm t_{\alpha/2;\,n-2}\, se(\hat{\mu}_{Y|x_0}) = 25.8779\ \text{hrs.} \pm 2.306\sqrt{83.25298673\left(0.10 + \frac{(45 - 20)^2}{1412.5}\right)} = (10.3808\ \text{hrs.},\ 41.3750\ \text{hrs.}).$$

Note: The standard error of $\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$ is an increasing function of the squared difference between $x_0$ and $\bar{x}$. Hence the confidence interval will be narrowest at the mean of x, and will increase in width as the distance from the mean increases. (See p. 279).
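The effect described in the note can be seen numerically. The sketch below, in Python, evaluates the interval for the conditional mean at several stress levels using the summary values quoted in the example; the function name mean_response_ci and the choice of stress levels other than 45 are assumptions made for illustration.

```python
from math import sqrt

# Summary values quoted in the stainless steel example of these notes.
n = 10
x_bar = 20.0            # mean applied stress (kg/mm^2)
ss_xx = 1412.5          # SSxx
mse = 83.25298673       # mean squared error
b0_hat = 66.41769912    # estimated intercept
b1_hat = -0.900884956   # estimated slope
t_crit = 2.306          # t critical value for alpha/2 = 0.025 and 8 d.f.


def mean_response_ci(x0):
    """Return (estimate, lower, upper) for the conditional mean at x0."""
    estimate = b0_hat + b1_hat * x0
    se = sqrt(mse * (1.0 / n + (x0 - x_bar) ** 2 / ss_xx))
    half_width = t_crit * se
    return estimate, estimate - half_width, estimate + half_width


# Illustrative stress levels; 45 is the value used in the example.
for x0 in (5, 20, 35, 45):
    est, lo, hi = mean_response_ci(x0)
    print(f"x0 = {x0:2d}: estimate {est:8.4f}, "
          f"interval ({lo:8.4f}, {hi:8.4f}), width {hi - lo:8.4f}")
```

The width is smallest at x0 = 20, the sample mean of the stress values, and is the same at 5 and 35 because both are 15 units from the mean.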
