# Lecture 8 (May 29, June 5)

February 11, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Statistics

#### Description

Ch12 Analysis of Variance

Outline: completely randomized designs; randomized-block designs.

Analysis of Variance, also called single-factor analysis of variance, single-factor ANOVA, one-way analysis of variance, or one-way ANOVA.

Background. If we have, say, 3 treatments to compare (A, B, C), we would need 3 separate t-tests (comparing A with B, A with C, and B with C). If we had 7 treatments we would need 21 separate t-tests. This would be time-consuming but, more importantly, it would be inherently flawed: each t-test run at the 0.05 significance level carries a 5% chance of declaring a difference when none exists. So, across 21 tests we would expect about one false positive (21 × 0.05 ≈ 1) even if no treatments differ. ANalysis Of VAriance (ANOVA) overcomes this problem by enabling us to detect significant differences between the treatments as a whole. We do a single test to see if there are differences between the means at our chosen significance level.

Assumptions: equal variances, independent populations, random sampling.
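The arithmetic behind the multiple-testing argument can be sketched in a few lines of Python. The function names are illustrative, and the familywise error rate below assumes the tests are independent, which pairwise t-tests on shared samples are not exactly, so treat it as a rough guide only.

```python
from math import comb

def pairwise_tests(k):
    """Number of two-sample t-tests needed to compare k treatments pairwise."""
    return comb(k, 2)

def expected_false_positives(m, alpha=0.05):
    """Expected number of false rejections in m tests when no treatments differ."""
    return m * alpha

def familywise_error(m, alpha=0.05):
    """P(at least one false rejection), ASSUMING the m tests are independent."""
    return 1 - (1 - alpha) ** m

for k in (3, 7):
    m = pairwise_tests(k)
    print(k, m, expected_false_positives(m), familywise_error(m))
```

For 7 treatments this gives 21 tests, about one expected false positive, and (under the independence assumption) roughly a two-in-three chance of at least one.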

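The scheme of one-way classification

| Sample | Observations | Mean | Sum of squares |
|---|---|---|---|
| Sample 1 | $y_{11}, y_{12}, \ldots, y_{1j}, \ldots, y_{1n_1}$ | $\bar{y}_1$ | $\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2$ |
| Sample 2 | $y_{21}, y_{22}, \ldots, y_{2j}, \ldots, y_{2n_2}$ | $\bar{y}_2$ | $\sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2$ |
| $\vdots$ | | | |
| Sample i | $y_{i1}, y_{i2}, \ldots, y_{ij}, \ldots, y_{in_i}$ | $\bar{y}_i$ | $\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$ |
| $\vdots$ | | | |
| Sample k | $y_{k1}, y_{k2}, \ldots, y_{kj}, \ldots, y_{kn_k}$ | $\bar{y}_k$ | $\sum_{j=1}^{n_k} (y_{kj} - \bar{y}_k)^2$ |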

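Simplify

$$T_{\cdot} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}, \qquad N = \sum_{i=1}^{k} n_i, \qquad \bar{y} = \frac{\sum_{i=1}^{k} n_i \bar{y}_i}{\sum_{i=1}^{k} n_i} = \frac{T_{\cdot}}{N}$$

Here $\bar{y}$ is the overall mean or grand mean of all observations, and $\bar{y}_i$ is the mean of the measurements in the i-th sample (for example, those obtained by the i-th laboratory).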

The statistical analysis leading to a comparison of the k different population means consists essentially of splitting the sum of squares about the overall grand mean into a component due to treatment difference, and a component due to error or variation within a sample.

EX Suppose 3 drying formulas for curing a glue are studied and the following times observed.

| Formula | Observed times |
|---|---|
| A | 13, 10, 8, 11, 8 |
| B | 13, 11, 14, 14 |
| C | 4, 1, 3, 4, 2, 4 |
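The split into a treatment component and an error component can be sketched numerically for these glue-curing times (five observations for A, four for B, six for C). This is a minimal illustration; the names `samples` and `decomposition` are not from the lecture.

```python
# Glue-curing times from the example (three formulas, unequal sample sizes).
samples = {
    "A": [13, 10, 8, 11, 8],
    "B": [13, 11, 14, 14],
    "C": [4, 1, 3, 4, 2, 4],
}

all_obs = [y for ys in samples.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

decomposition = {}
for name, ys in samples.items():
    mean_i = sum(ys) / len(ys)
    effect = mean_i - grand_mean          # deviation due to treatment
    errors = [y - mean_i for y in ys]     # within-sample deviations
    decomposition[name] = (effect, errors)
    # Each observation is recovered as grand mean + effect + error.
    assert all(abs(grand_mean + effect + e - y) < 1e-12 for y, e in zip(ys, errors))

print("grand mean:", grand_mean)
for name, (effect, errors) in decomposition.items():
    print(name, "effect:", effect, "errors:", errors)
```

The grand mean is 8 and the treatment deviations are 2, 5 and -5 for formulas A, B and C.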

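Each observation can be decomposed as

$$\underbrace{y_{ij}}_{\text{observation}} = \underbrace{\bar{y}}_{\text{grand mean}} + \underbrace{(\bar{y}_i - \bar{y})}_{\text{deviation due to treatment}} + \underbrace{(y_{ij} - \bar{y}_i)}_{\text{error}}$$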

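Repeating the decomposition for each observation, we obtain the arrays

$$
\underbrace{\begin{bmatrix} 13 & 10 & 8 & 11 & 8 & \\ 13 & 11 & 14 & 14 & & \\ 4 & 1 & 3 & 4 & 2 & 4 \end{bmatrix}}_{\text{observation } y_{ij}}
=
\underbrace{\begin{bmatrix} 8 & 8 & 8 & 8 & 8 & \\ 8 & 8 & 8 & 8 & & \\ 8 & 8 & 8 & 8 & 8 & 8 \end{bmatrix}}_{\text{grand mean } \bar{y}}
+
\underbrace{\begin{bmatrix} 2 & 2 & 2 & 2 & 2 & \\ 5 & 5 & 5 & 5 & & \\ -5 & -5 & -5 & -5 & -5 & -5 \end{bmatrix}}_{\text{treatment effects } (\bar{y}_i - \bar{y})}
+
\underbrace{\begin{bmatrix} 3 & 0 & -2 & 1 & -2 & \\ 0 & -2 & 1 & 1 & & \\ 1 & -2 & 0 & 1 & -1 & 1 \end{bmatrix}}_{\text{error } (y_{ij} - \bar{y}_i)}
$$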

Treatment sum of squares:

$$SS(Tr) = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$$

Error sum of squares:

$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$

Degrees of freedom for treatment: k-1. Degrees of freedom for error: N-k.

Theorem.

$$\underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}_{SSE} + \underbrace{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}_{SS(Tr)}$$
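A sketch of why the identity holds, using the standard argument (it is not spelled out on the slides): write each deviation from the grand mean as a within-sample part plus a between-sample part and expand the square.

$$
\begin{aligned}
\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2
&= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \big[ (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}) \big]^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
 + 2 \sum_{i=1}^{k} (\bar{y}_i - \bar{y}) \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)
 + \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 \\
&= SSE + 0 + SS(Tr)
\end{aligned}
$$

The cross term vanishes because $\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i) = 0$ within every sample. For the glue-curing example the three sums of squares are 302 = 32 + 270.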

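If $\mu_i$ denotes the mean of the i-th population and $\sigma^2$ denotes the common variance of the k populations, each observation can be written as

$$Y_{ij} = \mu_i + \varepsilon_{ij}$$

where $\varepsilon_{ij}$ is a random error term with mean 0 and variance $\sigma^2$. Equivalently,

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

where $\mu = \sum_{i=1}^{k} n_i \mu_i / N$ is the mean of the $\mu_i$ in the experiment (weighted by the sample sizes) and $\alpha_i = \mu_i - \mu$ is the effect of the i-th treatment; hence

$$\sum_{i=1}^{k} n_i \alpha_i = 0$$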

The null hypothesis that the k population means are all equal can be replaced by the null hypothesis

$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$$

The alternative hypothesis is that at least two of the population means are unequal. To test the null hypothesis that the k population means are all equal, we compare two estimates of $\sigma^2$: one based on the variation among the sample means, and one based on the variation within the samples.

Each sum of squares is first converted to a mean square.

$$\text{mean square} = \frac{\text{sum of squares}}{\text{degrees of freedom}}$$

When the population means are equal, both

$$\text{treatment mean square} = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}{k - 1}$$

and

$$\text{error mean square} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{N - k}$$

are estimates of $\sigma^2$.

If the null hypothesis is true, it can be shown that the two mean squares are independent and that their ratio

$$F = \frac{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 / (k - 1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 / (N - k)} = \frac{SS(Tr)/(k - 1)}{SSE/(N - k)}$$

has an F distribution with k-1 and N-k degrees of freedom.

A large value of F indicates large differences between the sample means. Therefore, the null hypothesis is rejected at the $\alpha$ level of significance if $F > F_\alpha$, where $F_\alpha$ is the upper-$\alpha$ critical value of the F distribution with k-1 and N-k degrees of freedom.
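The F ratio can be computed directly from the definitions. Below is a minimal pure-Python sketch; the function name `one_way_anova` and its return layout are illustrative choices, and the data are the glue-curing times from the example.

```python
def one_way_anova(groups):
    """Return (SS(Tr), SSE, df_treatment, df_error, F) for a list of samples."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # Between-sample variation: squared deviations of sample means, weighted by n_i.
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-sample variation: squared deviations from each sample's own mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_tr, df_err = k - 1, N - k
    f_stat = (ss_tr / df_tr) / (sse / df_err)
    return ss_tr, sse, df_tr, df_err, f_stat

glue = [[13, 10, 8, 11, 8], [13, 11, 14, 14], [4, 1, 3, 4, 2, 4]]
ss_tr, sse, df_tr, df_err, f_stat = one_way_anova(glue)
print(ss_tr, sse, df_tr, df_err, f_stat)
```

The sketch returns only the statistic; the decision still requires comparing it with the tabulated critical value $F_\alpha$ for k-1 and N-k degrees of freedom.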

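One-way ANOVA

| Source of variation | Sum of squares | Degrees of freedom | Mean square | Computed f |
|---|---|---|---|---|
| Treatments | SS(Tr) | $k - 1$ | $s_1^2 = \dfrac{SS(Tr)}{k - 1}$ | $\dfrac{s_1^2}{s^2}$ |
| Error | SSE | $k(n - 1)$ | $s^2 = \dfrac{SSE}{k(n - 1)}$ | |
| Total | SST | $nk - 1$ | | |

Here n is the common sample size. With unequal sample sizes, the error and total degrees of freedom are $N - k$ and $N - 1$.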

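Solution of EX

One-way ANOVA: A, B, C

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 2 | 270.00 | 135.00 | 50.63 | 0.000 |
| Error | 12 | 32.00 | 2.67 | | |
| Total | 14 | 302.00 | | | |

The value of $F_{0.05}(2, 12) = 3.89$ is far below the computed F = 50.63, so we reject the null hypothesis of equal means.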

Exercise Assume that we have recorded the biomass of 3 bacteria in flasks of glucose broth, and we used 3 replicate flasks for each bacterium.

| Replicate | Bacterium A | Bacterium B | Bacterium C |
|---|---|---|---|
| 1 | 12 | 20 | 40 |
| 2 | 15 | 19 | 35 |
| 3 | 9 | 23 | 42 |

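Solution

One-way ANOVA: A, B, C

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 2 | 1140.2 | 570.1 | 64.93 | 0.000 |
| Error | 6 | 52.7 | 8.78 | | |
| Total | 8 | 1192.9 | | | |

The value of $F_{0.05}(2, 6)$ is 5.14, and the computed F = 64.93 exceeds it, so we reject the null hypothesis of equal means.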
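Both worked examples compare k = 3 treatments, so the numerator has 2 degrees of freedom. For that special case the F tail probability has the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, which allows a check of the tabulated critical values without statistical software. The sketch below relies on that identity, is valid only for 2 numerator degrees of freedom, and uses illustrative function names.

```python
def f_tail_prob_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Upper-alpha critical value of F(2, df2), obtained by inverting the tail formula."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

# (label, computed F from the ANOVA table, error degrees of freedom)
for label, f_stat, df2 in [("glue", 50.63, 12), ("bacteria", 64.93, 6)]:
    crit = f_critical_df1_2(0.05, df2)
    p_value = f_tail_prob_df1_2(f_stat, df2)
    print(label, round(crit, 2), p_value, f_stat > crit)
```

The critical values agree with the tabulated 3.89 and 5.14, and both p-values are far below 0.001, consistent with the reported P = 0.000.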

12.3 Randomized-Block designs

Two-way ANOVA

| Row | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| 1 | 13 | 7 | 9 | 3 |
| 2 | 6 | 6 | 3 | 1 |
| 3 | 11 | 5 | 15 | 5 |

RCB (Randomized Complete Block): The randomized block design is an extension of the paired t-test to situations where the factor of interest has more than two levels.

Example 1: Suppose we are interested in how weight gain (Y) in rats is affected by source of protein (Beef, Cereal, or Pork) and by level of protein (High or Low). There are a total of t = 3 × 2 = 6 treatment combinations of the two factors (Beef-High Protein, Cereal-High Protein, Pork-High Protein, Beef-Low Protein, Cereal-Low Protein, and Pork-Low Protein).

Suppose we have available a total of N = 60 experimental rats, to which we will apply the different diets based on the t = 6 treatment combinations. Prior to the experiment the rats were divided into n = 10 homogeneous groups (blocks) of size 6. The grouping was based on factors that had previously been ignored (for example, initial weight and appetite). Within each of the 10 blocks, each rat is randomly assigned one of the six treatment combinations (diets), so every diet appears exactly once per block.
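The within-block randomization can be sketched as an independent random permutation of the six diets in each block. The function name `randomize_blocks` and the seed are illustrative; the lecture does not say how the assignment was actually generated.

```python
import random

def randomize_blocks(n_blocks, treatments, seed=None):
    """Randomly assign each treatment to one unit within every block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # independent random permutation for this block
        layout.append(order)
    return layout

diets = ["Beef-High", "Cereal-High", "Pork-High",
         "Beef-Low", "Cereal-Low", "Pork-Low"]
layout = randomize_blocks(10, diets, seed=2018)  # seed chosen arbitrarily
for block, order in enumerate(layout, start=1):
    print(block, order)  # position p in the list is the diet given to rat p of the block
```

Because each block receives a full permutation, the design stays complete: 10 blocks × 6 diets = 60 rats.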

The weight gain after a fixed period is measured for each of the test animals and is tabulated below. The number in parentheses identifies the diet (treatment combination).

Randomized Block Design

| Block | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| 1 | 107 | 96 | 112 | 83 | 87 | 90 |
| 2 | 102 | 72 | 100 | 82 | 70 | 94 |
| 3 | 102 | 76 | 102 | 85 | 95 | 86 |
| 4 | 93 | 70 | 93 | 63 | 71 | 63 |
| 5 | 111 | 79 | 101 | 72 | 75 | 81 |
| 6 | 128 | 89 | 104 | 85 | 84 | 89 |
| 7 | 56 | 70 | 72 | 64 | 62 | 63 |
| 8 | 97 | 91 | 92 | 80 | 72 | 82 |
| 9 | 80 | 63 | 87 | 82 | 81 | 63 |
| 10 | 103 | 102 | 112 | 83 | 93 | 81 |

Example 2: The following experiment compares the effect of four different chemicals (A, B, C and D) in producing water resistance (y) in textiles. A strip of material, randomly selected from each bolt, is cut into four pieces (samples), and the pieces are randomly assigned to receive one of the four chemical treatments.

This process is replicated three times, producing a Randomized Block (RB) design with the bolts as blocks. Moisture resistance (y) was measured for each of the samples (low readings indicate low moisture penetration). The data are given in the diagram and table below.

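Diagram: Blocks (Bolt Samples), with the chemical applied to each piece in parentheses

| Block 1 | Block 2 | Block 3 |
|---|---|---|
| 9.9 (C) | 13.4 (D) | 12.7 (B) |
| 10.1 (A) | 12.9 (B) | 12.9 (D) |
| 11.4 (B) | 12.2 (A) | 11.4 (C) |
| 12.1 (D) | 12.3 (C) | 11.9 (A) |

Table

| Chemical | Block 1 | Block 2 | Block 3 |
|---|---|---|---|
| A | 10.1 | 12.2 | 11.9 |
| B | 11.4 | 12.9 | 12.7 |
| C | 9.9 | 12.3 | 11.4 |
| D | 12.1 | 13.4 | 12.9 |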

The randomized block design (RBD) consists of a two-step procedure:

1. Matched sets of experimental units, called blocks, are formed, each block consisting of a units, where a is the number of treatments. The b blocks should consist of experimental units that are as similar as possible (to reduce the within-treatments variation).

2. One experimental unit from each block is randomly assigned to each treatment, resulting in a total of ab responses.

If every block has responses from all treatments, the design is complete: a randomized complete block design.

RCB For example, consider the situation where three different methods were used to predict the shear strength of steel plate girders. Say we use four girders as the experimental units.

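RCB

$$\bar{y}_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} y_{ij}, \qquad \bar{y}_{\cdot j} = \frac{1}{a} \sum_{i=1}^{a} y_{ij}, \qquad \bar{y}_{\cdot\cdot} = \frac{1}{ab} \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}$$

Here $y_{ij}$ is the response for treatment i in block j. The total number of responses is ab.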

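RCB The appropriate linear statistical model:

$$Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, b$$

where $\mu$ is the overall mean, $\alpha_i$ is the effect of the i-th treatment, $\beta_j$ is the effect of the j-th block, and $\varepsilon_{ij}$ is the random error.

We assume

• treatments and blocks are initially fixed effects

• blocks do not interact with treatments

• $\sum_{i=1}^{a} \alpha_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$

RCB

The hypotheses of interest are:

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \qquad \text{versus} \qquad H_1: \alpha_i \neq 0 \text{ for at least one } i$$

i.e., under $H_0$ there is no treatment effect.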

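RCB The total sum of squares is partitioned as

$$SS_T = SS_{Treatments} + SS_{Blocks} + SS_E$$

where

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{\cdot\cdot})^2, \qquad SS_{Treatments} = b \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2, \qquad SS_{Blocks} = a \sum_{j=1}^{b} (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2$$

$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$$

$SS_{Treatments}$ has a-1 df, $SS_{Blocks}$ has b-1 df, and $SS_E$ has (a-1)(b-1) df.

RCB The mean squares are:

$$MS_{Treatments} = \frac{SS_{Treatments}}{a - 1}, \qquad MS_{Blocks} = \frac{SS_{Blocks}}{b - 1}, \qquad MS_E = \frac{SS_E}{(a - 1)(b - 1)}$$

RCB The expected values of these mean squares are:

$$E(MS_{Treatments}) = \sigma^2 + \frac{b \sum_{i=1}^{a} \alpha_i^2}{a - 1}, \qquad E(MS_{Blocks}) = \sigma^2 + \frac{a \sum_{j=1}^{b} \beta_j^2}{b - 1}, \qquad E(MS_E) = \sigma^2$$

where $\alpha_i$ and $\beta_j$ are the fixed treatment and block effects and $\sigma^2$ is the error variance.

RCB When there is no treatment effect, the ratio $F = MS_{Treatments}/MS_E$ has an F distribution with a-1 and (a-1)(b-1) degrees of freedom, so large values of F lead to rejection of the null hypothesis.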

Minitab

Two-way ANOVA: response versus row, col

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| row | 2 | 56 | 28.0000 | 3.23 | 0.112 |
| col | 3 | 90 | 30.0000 | 3.46 | 0.091 |
| Error | 6 | 52 | 8.6667 | | |
| Total | 11 | 198 | | | |

Both P-values exceed the 0.05 level of significance, so we cannot reject the null hypothesis.
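The Minitab table can be reproduced by hand. The sketch below assumes the twelve block-design values are arranged as 3 rows by 4 blocks exactly as written in `data`. That arrangement is inferred from the flattened slide and was chosen because it reproduces the row and column sums of squares above. The function name `rcb_anova` is illustrative.

```python
def rcb_anova(table):
    """Two-way ANOVA without replication; rows are treatments, columns are blocks."""
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    row_means = [sum(r) / b for r in table]
    col_means = [sum(r[j] for r in table) / a for j in range(b)]

    ss_rows = b * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((y - grand) ** 2 for r in table for y in r)
    ss_err = ss_total - ss_rows - ss_cols  # what remains after rows and columns

    ms_err = ss_err / ((a - 1) * (b - 1))
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_err": ss_err, "ss_total": ss_total,
        "f_rows": (ss_rows / (a - 1)) / ms_err,
        "f_cols": (ss_cols / (b - 1)) / ms_err,
    }

# Assumed layout: one inner list per row, one entry per block.
data = [[13, 7, 9, 3], [6, 6, 3, 1], [11, 5, 15, 5]]
result = rcb_anova(data)
for name, value in result.items():
    print(name, value)
```

The sums of squares come out as 56, 90, 52 and 198, with F ratios of about 3.23 and 3.46, matching the Minitab output.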

The Anova Table for Diet Experiment

| Source | S.S. | d.f. | M.S. | F | p-value |
|---|---|---|---|---|---|
| Block | 5992.41667 | 9 | 665.82407 | 9.52 | 0.00000 |
| Diet | 4572.88333 | 5 | 914.576666 | 13.0766586 | 0.00000 |
| ERROR | 3147.28333 | 45 | 69.93963 | | |
| Total | 13712.58 | 59 | | | |

The Anova Table for Textile Experiment

| SOURCE | SUM OF SQUARES | D.F. | MEAN SQUARE | F | TAIL PROB. |
|---|---|---|---|---|---|
| Blocks | 7.17167 | 2 | 3.5858 | 40.21 | 0.0003 |
| Chem | 5.20000 | 3 | 1.7333 | 19.44 | 0.0017 |
| ERROR | 0.53500 | 6 | 0.0892 | | |
| Total | 12.90667 | 11 | | | |
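To illustrate why blocking helps, the sketch below recomputes the textile analysis twice: once as a randomized block design, and once as a one-way ANOVA that ignores the bolts. The second analysis is not in the lecture; it is added here only as a hypothetical comparison, and the variable names are illustrative.

```python
# Water-resistance readings: keys are chemicals, list positions are bolts (blocks) 1-3.
y = {
    "A": [10.1, 12.2, 11.9],
    "B": [11.4, 12.9, 12.7],
    "C": [9.9, 12.3, 11.4],
    "D": [12.1, 13.4, 12.9],
}
a, b = len(y), 3
grand = sum(sum(v) for v in y.values()) / (a * b)

ss_chem = b * sum((sum(v) / b - grand) ** 2 for v in y.values())
ss_block = a * sum((sum(v[j] for v in y.values()) / a - grand) ** 2 for j in range(b))
ss_total = sum((x - grand) ** 2 for v in y.values() for x in v)

# Randomized block analysis: block variation is removed from the error term.
sse_rcb = ss_total - ss_chem - ss_block
f_rcb = (ss_chem / (a - 1)) / (sse_rcb / ((a - 1) * (b - 1)))

# One-way analysis ignoring blocks: block variation stays in the error term.
sse_oneway = ss_total - ss_chem
f_oneway = (ss_chem / (a - 1)) / (sse_oneway / (a * b - a))

print(f"with blocks:     SSE = {sse_rcb:.3f}, F = {f_rcb:.2f}")
print(f"ignoring blocks: SSE = {sse_oneway:.3f}, F = {f_oneway:.2f}")
```

With blocks the chemical effect gives F ≈ 19.44, as in the table. Ignoring the bolts leaves the bolt-to-bolt variation in the error sum of squares and the F ratio drops to about 1.8, so the same treatment differences would be much harder to detect.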