
February 11, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Statistics


Statistical Analysis
Professor Lynne Stokes
Department of Statistical Science

Lecture #2: Chi-square Tests for Homogeneity, Chi-square Goodness of Fit Test

Chi-square Tests

1. Tests for independence in contingency tables
2. Tests for homogeneity

Binomial Samples (Product Binomial Sampling)

Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5

Sample        1     2     3     4     5     6     7     8   Total
Wrinkled     60   108    80   118   165   106   105    90     832
Smooth       40    92   100    90   135    76   125   110     768
Total       100   200   180   208   300   182   230   200    1600

Assumptions: 8 samples, mutually independent counts

• Hypothesis #1: Is pW = 0.5? Binomial inference on p; equivalently, overall goodness of fit (known p)
• Hypothesis #2: Are all the pW equal? Test for homogeneity (equal but unknown p)
• Hypothesis #3: Is each pW = 0.5? Goodness of fit (8 samples, known p)

Test of Homogeneity of k Binomial Samples, Specified p

Ho: p1 = p2 = … = p8 = 0.5 vs. Ha: pj ≠ 0.5 for some j

Sample          1      2      3      4      5      6      7      8    Total
Wrinkled       60    108     80    118    165    106    105     90      832
Smooth         40     92    100     90    135     76    125    110      768
Total         100    200    180    208    300    182    230    200     1600
Chi-square   4.00   1.28   2.22   3.77   3.00   4.95   1.74   2.00    22.96

$\chi^2(k) = \sum_{j=1}^{k} \chi^2_j(1)$

Does not assume homogeneity (see below)

X² = 22.96, df = 8, p = 0.003
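The per-sample statistics and their sum can be checked with a short sketch. It uses only the counts shown on the slide; the helper names (`sample_chi2`, `chi2_sf_even`) are illustrative and not from the lecture, and the tail-probability helper relies on a closed form that holds for even degrees of freedom only.

```python
import math

# (wrinkled, smooth) counts for the 8 samples, copied from the slide.
samples = [(60, 40), (108, 92), (80, 100), (118, 90),
           (165, 135), (106, 76), (105, 125), (90, 110)]
P0 = 0.5  # hypothesized probability of a wrinkled seed


def sample_chi2(wrinkled, smooth, p):
    """Chi-square (1 df) for one binomial sample against a specified p."""
    n = wrinkled + smooth
    exp_w, exp_s = n * p, n * (1 - p)
    return (wrinkled - exp_w) ** 2 / exp_w + (smooth - exp_s) ** 2 / exp_s


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


per_sample = [sample_chi2(w, s, P0) for w, s in samples]
x2_total = sum(per_sample)   # independent chi-squares add
df = len(samples)            # one df per sample, no parameter estimated
p_value = chi2_sf_even(x2_total, df)

print([round(c, 2) for c in per_sample])
print(round(x2_total, 2), df, round(p_value, 3))
```

Because p is specified rather than estimated, no degree of freedom is lost and df stays at 8.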

Test of Homogeneity of k Binomial Samples: Unspecified p

Ho: p1 = p2 = … = p8 vs. Ha: pj ≠ pk for some (j, k)

Sample        1     2     3     4     5     6     7     8   Total
Wrinkled     60   108    80   118   165   106   105    90     832
Smooth       40    92   100    90   135    76   125   110     768
Total       100   200   180   208   300   182   230   200    1600

Expected counts:

$E_{ij} = C_j \hat{p}_i$, $\hat{p}_i = R_i / n$, where $R_i$ = ith row total and $C_j$ = jth column total, so

$E_{ij} = \frac{R_i C_j}{n}$
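As a worked instance of the expected-count formula, take sample 1 (column total 100) with the row totals 832 (Wrinkled) and 768 (Smooth) and n = 1600 from the table:

```latex
\hat{p}_1 = \frac{R_1}{n} = \frac{832}{1600} = 0.52, \qquad
E_{11} = \frac{R_1 C_1}{n} = \frac{832 \times 100}{1600} = 52.00, \qquad
E_{21} = \frac{R_2 C_1}{n} = \frac{768 \times 100}{1600} = 48.00
```

The two expected counts in a column always sum to that column's total, here 52.00 + 48.00 = 100.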

Test of Homogeneity of k Binomial Samples: Unspecified p

Ho: p1 = p2 = … = p8 vs. Ha: pj ≠ pk for some (j, k)

Observed
Sample          1       2       3       4       5       6       7       8    Total
Wrinkled       60     108      80     118     165     106     105      90      832
Smooth         40      92     100      90     135      76     125     110      768
Total         100     200     180     208     300     182     230     200     1600

Expected
Sample          1       2       3       4       5       6       7       8    Total
Wrinkled    52.00  104.00   93.60  108.16  156.00   94.64  119.60  104.00      832
Smooth      48.00   96.00   86.40   99.84  144.00   87.36  110.40   96.00      768

Chi-square
Sample          1       2       3       4       5       6       7       8    Total
Wrinkled     1.23    0.15    1.98    0.90    0.52    1.36    1.78    1.88
Smooth       1.33    0.17    2.14    0.97    0.56    1.48    1.93    2.04    20.43

p̂ = 0.52

X² = 20.43, df = 7, p = 0.005

Note: df = 8 − 1 = 7 (only one parameter is estimated, since p̂2 = 1 − p̂1)
Note: Only one of each pair of expected values is independently estimated (k = 8, not 16)
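The homogeneity statistic with an estimated common p can be reproduced from the observed counts alone. This is a minimal sketch; `chi2_sf_odd` is an illustrative helper (not from the lecture) that uses a closed form valid for odd degrees of freedom only.

```python
import math

# (wrinkled, smooth) counts for the 8 samples, copied from the slide.
samples = [(60, 40), (108, 92), (80, 100), (118, 90),
           (165, 135), (106, 76), (105, 125), (90, 110)]

total_w = sum(w for w, _ in samples)
total_n = sum(w + s for w, s in samples)
p_hat = total_w / total_n  # pooled estimate of the common p

x2 = 0.0
for w, s in samples:
    n = w + s
    exp_w, exp_s = n * p_hat, n * (1 - p_hat)  # E_ij = R_i * C_j / n
    x2 += (w - exp_w) ** 2 / exp_w + (s - exp_s) ** 2 / exp_s

df = len(samples) - 1  # one df lost for the estimated p


def chi2_sf_odd(x, df):
    """Upper-tail chi-square probability; this closed form is valid for odd df only."""
    root = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # normal pdf at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


p_value = chi2_sf_odd(x2, df)
print(round(p_hat, 2), round(x2, 2), df, round(p_value, 3))
```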

Chi-square Tests

1. Tests for independence in contingency tables
2. Tests for homogeneity
3. Goodness of fit tests

Chi-square Goodness of Fit Test: Specified Probabilities

Assumptions
• n independent observations
• k mutually exclusive possible outcomes
• pj = Pr(outcome j) is the same on every trial

Sample size condition
• All npj ≥ 1
• At least 80% of the npj ≥ 5

Goodness of Fit Test: Specified Probabilities

Sample size: n
Observed count for outcome j: Oj
Expected count for outcome j: Ej = npj

Ho: Pr(outcome j) = pj for j = 1, ..., k
Ha: Pr(outcome j) ≠ pj for at least one j

$X^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$

Reject Ho if $X^2 > X^2_\alpha$, where $X^2_\alpha$ is the chi-square critical value with df = k − 1
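The statistic and the sample size condition translate directly into code. The sketch below is illustrative only: the function names and the three-outcome counts are hypothetical and do not come from the lecture.

```python
def chi_square_gof(observed, probs):
    """Return (X2, df, expected) for a goodness-of-fit test with specified probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E_j = n * p_j
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, len(observed) - 1, expected


def sample_size_ok(expected):
    """Sample size condition: all E_j >= 1 and at least 80% of the E_j >= 5."""
    at_least_5 = sum(e >= 5 for e in expected)
    return min(expected) >= 1 and at_least_5 >= 0.8 * len(expected)


# Hypothetical counts for three outcomes with p = (0.25, 0.5, 0.25).
x2, df, expected = chi_square_gof([30, 50, 20], [0.25, 0.5, 0.25])
print(x2, df, sample_size_ok(expected))
```

The resulting X² would then be compared with the critical value for df = k − 1 taken from a chi-square table.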

Cognitive Learning

Path Chosen         A    B    C    D   Total
Number of rats      4    5    8   15      32
Expected number     8    8    8    8      32

Ho: pj = 1/4, j = 1, 2, 3, 4 vs. Ha: pj ≠ 1/4 for some j

$\chi^2 = \frac{(4-8)^2}{8} + \frac{(5-8)^2}{8} + \frac{(8-8)^2}{8} + \frac{(15-8)^2}{8} = 2.00 + 1.12 + 0.00 + 6.12 = 9.24$

p = 0.026

Sufficient evidence of cognitive learning? Using a significance level of α = 0.05, there is sufficient evidence (p = 0.026) to reject the hypothesis that rats choose the 4 doors with equal probability.
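The rat example can be verified numerically. In this sketch the unrounded statistic comes out as 9.25; the slide's 9.24 is the sum of the terms after each was rounded to two decimals. The p-value line uses a closed form that holds for df = 3 only.

```python
import math

observed = [4, 5, 8, 15]        # rats choosing paths A, B, C, D
n = sum(observed)
expected = [n / 4] * 4          # equal probability 1/4 for each path

terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(terms)
df = len(observed) - 1

# Upper-tail probability of a chi-square with 3 df (closed form for df = 3).
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(terms, x2, df, round(p_value, 3))
```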

Mendelian Inheritance

Do the genotypes of a cross-breeding occur in the ratio 9:3:3:1?

Ho: p1 = 9/16, p2 = p3 = 3/16, p4 = 1/16
Ha: Some probabilities differ

Genotype      1     2     3     4   Total
Observed    150    46    40    20     256
Expected    144    48    48    16

Reject Ho if X² > 7.815 (α = 0.05)

Mendelian Inheritance

Genotype              1       2       3       4   Total
Observed            150      46      40      20     256
Expected            144      48      48      16
(Oj − Ej)²/Ej      0.25    0.08    1.33    1.00

X² = 0.25 + 0.08 + 1.33 + 1.00 = 2.66

There is insufficient evidence (p > 0.10) at a significance level of 0.05 to conclude that the genotypes from this type of cross-breeding occur in proportions that differ from those predicted by Mendelian inheritance theory.
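A short check of the Mendelian example, using the observed counts and the 9:3:3:1 ratio from the slides. The unrounded statistic is about 2.67 (the slide's 2.66 sums the rounded terms), and the p-value line again relies on the closed form for df = 3.

```python
import math

observed = [150, 46, 40, 20]
ratio = [9, 3, 3, 1]                       # hypothesized 9:3:3:1
n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_05 = 7.815                        # critical value quoted on the slide
reject = x2 > CRITICAL_05

# Upper-tail probability of a chi-square with 3 df (closed form for df = 3).
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(expected, round(x2, 2), reject, round(p_value, 2))
```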

Chi-Square Goodness of Fit Test: Unknown Parameters

• Estimate the parameters of the distribution
• Divide range of data values into mutually exclusive and exhaustive classes
  - Discrete data: often use the values themselves
  - Continuous data: use k = n^(1/2) or k = log(n) classes
• Estimate the probability of being in each class
• Compare the observed (Oi) counts in each class with the estimated expected (Ei) counts

$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(k - r - 1)$, r = # estimated parameters

Chi-Square Goodness of Fit Test for the Poisson Distribution

Number of senders (automated telephone equipment) in use at a given time

Ho: number ~ Poisson vs. Ha: number not Poisson
Reject if X² > χ²_0.05(20) = 31.4
df: 22 − 1 (mutually exclusive & exhaustive) − 1 (estimated parameter) = 20
23 − 1 = 22 categories

λ̂ = 10.4378, X² = 43.16, p = 0.002

Number    Observed    Estimated     Expected    (Obs − Exp)²
in Use    Frequency   Probability   Frequency       / Exp
  0            0        0.0000         0.11
  1            5        0.0003         1.15        11.16 *
  2           14        0.0016         5.98        10.74
  3           24        0.0055        20.82         0.49
  4           57        0.0145        54.34         0.13
  5          111        0.0302       113.46         0.05
  6          197        0.0526       197.41         0.00
  7          278        0.0784       294.42         0.92
  8          378        0.1023       384.21         0.10
  9          418        0.1187       445.67         1.72
 10          461        0.1239       465.27         0.04
 11          433        0.1176       441.58         0.17
 12          413        0.1023       384.16         2.16
 13          358        0.0822       308.51         7.94
 14          219        0.0613       230.05         0.53
 15          145        0.0427       160.11         1.43
 16          109        0.0278       104.47         0.20
 17           57        0.0171        64.16         0.80
 18           43        0.0099        37.21         0.90
 19           16        0.0054        20.45         0.97
 20            7        0.0028        10.67         1.26
 21            8        0.0014         5.31         1.37
 22            3        0.0007         2.52         0.09
Total       3754        0.9995         3752        43.16

* Classes 0 and 1 combined (23 − 1 = 22 categories)
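The Poisson fit can be sketched from the observed frequencies and the quoted estimate of 10.4378. Two assumptions are made here and are not stated on the slide: classes 0 and 1 are pooled (which matches the 22 categories and the first chi-square term), and class 22 is treated as the single value 22 rather than "22 or more". Because the slide's expected frequencies are rounded, the computed statistic should land near, not exactly at, 43.16.

```python
import math

# Observed frequencies for 0, 1, ..., 22 senders in use (from the slide).
observed = [0, 5, 14, 24, 57, 111, 197, 278, 378, 418, 461, 433,
            413, 358, 219, 145, 109, 57, 43, 16, 7, 8, 3]
n = sum(observed)
lam = 10.4378  # estimated Poisson mean quoted on the slide


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


expected = [n * poisson_pmf(k, lam) for k in range(len(observed))]

# Assumption: pool classes 0 and 1, leaving 22 categories.
obs_pooled = [observed[0] + observed[1]] + observed[2:]
exp_pooled = [expected[0] + expected[1]] + expected[2:]

x2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1   # k - 1, minus one estimated parameter
reject = x2 > 31.4             # critical value quoted on the slide

print(n, round(x2, 2), df, reject)
```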

Chi-Square Goodness of Fit Test for the Normal Distribution

• Divide the data into mutually exclusive and exhaustive (contiguous) classes
  - First and last classes are open ended
  - (−∞, U1), (L2, U2), (L3, U3), …, (Lk, ∞) with $L_j = U_{j-1}$
• Estimate the mean and standard deviation
• Calculate z-scores for the limits of each class

$z_{L_j} = \frac{L_j - \bar{y}}{s_y}$, $z_{U_j} = \frac{U_j - \bar{y}}{s_y}$
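The z-scores of the class limits lead to class probabilities through the standard normal CDF. The sketch below illustrates that step only; the class limits, mean and standard deviation are hypothetical values chosen for the example, and the function names are not from the lecture.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def class_probabilities(upper_limits, mean, sd):
    """Probabilities for classes (-inf, U1), (U1, U2), ..., (U_{k-1}, inf)."""
    z_scores = [(u - mean) / sd for u in upper_limits]
    cdf = [0.0] + [normal_cdf(z) for z in z_scores] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]


# Hypothetical example: estimated mean 50, standard deviation 10, n = 200.
probs = class_probabilities([40, 50, 60], mean=50, sd=10)
expected = [200 * p for p in probs]
df = len(probs) - 2 - 1   # k classes, r = 2 estimated parameters

print([round(p, 4) for p in probs], df)
```

With both the mean and the standard deviation estimated, r = 2, so four classes leave only one degree of freedom.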

Chi-Square Goodness of Fit Test

• Can be applied to any discrete or continuous probability distribution; only probabilities need be specified: Ei = npi
• Asymptotic chi-square distribution
  - All Ei > 1 & at least 80% of the Ei > 5
• Does not have the highest power for specific distributions, against specific alternatives
• Degrees of freedom (k classes)
  - If each class represents an independent sample (i.e., k replicate samples) and all parameters are known (i.e., known probabilities), df = k
  - If the classes represent mutually exclusive and exhaustive categories (i.e., expected frequencies must sum to n), data are independent and from a single sample:
      All parameters are known: df = k − 1
      r parameters are estimated: df = k − r − 1
      e.g., (n − 1)s²/σ² ~ χ²(n − 1)

Goodness of Fit to the Binomial, Known p

• Normal theory approximation
• Chi-square tests

Binomial Sample, Specified p: Normal Theory Approximation

Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5

Greater Power by Combining Samples (Assuming Homogeneity)

Sample        1     2     3     4     5     6     7     8   Total
Wrinkled     60   108    80   118   165   106   105    90     832
Smooth       40    92   100    90   135    76   125   110     768
Total       100   200   180   208   300   182   230   200    1600

$z = \frac{832 - 800}{\sqrt{1600(0.5)(0.5)}} = 1.600$

p = 0.110
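The pooled z-test can be checked in a few lines. This sketch assumes the quoted p-value is two-sided, which is consistent with the two-sided alternative and with the value 0.110.

```python
import math

wrinkled, n, p0 = 832, 1600, 0.5

# Normal approximation to the binomial under Ho: pW = p0.
z = (wrinkled - n * p0) / math.sqrt(n * p0 * (1 - p0))

# Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 3), round(p_value, 3))
```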

Alternative to the Binomial Test: Chi-square Goodness of Fit, Specified p

Genetic Theory: Ho: pW = 0.5 vs. Ha: pW ≠ 0.5

            Wrinkled   Smooth   Total
Observed         832      768    1600
Expected         800      800    1600

$\chi^2 = \frac{(832 - 800)^2}{800} + \frac{(768 - 800)^2}{800} = 2.56$

p = 0.110

$z^2 = (1.60)^2 = 2.56$
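The agreement between z² and the chi-square statistic is not a coincidence. A short derivation, writing O_W for the wrinkled count and using the fact that the smooth deviation is the negative of the wrinkled deviation, shows why:

```latex
\begin{aligned}
X^2 &= \frac{(O_W - np)^2}{np} + \frac{\bigl((n - O_W) - n(1-p)\bigr)^2}{n(1-p)} \\
    &= (O_W - np)^2 \left[ \frac{1}{np} + \frac{1}{n(1-p)} \right] \\
    &= \frac{(O_W - np)^2}{np(1-p)} = z^2
\end{aligned}
```

With O_W = 832, n = 1600 and p = 0.5 this gives 32² / 400 = 2.56, so the two tests yield the same two-sided p-value.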

Overall Binomial Test vs. Test of Homogeneity, Specified p

Ho: p1 = p2 = … = p8 = 0.5 vs. Ha: pj ≠ 0.5 for some j

Sample           1       2       3       4       5       6       7       8    Total
Wrinkled        60     108      80     118     165     106     105      90      832
Smooth          40      92     100      90     135      76     125     110      768
Total          100     200     180     208     300     182     230     200     1600
Chi-square    4.00    1.28    2.22    3.77    3.00    4.95    1.74    2.00    22.96
p-value      0.046   0.258   0.136   0.052   0.083   0.026   0.187   0.157    0.003

Overall binomial test: X² = 2.56, df = 1, p = 0.110 (greater power if homogeneous)
Test of homogeneity: X² = 22.96, df = 8, p = 0.003 (greater power if not homogeneous)

Note: 5 of the p̂ > 0.5 and 3 of the p̂ < 0.5
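The per-sample p-values and the split of sample proportions around 0.5 can be reproduced as follows. This sketch uses the fact that a chi-square with 1 df is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)).

```python
import math

# (wrinkled, smooth) counts for the 8 samples, copied from the slide.
samples = [(60, 40), (108, 92), (80, 100), (118, 90),
           (165, 135), (106, 76), (105, 125), (90, 110)]

p_values = []
for w, s in samples:
    n = w + s
    # With p = 0.5 both expected counts are n/2, so the two terms are equal.
    x2 = 2 * (w - n / 2) ** 2 / (n / 2)
    p_values.append(math.erfc(math.sqrt(x2 / 2)))  # chi-square upper tail, 1 df

above = sum(w > s for w, s in samples)   # samples with p-hat > 0.5
below = sum(w < s for w, s in samples)   # samples with p-hat < 0.5

print([round(p, 3) for p in p_values], above, below)
```

The deviations point in both directions, which is why they largely cancel in the pooled test but accumulate in the homogeneity test.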

Binomial Samples

Sample        1     2     3     4     5     6     7     8   Total
Wrinkled     60   108    80   118   165   106   105    90     832
Smooth       40    92   100    90   135    76   125   110     768
Total       100   200   180   208   300   182   230   200    1600

• Homogeneity test of Ho: pj = 0.5: χ² = 22.96
• Overall test of Ho: pW = 0.5: χ² = 2.56
• Homogeneity test of Ho: pj = pW, pW unspecified: χ² = 20.43

Note: pj = pW ∀ j ⟺ pij = pi pj ∀ (i, j)

More common: Ho: pij = pi pj vs. Ha: pij ≠ pi pj
Homogeneity with unspecified p is equivalent to independence

Some Goodness of Fit Tests

• Chi-square goodness-of-fit test
  - Very general, can have little power
• Kolmogorov-Smirnov goodness-of-fit test
  - Good general test, especially for continuous random variables
• Wilk-Shapiro test for normality
  - Regarded as the best test for normality

Comparing Odds Ratios Across Categories


Race and Death Penalty Punishment Across Aggravation Levels

Observed Frequencies
                  Death Penalty
Victim's Race       Yes        No     Total
White                45        85       130
Black                14       218       232
Total                59       303       362

Expected Frequencies
                  Death Penalty
Victim's Race       Yes        No     Total
White           21.1878  108.8122       130
Black           37.8122  194.1878       232
Total                59       303       362

Column Percentages
                  Death Penalty
Victim's Race       Yes        No     Total
White              76.3      28.1      35.9
Black              23.7      71.9      64.1
Total               100       100       100

Cell Chi-square Values
                  Death Penalty
Victim's Race       Yes        No     Total
White           25.6494    4.9944   30.6439
Black           14.3725    2.7986  17.17115
Total          40.02199   7.79306  47.81504

Chi-square value = 47.82, p-value < 0.0001

Are the results consistent across aggravation levels?
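The expected frequencies and cell chi-square values for the pooled 2 x 2 table can be reproduced from the observed counts. One observation that is not stated on the slide: the quoted cell values match the formula with a continuity correction of 0.5 subtracted from each absolute deviation, so the sketch below applies that correction.

```python
import math

observed = [[45, 85],    # white victim: death penalty yes, no
            [14, 218]]   # black victim: death penalty yes, no

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Continuity-corrected cell contributions (assumption inferred from the values).
cell_chi2 = [[(abs(o - e) - 0.5) ** 2 / e for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(observed, expected)]

x2 = sum(sum(row) for row in cell_chi2)
p_value = math.erfc(math.sqrt(x2 / 2))   # chi-square upper tail, 1 df

print([[round(e, 4) for e in row] for row in expected])
print(round(x2, 2), p_value < 0.0001)
```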

Mantel-Haenszel Test

• Several 2 x 2 tables
• Assuming a common odds ratio, test that the odds ratio = 1

Aggravation   Victim's     Death Penalty
Level         Race         Yes     No    Total
1             White          2     60       62
              Black          1    181      182
              Total          3    241      244
2             White          2     15       17
              Black          1     21       22
              Total          3     36       39
3             White          6      7       13
              Black          2      9       11
              Total          8     16       24
4             White          9      3       12
              Black          2      4        6
              Total         11      7       18
5             White          9      0        9
              Black          4      3        7
              Total         13      3       16
6             White         17      0       17
              Black          4      0        4
              Total         21      0       21

Race and Death Penalty Punishment

Expected frequencies for chi-square test of independence

Aggravation   Victim's       Death Penalty
Level         Race           Yes         No    Total
1             White       0.7623    61.2377       62
              Black       2.2377   179.7623      182
              Total            3        241      244
2             White       1.3077    15.6923       17
              Black       1.6923    20.3077       22
              Total            3         36       39
3             White       4.3333     8.6667       13
              Black       3.6667     7.3333       11
              Total            8         16       24
4             White       7.3333     4.6667       12
              Black       3.6667     2.3333        6
              Total           11          7       18
5             White       7.3125     1.6875        9
              Black       5.6875     1.3125        7
              Total           13          3       16
6             White           17          0       17
              Black            4          0        4
              Total           21          0       21

Note: None have sufficient sample sizes for tests of independence

Mantel-Haenszel Test

1. Select one cell; e.g., upper-left
2. Calculate the excess for each table
   • Excess = Observed − Expected
   • e.g., Excess = O11 − E11
3. Calculate the variances of the excesses
   • Variance = R1 R2 C1 C2 / [n²(n − 1)]
4. $z = \frac{\sum \text{Excesses across tables}}{\sqrt{\sum \text{Variances across tables}}}$
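The four steps can be sketched as below, using the six aggravation-level tables from the earlier Mantel-Haenszel slide. The function name and tuple layout are illustrative choices, and the one-sided p-value is an assumption about how the tail probability is reported.

```python
import math

# One tuple per aggravation level (1 to 6):
# (white_yes, white_no, black_yes, black_no), copied from the slides.
tables = [(2, 60, 1, 181), (2, 15, 1, 21), (6, 7, 2, 9),
          (9, 3, 2, 4), (9, 0, 4, 3), (17, 0, 4, 0)]


def excess_and_variance(n11, n12, n21, n22):
    """Excess (O11 - E11) and its variance for one 2 x 2 table."""
    r1, r2 = n11 + n12, n21 + n22      # row totals
    c1, c2 = n11 + n21, n12 + n22      # column totals
    n = r1 + r2
    expected = r1 * c1 / n             # E11 under independence
    variance = r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    return n11 - expected, variance


results = [excess_and_variance(*t) for t in tables]
total_excess = sum(e for e, _ in results)
total_variance = sum(v for _, v in results)

z = total_excess / math.sqrt(total_variance)
p_value = 0.5 * math.erfc(z / math.sqrt(2))   # one-sided upper tail (assumption)

for level, (e, v) in enumerate(results, start=1):
    print(level, round(e, 3), round(v, 3))
print(round(total_excess, 3), round(total_variance, 3), round(z, 3), round(p_value, 4))
```

Level 6 contributes nothing because every defendant received the death penalty, so both its excess and its variance are zero.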

Race and Death Penalty Punishment

Aggravation Level    Excess    Variance
1                     1.238       0.564
2                     0.692       0.699
3                     1.667       1.382
4                     1.667       1.007
5                     1.688       0.640
6                     0.000       0.000
Total                 6.952       4.292

z-Score = 3.356, p-Value = 0.0004

Conclusion: Nearly 7 more white-victim murderers received the death penalty than would be expected if the odds were the same for white- and black-victim murderers.

Estimating the Common Odds Ratio

$\hat{\omega} = \frac{\sum (n_{11} n_{22} / T) \text{ over all the tables}}{\sum (n_{12} n_{21} / T) \text{ over all the tables}}$

Death Penalty and Race: $\hat{\omega} = 5.49$
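The common odds ratio estimate can be checked against the six aggravation-level tables. In this sketch T is taken to be the total count in each table, which is an assumption consistent with the quoted result of 5.49.

```python
# One tuple per aggravation level (1 to 6):
# (n11, n12, n21, n22) = (white_yes, white_no, black_yes, black_no).
tables = [(2, 60, 1, 181), (2, 15, 1, 21), (6, 7, 2, 9),
          (9, 3, 2, 4), (9, 0, 4, 3), (17, 0, 4, 0)]

numerator = 0.0
denominator = 0.0
for n11, n12, n21, n22 in tables:
    total = n11 + n12 + n21 + n22      # T: table total (assumption)
    numerator += n11 * n22 / total
    denominator += n12 * n21 / total

odds_ratio = numerator / denominator
print(round(odds_ratio, 2))
```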