The Central Limit Theorem


Paul Cornwell March 31, 2011




Let X₁, …, Xₙ be independent, identically distributed random variables with mean μ and finite, positive variance σ². When n is large, the average X̄ = (X₁ + … + Xₙ)/n is approximately normally distributed with mean μ and standard deviation σ/√n.
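As a quick illustration (a Python sketch assuming NumPy; the deck's own simulations were done in R), averages of draws from a strongly skewed distribution already look normal at moderate n:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 averages, each of n = 50 draws from Exp(1), where mu = 1 and sigma = 1.
n = 50
averages = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

# The CLT predicts the averages are roughly Normal(mu, sigma/sqrt(n)).
print(averages.mean())       # should be close to mu = 1
print(averages.std(ddof=1))  # should be close to 1/sqrt(50), about 0.141
```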






How large a sample size is required for the Central Limit Theorem (CLT) approximation to be good? And what counts as a 'good' approximation?




◦ Permits analysis of random variables even when the underlying distribution is unknown
◦ Estimating parameters
◦ Hypothesis testing
◦ Polling





Performing a hypothesis test to determine whether a set of data came from a normal distribution. Considerations:
◦ Power: the probability that a test will reject the null hypothesis when it is false
◦ Ease of use



Problems:
◦ No test is desirable in every situation (there is no universally most powerful test)
◦ Some tests cannot handle the composite hypothesis of normality (i.e., a normal with unspecified parameters)
◦ The reliability of tests is sensitive to sample size: with enough data, the null hypothesis will almost always be rejected
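The sample-size sensitivity is easy to demonstrate (a Python sketch using SciPy's Kolmogorov-Smirnov test; the choice of test and the numbers are illustrative, not the deck's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Averages of 30 Exp(1) draws are nearly, but not exactly, normal
# (their skewness is 2/sqrt(30), about 0.37).
avgs = rng.exponential(size=(100_000, 30)).mean(axis=1)
z = (avgs - avgs.mean()) / avgs.std(ddof=1)

# With only 100 observations the small departure usually goes undetected ...
print(stats.kstest(z[:100], 'norm').pvalue)
# ... but with 100,000 observations the same departure is overwhelmingly
# "significant" and the null hypothesis of normality is rejected.
print(stats.kstest(z, 'norm').pvalue)
```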



The normal distribution is:
◦ Symmetric
◦ Unimodal
◦ Bell-shaped
◦ Continuous



Skewness: measures the asymmetry of a distribution.
◦ Defined as the third standardized moment: γ₁ = E[((X − μ)/σ)³]
◦ The skewness of the normal distribution is 0
◦ Sample estimate: Σᵢ₌₁ⁿ (Xᵢ − X̄)³ / ((n − 1)s³)



Kurtosis: measures the peakedness of a distribution and the heaviness of its tails.
◦ Defined as the fourth standardized moment: E[((X − μ)/σ)⁴]
◦ The kurtosis of the normal distribution is 3
◦ Sample estimate: Σᵢ₌₁ⁿ (Xᵢ − X̄)⁴ / ((n − 1)s⁴)
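Both estimators can be coded directly from their formulas (a Python sketch; the deck's own computations used R):

```python
import numpy as np

def sample_skewness(x):
    # Third standardized sample moment: sum((x_i - xbar)^3) / ((n - 1) s^3)
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = x.std(ddof=1)
    return np.sum((x - x.mean()) ** 3) / ((n - 1) * s ** 3)

def sample_kurtosis(x):
    # Fourth standardized sample moment: sum((x_i - xbar)^4) / ((n - 1) s^4)
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = x.std(ddof=1)
    return np.sum((x - x.mean()) ** 4) / ((n - 1) * s ** 4)

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)
print(sample_skewness(z))   # near 0 for normal data
print(sample_kurtosis(z))   # near 3 for normal data
```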




Binomial distribution. Cumulative distribution function:

F(x; n, p) = Σᵢ₌₀^⌊x⌋ C(n, i) pⁱ (1 − p)^(n−i)

E[X] = np, Var[X] = np(1 − p)
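A quick check of the CDF and moment formulas (a Python sketch; `scipy.stats.binom` is used only for comparison):

```python
from math import comb
from scipy import stats

n, p = 20, 0.2

# F(x; n, p) = sum_{i=0}^{floor(x)} C(n, i) p^i (1 - p)^(n - i)
def binom_cdf(x, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(int(x) + 1))

print(binom_cdf(4, n, p))        # agrees with stats.binom.cdf(4, n, p)
print(n * p, n * p * (1 - p))    # E[X] = 4.0, Var[X] = 3.2
```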


Simulation results for Binomial(n, p):

parameters      | Kurtosis     | Skewness    | % outside 1.96·sd | K-S distance | Mean   | Std Dev
n = 20, p = .2  | -.0014 (.25) | .3325 (1.5) | .0434             | .128         | 3.9999 | 1.786
n = 25, p = .2  | .002         | .3013       | .0743             | .116         | 5.0007 | 2.002
n = 30, p = .2  | .0235        | .2786       | .0363             | .106         | 5.997  | 2.188
n = 50, p = .2  | .0106        | .209        | .0496             | .083         | 10.001 | 2.832
n = 100, p = .2 | .005         | .149        | .05988            | .0574        | 19.997 | 4.0055

*from R



Uniform distribution. Cumulative distribution function:

F(x; a, b) = (x − a)/(b − a), for a ≤ x ≤ b

E[X] = (a + b)/2, Var[X] = (b − a)²/12

Simulation results for averages of n Uniform(a, b) draws:

parameters            | Kurtosis     | Skewness | % outside 1.96·sd | K-S distance | Mean  | Std Dev
n = 5, (a,b) = (0,1)  | -.236 (-1.2) | .004 (0) | .0477             | .0061        | .4998 | .1289 (.129)
n = 5, (a,b) = (0,50) | -.234        | 0        | .04785            | .0058        | 24.99 | 6.468 (6.455)
n = 5, (a,b) = (0,.1) | -.238        | -.0008   | .048              | .0060        | .0500 | .0129 (.0129)
n = 3, (a,b) = (0,50) | -.397        | -.001    | .0468             | .01          | 24.99 | 8.326 (8.333)

*from R
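The first row of the table is easy to reproduce (a Python sketch; the deck's simulations were done in R):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 0.0, 1.0, 5

# 100,000 averages, each of n = 5 Uniform(0, 1) draws.
avgs = rng.uniform(a, b, size=(100_000, n)).mean(axis=1)

# Theory: mean (a + b)/2 = 0.5; sd of the average sqrt((b - a)^2 / 12) / sqrt(n),
# about 0.129, matching the table.
print(avgs.mean())
print(avgs.std(ddof=1))
```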




Exponential distribution. Cumulative distribution function:

F(x; λ) = 1 − e^(−λx), for x ≥ 0

E[X] = 1/λ, Var[X] = 1/λ²

Simulation results for averages of n Exp(λ) draws:

parameters   | Kurtosis  | Skewness | % outside 1.96·sd | K-S distance | Mean   | Std Dev
n = 5, λ = 1 | 1.239 (6) | .904 (2) | .0434             | .0598        | .9995  | .4473 (.4472)
n = 10       | .597      | .630     | .045              | .042         | 1.0005 | .316 (.316)
n = 15       | .396      | .515     | .0464             | .034         | .9997  | .258 (.2581)

*from R
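The skewness column tracks the theoretical value 2/√n for averages of Exp(1) draws (a Python sketch; the deck used R):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Skewness of the average of n Exp(1) draws is 2/sqrt(n): about
# .894, .632 and .516 for n = 5, 10, 15, close to the simulated
# values .904, .630 and .515 in the table.
for n in (5, 10, 15):
    avgs = rng.exponential(size=(200_000, n)).mean(axis=1)
    print(n, stats.skew(avgs), 2 / n**0.5)
```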




Future work:
◦ Find n values for more distributions
◦ Refine the criteria for quality of the approximation
◦ Explore distributions that have no mean
◦ Classify distributions in order to develop more general guidelines for minimum sample size

Paul Cornwell May 2, 2011








◦ Central Limit Theorem: averages of i.i.d. variables become normally distributed as the sample size increases
◦ The rate of convergence depends on the underlying distribution
◦ What sample size is needed to produce a good approximation from the CLT?





◦ What are some real-life applications of the Central Limit Theorem?
◦ What does kurtosis tell us about a distribution?
◦ What is the rationale for requiring np ≥ 5?
◦ What about distributions with no mean?



◦ The probability distribution of the total distance covered in a random walk tends toward normal
◦ Hypothesis testing
◦ Confidence intervals (polling)
◦ Signal processing, noise cancellation



◦ Measures the “peakedness” of a distribution
◦ Higher peaks mean fatter tails
◦ Excess kurtosis: E[((X − μ)/σ)⁴] − 3







◦ The traditional assumption for normality of the binomial is np > 5 or 10
◦ The skewness of the binomial distribution increases as p moves away from .5
◦ Larger n is required for convergence of skewed distributions
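The growing skewness follows from the standard closed form for the binomial (a Python sketch with illustrative parameter values):

```python
import numpy as np

# Skewness of Binomial(n, p) is (1 - 2p) / sqrt(n p (1 - p)):
# it is 0 at p = .5 and grows as p moves toward 0 or 1, so more
# skewed binomials need a larger n for the normal approximation.
def binom_skewness(n, p):
    return (1 - 2 * p) / np.sqrt(n * p * (1 - p))

for p in (0.5, 0.2, 0.05):
    print(p, binom_skewness(20, p))
```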









The Cauchy distribution:
◦ Has no moments (including mean and variance)
◦ The distribution of averages looks like the original distribution
◦ The CLT does not apply
◦ Density: f(x) = 1/(π(1 + x²))
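The failure of the CLT here can be seen directly (a Python sketch; the sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# The average of n standard Cauchy variables is itself standard Cauchy,
# so increasing n does not tighten the distribution of the average.
for n in (10, 1000):
    avgs = rng.standard_cauchy(size=(10_000, n)).mean(axis=1)
    # The interquartile range of a standard Cauchy is 2 (quartiles at +-1).
    q1, q3 = np.percentile(avgs, [25, 75])
    print(n, q3 - q1)   # stays near 2 regardless of n
```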







The Beta distribution with α = β = 1/3:
◦ The distribution is symmetric and bimodal
◦ Convergence of averages to the normal is fast
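A small simulation supports the claim (a Python sketch assuming NumPy/SciPy; the sample sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Beta(1/3, 1/3) is symmetric and bimodal, with peaks at 0 and 1.
a = b = 1 / 3
avgs = rng.beta(a, b, size=(100_000, 5)).mean(axis=1)

# Averages of only 5 draws are already nearly symmetric, and their
# excess kurtosis has moved most of the way from the underlying
# distribution's strongly negative value toward the normal's 0.
print(stats.skew(avgs))
print(stats.kurtosis(avgs))   # excess kurtosis (normal = 0)
```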





The t-distribution:
◦ A heavier-tailed, bell-shaped curve
◦ Approaches the normal distribution as the degrees of freedom increase







◦ Four statistics: K-S distance, tail probabilities, skewness, and kurtosis
◦ Different thresholds for “adequate” and “superior” approximations
◦ Both sets of thresholds are fairly conservative
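The four statistics can be collected in one helper (a Python sketch; the function name and the example distribution are illustrative, and the deck's exact thresholds are not reproduced here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def clt_statistics(avgs):
    """Compute the four summary statistics used to judge the normal
    approximation: K-S distance, tail probability beyond 1.96 sd,
    skewness, and (excess) kurtosis."""
    z = (avgs - avgs.mean()) / avgs.std(ddof=1)
    return {
        'ks_distance': stats.kstest(z, 'norm').statistic,
        'tail_prob': np.mean(np.abs(z) > 1.96),   # compare with the normal's 0.05
        'skewness': stats.skew(avgs),
        'excess_kurtosis': stats.kurtosis(avgs),
    }

# Example: averages of 30 Exp(1) draws.
avgs = rng.exponential(size=(100_000, 30)).mean(axis=1)
print(clt_statistics(avgs))
```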
