Lecture 6: Normal distribution, central limit theorem

February 5, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Normal Distribution
Stats for Engineers Lecture 6

Answers for Question sheet 1 are now online http://cosmologist.info/teaching/STAT/

Answers for Question sheet 2 should be available Friday evening

Summary from last time

Continuous random variables have a probability density function (pdf) 𝑓(𝑥), with

𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = ∫ₐᵇ 𝑓(𝑥′) 𝑑𝑥′,  ∫₋∞^∞ 𝑓(𝑥) 𝑑𝑥 = 1

Exponential distribution:

𝑓(𝑦) = 𝜈𝑒^(−𝜈𝑦) for 𝑦 > 0, and 𝑓(𝑦) = 0 for 𝑦 ≤ 0

[Clicker results for the four options: 60%, 10%, 17%, 13%.]

Symmetries: If 𝑍 ∼ 𝑁(0,1), which of the following is NOT the same as 𝑃(𝑍 < 0.7)?

1. 1 − 𝑃(𝑍 > 0.7)
2. 𝑃(𝑍 > −0.7)
3. 1 − 𝑃(𝑍 > −0.7)
4. 1 − 𝑃(𝑍 < −0.7)

Answer: option 3. By symmetry, 𝑃(𝑍 < 0.7) = 1 − 𝑃(𝑍 > 0.7) = 𝑃(𝑍 > −0.7) = 1 − 𝑃(𝑍 < −0.7), but 1 − 𝑃(𝑍 > −0.7) = 𝑃(𝑍 < −0.7) = 1 − 𝑃(𝑍 < 0.7), which is not the same.

(d) 𝑃(0.5 < 𝑍 < 1.5) = 𝑃(𝑍 < 1.5) − 𝑃(𝑍 < 0.5) = Q(1.5) − Q(0.5) = 0.9332 − 0.6915 = 0.2417

(e) 𝑃(𝑍 < 1.356): this lies between Q(1.35) = 0.9115 and Q(1.36) = 0.9131. Using interpolation, Q(1.356) ≈ A Q(1.35) + B Q(1.36), where B is the fraction of the distance between 1.35 and 1.36:

B = (1.356 − 1.35)/(1.36 − 1.35) = 0.6,  A = 1 − B = 0.4

Q(1.356) ≈ 0.4 Q(1.35) + 0.6 Q(1.36) = 0.9125
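The interpolation step can be checked against Python's built-in standard normal CDF (statistics.NormalDist, available from Python 3.8; not part of the lecture materials):

```python
from statistics import NormalDist

# Table values from the lecture: Q(1.35) and Q(1.36)
q135, q136 = 0.9115, 0.9131

# Linear interpolation for Q(1.356): B is the fractional distance from 1.35 to 1.36
B = (1.356 - 1.35) / (1.36 - 1.35)   # ~0.6
A = 1 - B                            # ~0.4
q_interp = A * q135 + B * q136       # ~0.9125

# Compare with the exact standard normal CDF
q_exact = NormalDist().cdf(1.356)
```

The interpolated value agrees with the exact CDF to about four decimal places, which is all the printed tables offer anyway.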

(f) 0.8 = 𝑃(𝑍 ≤ 𝑐) = Q(𝑐). What is 𝑐?

Use the table in reverse: 𝑐 lies between 𝑧 = 0.84 (Q = 0.7995) and 𝑧 = 0.85 (Q = 0.8023). Interpolating as before, 𝑐 = A × 0.84 + B × 0.85, with

B = (0.8 − 0.7995)/(0.8023 − 0.7995) = 0.18,  A = 1 − B = 0.82

⇒ 𝑐 = 0.82 × 0.84 + 0.18 × 0.85 ≈ 0.842.

Using Normal tables

The error 𝑍 (in Celsius) on a cheap digital thermometer has a normal distribution, with 𝑍 ∼ 𝑁(0, 1). What is the probability that a given temperature measurement is too cold by more than 1.54 °C?

1. 0.0618
2. 0.9382
3. 0.1236
4. 0.0735

[Clicker results: 43%, 36%, 19%, 2%.]

Answer: we want 𝑃(𝑍 < −1.54) = 𝑃(𝑍 > 1.54) = 1 − 𝑃(𝑍 < 1.54) = 1 − Q(1.54) = 1 − 0.9382 = 0.0618
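The thermometer calculation, done directly with the stdlib normal distribution (a check, not part of the lecture materials):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, Z ~ N(0, 1)

# P(too cold by more than 1.54 C) = P(Z < -1.54)
p = Z.cdf(-1.54)

# Same value by the symmetry used in the lecture: 1 - P(Z < 1.54)
p_sym = 1 - Z.cdf(1.54)
```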

(g) Finding a range of values within which 𝑍 lies with probability 0.95: the answer is not unique, but suppose we want an interval symmetric about zero, i.e. between −𝑑 and 𝑑.

Each tail then contains probability 0.05/2 = 0.025, so 𝑑 is where Q(𝑑) = 0.025 + 0.95 = 0.975.

Use the table in reverse: Q(𝑑) = 0.975 ⇒ 𝑑 = 1.96

So 95% of the probability is in the range −1.96 < 𝑍 < 1.96.

In general, 95% of the probability lies within 1.96𝜎 of the mean 𝜇.

The range πœ‡ Β± 1.96𝜎 is called a 95% confidence interval.

Question from Derek Bruff

Normal distribution: If 𝑋 has a Normal distribution with mean 𝜇 = 20 and standard deviation 𝜎 = 4, which of the following could be a graph of the pdf of 𝑋?

[Four candidate pdf graphs, not reproduced here. Clicker results: 45%, 32%, 20%, 2%.]

Answer: the correct graph has its mean at 𝜇 = 20, with 95% of the probability (5% outside) inside 𝜇 ± 2𝜎, i.e. 20 ± 8. Of the other graphs, one was too wide, one too narrow, and one had the wrong mean.

Example: Manufacturing variability

The outside diameter, 𝑋 mm, of a copper pipe is 𝑁(15.00, 0.02²) and the fittings for joining the pipe have inside diameter 𝑌 mm, where 𝑌 ∼ 𝑁(15.07, 0.022²).

(i) Find the probability that 𝑋 exceeds 14.99 mm.
(ii) Within what range will 𝑋 lie with probability 0.95?
(iii) Find the probability that a randomly chosen pipe fits into a randomly chosen fitting (i.e. 𝑋 < 𝑌).

(i) Find the probability that 𝑋 exceeds 14.99 mm.

Answer: 𝑋 ∼ 𝑁(𝜇, 𝜎²) = 𝑁(15.0, 0.02²). Reminder: 𝑍 = (𝑋 − 𝜇)/𝜎.

𝑃(𝑋 > 14.99) = 𝑃(𝑍 > (14.99 − 15.0)/0.02) = 𝑃(𝑍 > −0.5) = 𝑃(𝑍 < 0.5) = Q(0.5) ≈ 0.6915

(ii) Within what range will 𝑋 lie with probability 0.95?

Answer: from the earlier example, 𝑃(−1.96 < 𝑍 < 1.96) = 0.95, i.e. 𝑋 lies in 𝜇 ± 1.96𝜎 with probability 0.95 ⇒ 𝑋 = 15 ± 1.96 × 0.02

⇒ 14.96 mm < 𝑋 < 15.04 mm

Where is the probability? We found that 95% of the probability lies within 14.96 mm < 𝑋 < 15.04 mm. What is the probability that 𝑋 > 15.04 mm?

1. 0.025
2. 0.05
3. 0.95
4. 0.975

[Clicker results: 71%, 14%, 11%, 4%.]

Answer: 0.025, since the remaining 5% of the probability is split equally between the two tails.

(iii) Find the probability that a randomly chosen pipe fits into a randomly chosen fitting (i.e. 𝑋 < 𝑌).

Answer: for 𝑋 < 𝑌 we want 𝑃(𝑌 − 𝑋 > 0). To answer this we need to know the distribution of 𝑌 − 𝑋, where 𝑌 and 𝑋 both have (different) Normal distributions.

Distribution of the sum of Normal variates

Means and variances of independent random variables just add. If 𝑋₁, 𝑋₂, …, 𝑋ₙ are independent and each has a normal distribution 𝑋ᵢ ∼ 𝑁(𝜇ᵢ, 𝜎ᵢ²), then

𝜇_{𝑋₁+𝑋₂} = 𝜇_{𝑋₁} + 𝜇_{𝑋₂},  𝜎²_{𝑋₁+𝑋₂} = 𝜎²_{𝑋₁} + 𝜎²_{𝑋₂}, etc.

A special property of the Normal distribution is that the sum of Normal variates also has a Normal distribution [stated without proof]. If 𝑐₁, 𝑐₂, …, 𝑐ₙ are constants, then:

𝑐₁𝑋₁ + 𝑐₂𝑋₂ + ⋯ + 𝑐ₙ𝑋ₙ ∼ 𝑁(𝑐₁𝜇₁ + ⋯ + 𝑐ₙ𝜇ₙ, 𝑐₁²𝜎₁² + 𝑐₂²𝜎₂² + ⋯ + 𝑐ₙ²𝜎ₙ²)

E.g. 𝑋₁ + 𝑋₂ ∼ 𝑁(𝜇₁ + 𝜇₂, 𝜎₁² + 𝜎₂²) and 𝑋₁ − 𝑋₂ ∼ 𝑁(𝜇₁ − 𝜇₂, 𝜎₁² + 𝜎₂²)
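The "means add, variances add" rule for a difference of normals can be checked by simulation. A minimal Monte Carlo sketch (the parameters here are arbitrary illustrative values, not from the pipe example):

```python
import random
import statistics

# Monte Carlo check that X1 - X2 ~ N(mu1 - mu2, sigma1^2 + sigma2^2).
random.seed(0)
mu1, s1 = 2.0, 3.0
mu2, s2 = 5.0, 4.0
n = 100_000
diff = [random.gauss(mu1, s1) - random.gauss(mu2, s2) for _ in range(n)]

mean_hat = statistics.fmean(diff)     # should be near mu1 - mu2 = -3
var_hat = statistics.pvariance(diff)  # should be near s1^2 + s2^2 = 25
```

Note that the variances add even for a difference: subtracting an independent random quantity adds uncertainty, it never cancels it.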

Back to the example: we want 𝑃(𝑌 − 𝑋 > 0), with

𝑌 − 𝑋 ∼ 𝑁(𝜇_𝑌 − 𝜇_𝑋, 𝜎_𝑌² + 𝜎_𝑋²) = 𝑁(15.07 − 15, 0.02² + 0.022²) = 𝑁(0.07, 0.000884)

Hence 𝑃(𝑌 − 𝑋 > 0) = 𝑃(𝑍 > (0 − 0.07)/√0.000884) = 𝑃(𝑍 > −2.35) = 𝑃(𝑍 < 2.35) ≈ 0.991

Which of the following would make a random pipe more likely to fit into a random fitting?

1. Decreasing mean of 𝑌
2. Increasing the variance of 𝑋
3. Decreasing the variance of 𝑋
4. Increasing the variance of 𝑌

[Clicker results: 55%, 16%, 16%, 14%.]

Answer: common sense, or use 𝑑 = 𝑌 − 𝑋 ∼ 𝑁(𝜇_𝑌 − 𝜇_𝑋, 𝜎_𝑌² + 𝜎_𝑋²). Then

𝑃(𝑋 < 𝑌) = 𝑃(𝑑 > 0) = 𝑃(𝑍 > (0 − 𝜇_𝑑)/𝜎_𝑑) = 𝑃(𝑍 < 𝜇_𝑑/𝜎_𝑑)

The probability is larger if
- 𝜇_𝑑 is larger (bigger average gap between pipe and fitting)
- 𝜎_𝑑 is smaller (less fluctuation in gap size)

Since 𝜎_𝑑² = 𝜎_𝑋² + 𝜎_𝑌², 𝜎_𝑑 is smaller if the variance of 𝑋 is decreased.

Normal approximations

Central Limit Theorem: If 𝑋₁, 𝑋₂, … are independent random variables with the same distribution, which has mean 𝜇 and variance 𝜎² (both finite), then the sum Σᵢ₌₁ⁿ 𝑋ᵢ tends to the distribution 𝑁(𝑛𝜇, 𝑛𝜎²) as 𝑛 → ∞.

Hence: the sample mean 𝑋̄ₙ = (1/𝑛) Σᵢ₌₁ⁿ 𝑋ᵢ is distributed approximately as 𝑁(𝜇, 𝜎²/𝑛).

For the approximation to be good, 𝑛 may need to be 30 or more for skewed distributions, but can be quite small for simple symmetric distributions. The approximation tends to have much better fractional accuracy near the peak than in the tails: don't rely on it to estimate the probability of very rare events. It often also works for sums of non-independent random variables, i.e. the sum still tends to a normal distribution (but the variance is harder to calculate).

Example: Average of n samples from a uniform distribution:
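The uniform-average example can be reproduced by simulation. A minimal sketch (the choices 𝑛 = 30 and 20,000 repetitions are arbitrary, not from the lecture):

```python
import math
import random
import statistics

# Each sample mean averages n Uniform(0,1) draws; the CLT predicts the means
# are approximately N(1/2, (1/12)/n), since Uniform(0,1) has variance 1/12.
random.seed(1)
n, reps = 30, 20_000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

predicted_sd = math.sqrt((1 / 12) / n)   # ~0.0527
sample_mean = statistics.fmean(means)    # should be near 0.5
sample_sd = statistics.pstdev(means)     # should be near predicted_sd
```

A histogram of `means` would show the familiar bell shape even though each individual draw is flat, which is the content of the theorem.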

Example: The mean weight of people in England is 𝜇 = 72.4 kg, with standard deviation 𝜎 = 15 kg. The London Eye at capacity holds 800 people at once. What is the distribution of the weight of the passengers at any random time when the Eye is full?

Answer: the total weight 𝑊 of passengers is the sum of 𝑛 = 800 individual weights. Assuming the weights are independent, by the central limit theorem 𝑊 ∼ 𝑁(𝑛𝜇, 𝑛𝜎²). With 𝜇 = 72.4 kg, 𝜎 = 15 kg, 𝑛 = 800:

𝑊 ∼ 𝑁(800 × 72.4 kg, 800 × 15² kg²) ≈ 𝑁(58000 kg, 180000 kg²)

i.e. Normal with 𝜇_𝑊 ≈ 58000 kg and 𝜎_𝑊 = √180000 kg ≈ 424 kg.

[Usual caveat: people visiting the Eye are unlikely to actually have independent weights, e.g. families, school trips, etc.]
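The arithmetic above is quick to check (note 800 × 72.4 = 57920, which the slide rounds to 58000):

```python
import math

mu, sigma, n = 72.4, 15.0, 800

# Sum of n independent weights: W ~ N(n*mu, n*sigma^2)
mu_W = n * mu            # 57920 kg (rounded to 58000 on the slide)
var_W = n * sigma**2     # 180000 kg^2
sd_W = math.sqrt(var_W)  # ~424 kg
```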

Course Feedback: Which best describes your experience of the lectures so far?

[Poll stopped prematurely; not many answers.]

1. Too slow
2. Speed OK, but struggling to understand many things
3. Speed OK, I can understand most things
4. A bit fast, I can only just keep up
5. Too fast, I don't have time to take notes though I still follow most of it
6. Too fast, I feel completely lost most of the time
7. I switch off and learn on my own from the notes and doing the questions
8. I can't hear the lectures well enough (e.g. speech too fast to understand or other people talking)

[Clicker results: 38%, 31%, 13%, 13%, 6%, 0%, 0%, 0%.]

Course Feedback: What do you think of clickers?

1. I think they are a good thing, help me learn and make lectures more interesting
2. I enjoy the questions, but don't think they help me learn
3. I think they are a waste of time
4. I think they are a good idea, but better questions would make them more useful
5. I think they are a good idea, but need longer to answer questions

[Clicker results: 65%, 14%, 12%, 7%, 2%.]

Course Feedback: How did you find the question sheets so far?

1. Challenging but I managed most of it OK
2. Mostly fairly easy
3. Had difficulty, but workshops helped me to understand
4. Had difficulty and workshops were very little help
5. I've not tried them

[Clicker results: 47%, 20%, 14%, 16%, 4%.]

Normal approximation to the Binomial

If 𝑋 ∼ 𝐡(𝑛, 𝑝) and 𝑛 is large and 𝑛𝑝 is not too near 0 or 1, then 𝑋is approximately 𝑁 𝑛𝑝, 𝑛𝑝 1 βˆ’ 𝑝 .

[Plots comparing 𝐵(𝑛, 𝑝) with its Normal approximation for 𝑛 = 10, 𝑝 = 1/2 and 𝑛 = 50, 𝑝 = 1/2.]

Approximating a range of possible results from a Binomial distribution, e.g. 𝑃(6 or fewer heads tossing a coin 10 times) = 𝑃(𝑋 ≤ 6) if 𝑋 ∼ 𝐵(10, 0.5):

𝑃(𝑋 ≤ 6) = 𝑃(𝑘 = 0) + 𝑃(𝑘 = 1) + ⋯ + 𝑃(𝑘 = 6) = 0.8281

With 𝜇 = 𝑛𝑝 = 5 and 𝜎² = 𝑛𝑝(1 − 𝑝) = 2.5, the Normal approximation (taking the cut at 6.5, the continuity correction) gives

𝑃(𝑋 ≤ 6) ≈ ∫₋∞^6.5 𝑒^{−(𝑥−𝜇)²/2𝜎²}/√(2𝜋𝜎²) 𝑑𝑥 = Q((6.5 − 𝜇)/𝜎) = Q(1.5/√2.5) = Q(0.9487) = 0.8286

[Not always so accurate at such low 𝑛!]

If π‘Œ ∼ 𝑁(πœ‡, 𝜎 2 ) what is the best approximation for 𝑃(3 or more heads when tossing a coin 10 times)? i.e. If 𝑋 ∼ 𝐡(10,0.5), πœ‡ = 5, 𝜎 2 = 𝑛𝑝 1 βˆ’ 𝑝 = 2.5, what is the best approximation for 𝑃(𝑋 β‰₯ 3)?

1. 𝑃(π‘Œ > 2.5) 2. 𝑃(π‘Œ > 3) 3. 𝑃(π‘Œ > 3.5)

60%

27% 13%

1

2

3

Quality control example: a manufacturing process produces 10% defective computer chips. 200 chips are randomly selected from a large production batch. What is the probability that fewer than 15 are defective?

Answer: mean 𝑛𝑝 = 200 × 0.1 = 20, variance 𝑛𝑝(1 − 𝑝) = 200 × 0.1 × 0.9 = 18.

So if 𝑋 is the number of defective chips, approximately 𝑋 ∼ 𝑁(20, 18). Hence

𝑃(𝑋 < 15) ≈ 𝑃(𝑍 < (14.5 − 20)/√18) = 𝑃(𝑍 < −1.296) = 1 − 𝑃(𝑍 < 1.296) = 1 − [0.9015 + 0.6 × (0.9032 − 0.9015)] ≈ 0.097

This compares to the exact Binomial answer Σₖ₌₀¹⁴ ⁿ𝐶ₖ 𝑝ᵏ(1 − 𝑝)ⁿ⁻ᵏ ≈ 0.093. The Binomial answer is easy to calculate on a computer, but the Normal approximation is much easier if you have to do it by hand. The Normal approximation is about right, but not accurate.
