
Evaluating Pretest to Posttest Score Differences in CAP Science and Social Studies Assessments: How Much Growth is Enough?

February 2014
Dale Whittington, Ph.D. – Shaker Heights
Russ Brown, Ph.D. – CMSD
Denis Jarvinen, Ph.D. – Strategic Measurement and Evaluation, Inc.

Setting Standards for Performance

Licensing Tests (e.g., Pharmacists)
One Test and One Standard for Performance: Pass or Fail

State Accountability Testing (e.g., Ohio OAA)
One Test, Multiple Standards: Below Basic, Basic, Proficient, Advanced

CAP Foundation Science and Social Studies Assessments
Two Tests, One Standard: Evaluating Growth (How Much?)

Looking at Performance Standards

Content-Based Standards

Goal of standard setting is to determine a level of knowledge and skill judged to be appropriate for the test's purpose

Growth-Based Standards

Goal of standard setting is to use common statistical feature(s) of the data to set a criterion for acceptable performance

Three Statistic-Based Approaches for Evaluating Growth of Student Scores

• Using Effect Size
• Using the Score Distribution
• Using the Standard Error of Measurement

Describing and Comparing Approaches

• Data Points Needed
• Calculations Required
• Outcomes Using a Common Set of Student Data
• Advantages and Disadvantages

The Common Data Set


Shaker Heights Schools Effect Size for SLO’s and Growth

Prepared by Dale Whittington
Shaker Heights City School District
Ohio Middle Level Annual Conference
Columbus, Ohio
February 21, 2014

What is effect size?

• In an educational setting, effect size is one way to measure the effectiveness of a particular intervention.
• Effect size enables us to measure both the improvement (gain) in learner achievement for a group of learners AND at the same time, take into account the variation of student performance.

Adapted from Understanding, using and calculating effect size, Govt of South Australia, Department of Education & Child Development, http://www.decd.sa.gov.au/quality/files/links/WhatIsEffectSize.pdf

Practical Advantages

• Easy to calculate
• Easy to understand; makes intuitive sense
• Adaptable to different kinds of assessments
• Adaptable to different ways of considering growth and goals for SLO’s:
  – Shared attribution across the district
  – Shared attribution within a school
  – Attribution for a specific teacher or group of students

So how do you calculate effect sizes for SLO’s or growth?

Start with a set of pretest scores and posttest scores for the same students

Calculate the difference between the pretest & posttest for each student

Student   Pretest   Posttest   Difference (AKA Gain)
Denis     40        35         -5
Donna     25        30         +5
Dale      45        50         +5
Russ      30        40         +10

Calculations Continued

Calculate the means and standard deviations for both tests

• Pretest
  – Mean: 35.0
  – SD: 9.1
• Posttest
  – Mean: 38.8
  – SD: 8.5

Average the Standard Deviations

• The average of 9.1 and 8.5 is 8.8

How to adapt

• If your pretest and posttest are different lengths, convert to a similar scale, like percentages.
• Think about who you are basing your analysis on and use that to decide what standard deviation (SD) to use
  – Common attribution for district: District SD
  – Common attribution for school: School SD
  – Class: Class SD
  – Specific group, such as economically disadvantaged: the group’s SD
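As a small illustration of the first adaptation, the sketch below converts raw scores from tests of different lengths to percent correct before taking the gain. The item counts (40 and 60) and the raw scores are hypothetical, chosen only to show the arithmetic.

```python
def to_percent(raw_score, n_items):
    """Convert a raw score to percent correct so tests of different lengths share a scale."""
    return 100.0 * raw_score / n_items

# Hypothetical example: a 40-item pretest and a 60-item posttest.
pre_pct = to_percent(20, 40)    # 20 of 40 items correct
post_pct = to_percent(33, 60)   # 33 of 60 items correct
gain_pct = post_pct - pre_pct   # gain expressed in percentage points

print(f"pretest {pre_pct:.1f}%, posttest {post_pct:.1f}%, gain {gain_pct:+.1f} points")
```

Once both tests are on the percent scale, the standard deviations used for the effect size would also need to be computed on that scale.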

Use the average standard deviation and the gains to calculate the effect size:

Effect Size = Gain / SD

Student   Pretest   Posttest   Gain   Effect
Denis     40        35         -5     -.57
Donna     25        30         +5     +.57
Dale      45        50         +5     +.57
Russ      30        40         +10    +1.14
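The calculation above can be sketched in a few lines of Python. This is a minimal sketch, not the presenter's own tool: it assumes the sample standard deviation (n - 1 denominator), which reproduces the 9.1 and 8.5 shown earlier, and it rounds each SD to one decimal before averaging, as the slides do. The function name `effect_sizes` is illustrative.

```python
from statistics import stdev

def effect_sizes(pre, post, sd_digits=1):
    """Per-student effect size: gain divided by the average of the two SDs."""
    gains = [b - a for a, b in zip(pre, post)]
    # Assumption: sample SD, rounded to one decimal as on the slides.
    sd_pre = round(stdev(pre), sd_digits)
    sd_post = round(stdev(post), sd_digits)
    avg_sd = (sd_pre + sd_post) / 2
    return gains, avg_sd, [g / avg_sd for g in gains]

students = ["Denis", "Donna", "Dale", "Russ"]
pre = [40, 25, 45, 30]
post = [35, 30, 50, 40]

gains, avg_sd, effects = effect_sizes(pre, post)
for name, gain, effect in zip(students, gains, effects):
    print(f"{name}: gain {gain:+d}, effect size {effect:+.2f}")
```

Without the rounding step the average SD is about 8.83 rather than 8.8, so the last effect size comes out near 1.13 instead of 1.14; the difference is only a rounding artifact.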

Interpret your results: Common criteria

Cohen (1969)

• ‘Small’ (.2)
  o real, but difficult to detect
  o difference between the heights of 15 year old and 16 year old girls in the US
• ‘Medium’ (.5)
  o ‘large enough to be visible to the naked eye’
  o difference between the heights of 14 & 18 year old girls
• ‘Large’ (.8)
  o ‘grossly perceptible and therefore large’
  o difference between the heights of 13 & 18 year old girls

Hattie: “For students moving from one year to the next, the average effect size across all students is 0.40.”

How results differ, depending on attribution and how you tier students

Another Example based on OAA

Resources

• Understanding, using and calculating effect size. Government of South Australia, Department of Education & Child Development, http://www.decd.sa.gov.au/quality/files/links/WhatIsEffectSize.pdf

• Review Methods/Interpreting Effect Sizes. JHU: Best Evidence Encyclopedia. http://www.bestevidence.org/methods/effectsize.htm

• Calculating an effect size: a practical guide. Visible Learning Plus. http://visiblelearningplus.com/faqs/calculating-effect-size-practical-guide

Establishing Growth Targets with Limited Data

Prepared by Russ Brown, Ph.D – CMSD

Overview

• Design Principles for Student Growth Model work
• The PROBLEM!
• An Idea for a Solution
• Strengths/Weaknesses

Guiding Principles

1. Equity - like measures for like teachers, like expectations for like students.
2. Simplicity - Parsimony and transparency are critical.
3. Continuous improvement will be critical – It simply will not be perfect on the first try!

The PROBLEM

How much growth is enough? How do you estimate this when you don’t know the relationship between the two tests?

What do we know?

1. Basic information about the distribution of scores.

Time      Mean    SD
Pretest   24.28   9.6

2. The relative position of each student on the distribution.

Can we leverage this to set targets?

The Idea

1. Devoid of any way to estimate what growth “should be”…
2. Students of like ability (i.e., same pretest scores) would typically be expected to make comparable growth over time.
3. Use Normal Curve Equivalents (NCEs) as a means to establish targets and relative growth.

How

1. Translate Pre-Test scores to NCEs

Class    Pretest   PreMean   SD    Pre-Z   PreNCE
Class1   8.0       24.3      9.6   -1.7    14.2
Class1   9.0       24.3      9.6   -1.6    16.4
Class1   9.0       24.3      9.6   -1.6    16.4

Z = (Pretest Score - Mean Pretest Score) / Standard Deviation of the Pretest

NCE = (Z x 21.063) + 50 (1-99 Interval)
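The two formulas above can be sketched as a single function. This is a minimal sketch using the rounded pretest mean (24.3) and SD (9.6) from the table; clamping the result to the 1-99 interval is an assumption about how the "(1-99 Interval)" note is meant to be applied, and the name `to_nce` is illustrative.

```python
def to_nce(score, mean, sd):
    """Return (z, nce) for a raw score, given the group mean and SD."""
    z = (score - mean) / sd
    nce = z * 21.063 + 50
    # Assumption: values outside the 1-99 NCE interval are clamped to its ends.
    nce = min(99.0, max(1.0, nce))
    return z, nce

pre_mean, pre_sd = 24.3, 9.6  # values from the table above
for pretest in (8.0, 9.0):
    z, nce = to_nce(pretest, pre_mean, pre_sd)
    print(f"pretest {pretest}: Z = {z:.1f}, NCE = {nce:.1f}")
```

The same function would be applied to posttest scores using the posttest mean and SD, so that each student's pre and post positions are expressed on a common scale.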

Outcomes – What Threshold?

Calculating whether the goal is obtained:

                                        Stringency of Goal
Class   PreNCE   PostNCE   NCE Change   0     -5    -7.5
Stu 1   14.2     3.2       -11.1        No    No    No
Stu 2   16.4     7.6       -8.9         No    No    No
Stu 3   16.4     9.0       -7.4         No    No    Yes
Stu 4   18.6     11.9      -6.7         No    No    Yes

• Must make a judgment about the stringency of the goal/calculation
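The threshold check in the table can be sketched as follows. The rule "goal met when the NCE change is at or above the threshold" is inferred from the Yes/No pattern in the table rather than stated in the slides, so treat it as an assumption; the helper name is illustrative.

```python
def met_goal(nce_change, threshold):
    """Assumed rule: the goal is met when the NCE change is at or above the threshold."""
    return nce_change >= threshold

# NCE changes for the four students in the table above.
nce_changes = {"Stu 1": -11.1, "Stu 2": -8.9, "Stu 3": -7.4, "Stu 4": -6.7}
thresholds = [0, -5, -7.5]

results = {
    student: [met_goal(change, t) for t in thresholds]
    for student, change in nce_changes.items()
}
for student, flags in results.items():
    print(student, ["Yes" if f else "No" for f in flags])

# Percent of these students reaching the goal at each threshold.
percent_met = [
    100.0 * sum(flags[i] for flags in results.values()) / len(results)
    for i in range(len(thresholds))
]
print(percent_met)
```

Loosening the threshold from 0 to -7.5 moves two of the four students from "No" to "Yes", which is the stringency judgment the slide refers to.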

Outcomes – What Performance Level?

Percent of students achieving the Goal   Teacher Growth Rating   Translation
90-100%                                  5                       Above
80-89%                                   4                       Met
70-79%                                   3                       Met
60-69%                                   2                       Met
0-59%                                    1                       Below
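The rating table can be expressed as a small lookup. This is a sketch only: the table lists whole-percent bands, so the handling of fractional values (for example, 89.5% falling in the 80-89% band) is an assumption, and the function name is illustrative.

```python
def growth_rating(percent_meeting_goal):
    """Map the percent of students meeting the goal to (rating, translation)."""
    # (band floor, rating, translation), checked from the top band down.
    bands = [
        (90, 5, "Above"),
        (80, 4, "Met"),
        (70, 3, "Met"),
        (60, 2, "Met"),
        (0, 1, "Below"),
    ]
    for floor, rating, translation in bands:
        if percent_meeting_goal >= floor:
            return rating, translation
    raise ValueError("percent must be between 0 and 100")

for pct in (12.0, 64.0, 84.0, 92.0):
    print(pct, growth_rating(pct))
```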

Outcomes – What Performance Level?

Percent of Students Reaching the Goal (entries are Rating - Percent)

Group     0           -5          -7.5        Mean Gain
Class 1   1 - 12.0%   1 - 20.0%   1 - 52.0%   24.04
Class 2   4 - 84.0%   5 - 92.0%   5 - 92.0%   37.88
Class 3   1 - 44.0%   1 - 52.0%   2 - 60.0%   34.44
Class 4   1 - 44.0%   1 - 56.0%   2 - 64.0%   34.76

• Not surprisingly – outcomes vary by the stringency of the expectation…

Outcomes – Quick Comparison

Percent of Students Reaching the Goal (entries are Rating - Percent)

Group     0           -5          -7.5        Mean Gain
Class 1   1 - 12.0%   1 - 20.0%   1 - 52.0%   24.04
Class 2   4 - 84.0%   5 - 92.0%   5 - 92.0%   37.88
Class 3   1 - 44.0%   1 - 52.0%   2 - 60.0%   34.44
Class 4   1 - 44.0%   1 - 56.0%   2 - 64.0%   34.76

Percent of Students Reaching the Goal (SEM)

Group     3 SE      2 SE       1 SE       Mean Gain
Class 1   1 - 44%   4 - 88%    5 - 100%   24.04
Class 2   5 - 96%   5 - 100%   5 - 100%   37.88
Class 3   1 - 52%   1 - 56%    3 - 76%    34.44
Class 4   1 - 48%   2 - 60%    3 - 76%    34.76

Outcomes – What about Real Data?

Applied to 3rd Grade OAA (Fall to Spring):

Percent of students achieving the Goal   Building Growth Rating   Translation   IRN Count
90-100%                                  5                        Above         0
60-89%                                   2-4                      Met           37
0-59%                                    1                        Below         36

Outcomes – What about Real Data?

Applied to 4th Grade Benchmark to OAA (Fall to Spring):

Percent of students achieving the Goal   Building Growth Rating   Translation   IRN Count   Mean Value Add Index
90-100%                                  5                        Above         2           1.96
60-89%                                   2-4                      Met           50          -.68
0-59%                                    1                        Below         13          -1.56

Pros and Cons

+ Students with like scores have like expectations for growth
+ Relatively simple and relatively transparent
- Must make a value judgment about the amount of error for which one wishes to compensate (not so transparent)
- More adjustment = more bias at the bottom

Standard Error of Measurement

• All scores have a “true” score and “error”
  – Error bands on score reports
• Standard Error quantifies degree of “error” in a test score
• Formula is: Standard Error of Measurement =
• Values needed: Mean, Standard Deviation, Reliability of the Test
• Assumptions that underlie this approach

Steps

1) For a set of data, calculate the mean and standard deviation
2) Calculate the reliability of the test
3) Use the formula to determine the Standard Error of Measurement (class level, school level)
4) Set a level for the growth standard (1 se, 2 se, etc.)
5) Add chosen level of standard error to raw score
6) Convert (raw score + standard error) to percent correct on pretest
7) Find corresponding percent correct/raw score on posttest (Note: Assumptions here not required once IRT equating is completed)
8) Compare actual student posttest score with target score
9) At or above target score = “Acceptable Progress”
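The steps above can be sketched as follows. The SEM formula itself did not survive in the text, so this sketch assumes the classical test theory definition, SEM = SD x sqrt(1 - reliability). The reliability value (0.84), the test lengths (40 and 50 items), the student scores, and all function names are hypothetical. Step 7 is approximated by assuming that equal percent correct on the two tests corresponds, which is the assumption the slide says IRT equating would remove.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Assumed classical definition: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def posttest_target(pre_raw, sem, n_se, pre_items, post_items):
    """Steps 5-7: add n_se standard errors, convert to percent, map to posttest raw score."""
    adjusted = pre_raw + n_se * sem              # step 5
    percent = 100.0 * adjusted / pre_items       # step 6
    return percent / 100.0 * post_items          # step 7 (equal-percent assumption)

def acceptable_progress(post_raw, target):
    """Steps 8-9: at or above the target counts as acceptable progress."""
    return post_raw >= target

sem = standard_error_of_measurement(sd=9.6, reliability=0.84)  # hypothetical reliability
target = posttest_target(pre_raw=24, sem=sem, n_se=1, pre_items=40, post_items=50)
print(f"SEM = {sem:.2f}, posttest target = {target:.1f}")
print("Acceptable progress:", acceptable_progress(post_raw=36, target=target))

# A high pretest score can push the target past the top of the posttest scale.
high_target = posttest_target(pre_raw=39, sem=sem, n_se=2, pre_items=40, post_items=50)
print(f"High-scorer target = {high_target:.1f} on a 50-item test")
```

The last call illustrates why high pretest scores are a problem for this approach: the computed target exceeds the number of items on the posttest.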

Calculations for one student

Results

Observations

High pretest scores can lead to out-of-range posttest score targets. Any modification to the sample that increases the Standard Deviation will increase the value of the Standard Error and therefore require more score growth to reach the target.

