DDM Part II Analyzing the Results Dr. Deborah Brady
Agenda
Overview of how to measure growth in 4 “common sense” ways
Quick look at “standardization”
Not all analyses are statistical or new
We’ll use familiar ways of looking at student work
Excel might help when you have a whole grade’s scores, but it is not essential
Time for your questions; exit slips
My email
[email protected];
PowerPoint and handouts at http://tinyurl.com/k23opk6
2 Considerations for Local DDMs
1. Comparable across schools
Example: Teachers with the same job (e.g., all 5th grade teachers)
Where possible, measures are identical
Easier to compare identical measures
Do identical measures provide meaningful information about all students?
Exceptions: When might assessments not be identical?
Different content (different sections of Algebra I)
Differences in untested skills (reading and writing on math test for ELL students)
Other accommodations (fewer questions to students who need more time)
NOTE: Roster Verification and Group Size will be considerations by DESE
2. Comparable across the District
Aligned to your curriculum (comparable content) K-12 in all disciplines
Appropriate for your students
Aligned to your district’s content
Informative, useful to teachers and administrators
“Substantial” Assessments (comparable rigor):
“Substantial” units with multiple standards and/or concepts assessed. (DESE recently began talking about finals/midterms as preferable.) See Core Curriculum Objectives (CCOs) on the DESE website if you are concerned: http://www.doe.mass.edu/edeval/ddm/example/
Quarterly, benchmarks, mid-terms, and common end of year exams
NOTE: All of this data stays in your district. Only HML goes to DESE with a MEPID for each educator.
Examples of 4 + 1 Methods for Calculating Growth (each is in the handout)
Pre-post test
Repeated measures
Holistic Rubric (Analytical Rubric)
Post test only
A look at “standardization” with percentiles
Typical Gradebook and Distribution Page 1 of handout
Alphabetical order (random)
Sorted low to high
Determine “cut scores” (validate in the student work)
Use “Stoplight Method” to help see cut scores
Graph of distribution of all scores
Graph of distribution of High, Moderate, Low scores
Random 90 76 92 72 80 98 91 75 60 52 76 77 96 61 63 78 79 95 80 85 86 84 65
Sorted 52 60 61 63 65 72 75 76 76 77 78 79 80 80 84 85 86 90 91 92 95 96 98
Distribution of whole class, all scores, low to high (chart: scores on an axis from 0 to 120, students 1 to 23)
“Cut” scores and “common sense”: validate them with performances.
What work is not moving at an average rate?
What work shows accelerated growth?
Some benchmarks have determined rates of growth over time.
High, Moderate, Low Distribution (chart: counts on an axis from 0 to 14)
High count: 6
Mod count: 12
Low count: 5
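The sort-then-cut steps can be sketched in a few lines of Python. This is a minimal sketch, not the handout's method: the cut scores of 70 and 90 are assumptions chosen because they reproduce the High/Moderate/Low counts on the slide (6, 12, 5). Real cut scores should still be validated against the student work.

```python
# Scores from the gradebook slide, in their original (alphabetical) order.
scores = [90, 76, 92, 72, 80, 98, 91, 75, 60, 52, 76, 77,
          96, 61, 63, 78, 79, 95, 80, 85, 86, 84, 65]

# ASSUMED cut scores (not stated on the slide): below 70 is Low, 90 and up is High.
LOW_CUT = 70
HIGH_CUT = 90


def band(score, low_cut=LOW_CUT, high_cut=HIGH_CUT):
    """Label one score as Low, Moderate, or High using the two cut scores."""
    if score < low_cut:
        return "Low"
    if score >= high_cut:
        return "High"
    return "Moderate"


# Step 1: sort low to high.
sorted_scores = sorted(scores)

# Step 2: count how many scores fall into each band.
counts = {"Low": 0, "Moderate": 0, "High": 0}
for s in sorted_scores:
    counts[band(s)] += 1

print(sorted_scores)
print(counts)
```

Changing `LOW_CUT` and `HIGH_CUT` and rerunning shows how sensitive the three counts are to where the cuts are placed.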
Pre/Post Test
Description: The same or similar assessments administered at the beginning and at the end of the course or year
Example: Grade 10 ELA writing assessment aligned to College and Career Readiness Standards at beginning and end of year
Measuring Growth: Difference between pre- and post-test.
Check if all students have an equal chance of demonstrating growth
Pre-Post Tests
Columns: Pre-test (lowest to highest); Post-test; Difference (growth, post minus pre); Analysis (range of growth pre/post; cut score? look at work, look at distribution); %age growth (difference, %age based on diff/pre)

Pre-test   Post-test   Difference   Analysis   %age growth
20         35          15           15         75%
25         30          5            5          20%
30         50          20           20         67%
35         60          25           25         42%
35         60          25           25         42%
40         70          35           35         87%
40         65          25           25         62%
50         75          25           25         50%
50         80          30           30         60%
50         85          35           35         70%

How many L/M/H? (chart of LOW, MODERATE, HIGH counts; values shown: 5, 3, 2)
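The two growth calculations named in the column headings (post minus pre, and difference divided by pre) can be checked with a short script. This is a sketch using the pre/post pairs from the table. Rounding to whole percents is an assumption. Two computed rows differ from the printed table: 35 to 60 computes to 71% (the slide shows 42%), and 40 to 70 computes to a difference of 30 and 75% (the slide shows 35 and 87%).

```python
# (pre, post) pairs taken from the pre/post table on the slide.
pre_post = [(20, 35), (25, 30), (30, 50), (35, 60), (35, 60),
            (40, 70), (40, 65), (50, 75), (50, 80), (50, 85)]


def growth(pre, post):
    """Return (difference, percent growth) where percent growth = diff / pre.

    Percent growth is None when pre is 0, because diff / pre is undefined.
    """
    diff = post - pre
    pct = round(100 * diff / pre) if pre else None
    return diff, pct


rows = [(pre, post, *growth(pre, post)) for pre, post in pre_post]

print("Pre  Post  Diff  %growth")
for pre, post, diff, pct in rows:
    print(f"{pre:<4} {post:<5} {diff:<5} {pct}%")
```

Note that percent growth favors students with low pre-test scores: a 15-point gain from 20 is 75%, while a 25-point gain from 50 is only 50%. That is one reason to check whether all students have an equal chance of demonstrating growth.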
Holistic
Description: Assess growth across student work collected throughout the year.
Example: Tennessee Arts Growth Measure System
Measuring Growth: Growth Rubric (see example)
Considerations: Option for multifaceted performance assessments. Rating can be challenging and time consuming.
Holistic Example (unusual rubric): Details
1 No improvement in the level of detail. One is true:
* No new details across versions
* New details are added, but not included in future versions
* A few new details are added that are not relevant, accurate or meaningful
2 Modest improvement in the level of detail. One is true:
* There are a few details included across all versions
* Many added details are included, but they are not included consistently, or none are improved or elaborated upon
* There are many added details, but several are not relevant, accurate or meaningful
3 Considerable improvement in the level of detail. All are true:
* There are many examples of added details across all versions
* At least one example of a detail that is improved or elaborated in future versions
* Details are consistently included in future versions
* The added details reflect relevant and meaningful additions
4 Outstanding improvement in the level of detail. All are true:
* On average there are multiple details added across every version
* There are multiple examples of details that build and elaborate on previous versions
* The added details reflect the most relevant and meaningful additions
Example taken from Austin, a first grader from Anser Charter School in Boise, Idaho. Used with permission from Expeditionary Learning. Learn more about this and other examples at http://elschools.org/student-work/butterfly-drafts
HOLISTIC Rubric
Easier for large-scale assessments like MCAS (Topic or Conventions) and useful when categories overlap. Criteria in one cell.
Writing criteria: 1) Claims/evidence 2) Counterclaims 3) Organization 4) Language/style
Advanced: 1) Insightful, accurate, carefully developed claims and evidence. 2) Counterclaims are thoughtfully, accurately, completely discussed and argued. 3) Whole essay and each paragraph are carefully organized and show interrelationships among ideas. 4) Sentence structure, vocabulary, and mechanics show control over language use.
Proficient: Adequate. Effective. “Gets it.”
NI: Misconceptions; some errors
At Risk: Serious errors
MCAS Has 2 Holistic Rubrics: Topic/Development and Conventions
Topic/Development
6: Rich topic/idea development. Careful, subtle organization. Effective, rich use of language.
5: Full topic/idea development. Logical organization. Strong details. Appropriate use of language.
4: Moderate topic/idea development and organization. Adequate, relevant details. Some variety in language.
3: Rudimentary topic/idea development and/or organization. Basic supporting details. Simplistic language.
2: Limited or weak topic/idea development, organization, and/or details. Limited awareness of audience and/or task.
1: Little topic/idea development, organization, and/or details. Little or no awareness of audience and/or task.
Conventions
4: Control of sentence structure, grammar and usage, and mechanics (length and complexity of essay provide opportunity for student to show control of standard English conventions).
3: Errors do not interfere with communication and/or few errors relative to length of essay or complexity of sentence structure, grammar and usage, and mechanics.
2: Errors interfere somewhat with communication and/or too many errors relative to the length of the essay or complexity of sentence structure, grammar and usage, and mechanics.
1: Errors seriously interfere with communication AND little control of sentence structure, grammar and usage, and mechanics.
Pre and Post Rubric (2 Criteria) Growth: Add the scores
Columns: Pretests (Topic/Conventions); Post tests (Topic/Conventions); Difference; Analysis (add together criteria gains as raw score); In order; % of growth (difference/pre)

Pretests   Post tests   Difference   Raw gain   In order   % of growth
1/1        1/1          0/0          0          0          0
1/2        2/2          1/0          1          1          100%
1/2        2/3          1/1          2          1          100%
2/3        3/3          1/0          1          2          50%

Rubrics do not represent percentages. A student who received a 1 would probably receive a 50. F?
1 = 50: F, seriously at risk
2 = range 60-72, 75?: D to C, at risk
3 = 76-88, 89?: C+ to B+, average
4 = 90-100: A to A+, above most
Holistic Rubric or Holistic Descriptor: Keeping the 1-4 scale

Pre   Post   Difference   Rank order   Cut
0     1      +1           -1           -1
0     1      +1           0            0
0     1      +1           0            0
1     0      -1           1            1
1     1      0            1            1
1     1      0            1            1
1     3      +2           1            1
1     1      0            1            1
2     3      +1           2            2

Distribution (chart: counts of low, mod, and High on an axis from 0 to 7)
Converting Rubrics to Percentages
Not recommended for classroom use because it distorts the meaning of the descriptors. May facilitate this large-scale use. District decision.

Pre   Converted “grade”   Post   Converted “grade”   Difference   %age growth (difference/pre)
0     0                   1      50                  50           50%
0     0                   1      50                  50           50%
0     0                   1      50                  50           50%
1     50                  0      0                   -50          -50%
1     50                  1      50                  0            0
1     50                  1      50                  0            0
1     50                  3      82                  32           64%
1     50                  1      50                  0            0
2     65                  3      82                  17           26%

Common sense analysis: Was the assessment too difficult? Zeros in pretest (3). Zero growth. Only 1 student improved.
Change assessment scale? Look at all of the grade-level assessments. % conversion not helpful in this case?
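The conversion step can be traced in code to see where it breaks down. This is a sketch, assuming only the four conversions that appear in the table (0 to 0, 1 to 50, 2 to 65, 3 to 82); no value for a rubric score of 4 is shown there, so none is invented here. Strictly applied, difference/pre is undefined when the converted pre-test is 0, and the 1-to-0 row computes to -100%. The slide prints 50% and -50% for those rows, which is part of why the conversion may not be helpful in this case.

```python
# Rubric-to-"grade" conversions as they appear in the slide's table.
# A rubric score of 4 is not shown there, so it is left out.
CONVERT = {0: 0, 1: 50, 2: 65, 3: 82}

# (pre, post) rubric scores for the nine students in the table.
pairs = [(0, 1), (0, 1), (0, 1), (1, 0), (1, 1),
         (1, 1), (1, 3), (1, 1), (2, 3)]


def converted_growth(pre, post):
    """Return (difference, percent growth) on the converted scale.

    Percent growth is difference / converted pre; it is None when the
    converted pre-test is 0, because the ratio is undefined.
    """
    pre_grade = CONVERT[pre]
    post_grade = CONVERT[post]
    diff = post_grade - pre_grade
    pct = None if pre_grade == 0 else round(100 * diff / pre_grade)
    return diff, pct


results = [converted_growth(pre, post) for pre, post in pairs]
for (pre, post), (diff, pct) in zip(pairs, results):
    label = "undefined" if pct is None else f"{pct}%"
    print(f"rubric {pre} -> {post}: difference {diff}, growth {label}")
```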
Repeated Measures
Description: Multiple assessments given throughout the year.
Example: running records, attendance, mile run
Measuring Growth: Graphically, ranging from the sophisticated to simple
Considerations: Less pressure on each administration. Authentic tasks (reading aloud, running)
Repeated Measures Example: Running Record, Errors in Reading
Chart: average of high, moderate, and low error groups (vertical axis 0 to 70; six assessments: September, November, January, March, April, June)
Error Chart of Averages from each assessment (values as extracted; the row and column layout of the table was lost): 65 48 13 30 15 15 68 63 30 35 20 22 18 10 65 65 32 22 10 12 5 2 1 30 30 28 24 22 20 19 22
Post test only AP exam: Use as baseline to show growth for each level or… for classroom
This assessment does not have a “normal curve”
An alternative for post test only for a classroom and to show student growth is to give a mock AP pre and post.
Post Test Only AP Exam Example (chart: number of students, on an axis from 0 to 16, at each AP score: five, four, three, two, one)
Looking for Variability
Two charts show the number of students (on an axis from 0 to 200) in the Low, Moderate, and High growth categories: one labeled Good, one labeled Problematic.
The second graph is problematic because so many students fall into the “high” growth category that it gives us no information about the difference between average and high growth.
NOTE: Look at the work and make “common sense” decisions.
Consider the whole grade level; one class’s variation may be caused by the teacher’s effectiveness.
Critical Question: Do all students have an equal possibility for success?
“Standardizing”: Local Norms
Percentages versus Percentiles: % within class/course; %iles across all courses in district
Many assessments with different standards: Student A

Subject          Raw score   Percentage of 100%   “Standardized” (normal curve)
English          15/20       75%                  62 %ile
Math             22/25       88%                  72 %ile
Art              116/150     77%                  59 %ile
Social Studies   6/10        60%                  71 %ile
Science          70/150      46%                  70 %ile
Music            35/35       100%                 61 %ile
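The difference between a percentage and a percentile can be made concrete with a short sketch. The percentages come straight from Student A's raw scores. The percentile function uses one common definition (the percent of scores at or below a given score), which is an assumption, since the slides do not say which definition the district uses. District-wide scores are not in the handout, so the 23 gradebook scores from earlier in the deck stand in for a norm group.

```python
# Student A's raw scores (points earned, points possible) from the slide.
raw = {
    "English": (15, 20),
    "Math": (22, 25),
    "Art": (116, 150),
    "Social Studies": (6, 10),
    "Science": (70, 150),
    "Music": (35, 35),
}

# Percentage: points earned out of points possible, within one assessment.
percentages = {subject: round(100 * earned / possible)
               for subject, (earned, possible) in raw.items()}


def percentile_rank(score, group):
    """Percent of scores in the group that are at or below this score.

    ASSUMPTION: this "at or below" definition is one of several in use.
    """
    at_or_below = sum(1 for s in group if s <= score)
    return round(100 * at_or_below / len(group))


# Stand-in norm group: the 23 gradebook scores from the earlier slide.
class_scores = [90, 76, 92, 72, 80, 98, 91, 75, 60, 52, 76, 77,
                96, 61, 63, 78, 79, 95, 80, 85, 86, 84, 65]

print(percentages)
print(percentile_rank(80, class_scores))
```

Science rounds to 47% here, where the slide shows 46%. A score of 80 is "80%" as a percentage, but its percentile depends entirely on how everyone else in the group scored.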
Standardization In Everyday Terms
Standardization is a process of putting different measures on the same scale
For example:
Most cars cost $25,000, give or take $5,000
Most apples cost $1.50, give or take $.50
Getting a $5,000 discount on a car is about equal to what discount on an apple?
Technical terms:
“Most are” = mean
“Give or take” = standard deviation
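The car and apple question can be worked through as a small calculation: express the discount in "give or take" units (standard deviations) on one scale, then convert it back on the other scale. The function names below are illustrative, not from the handout.

```python
def z_score(value, mean, sd):
    """How many standard deviations a value sits above or below the mean."""
    return (value - mean) / sd


def equivalent_amount(amount, sd_from, sd_to):
    """Convert an amount on one scale to the same number of SDs on another."""
    return amount / sd_from * sd_to


CAR_MEAN, CAR_SD = 25_000, 5_000      # "most cars cost $25,000 give or take $5,000"
APPLE_MEAN, APPLE_SD = 1.50, 0.50     # "most apples cost $1.50 give or take $.50"

# A $5,000 discount is one standard deviation on the car scale...
discount_in_sds = 5_000 / CAR_SD

# ...which is a $0.50 discount on the apple scale.
apple_discount = equivalent_amount(5_000, CAR_SD, APPLE_SD)

print(discount_in_sds)                                  # 1.0
print(apple_discount)                                   # 0.5
print(z_score(20_000, CAR_MEAN, CAR_SD))                # -1.0
print(z_score(1.00, APPLE_MEAN, APPLE_SD))              # -1.0
```

A $20,000 car and a $1.00 apple are both one standard deviation below their typical price, which is what it means for the two to be "on the same scale."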
Percentile/Standard Deviation
Excel Functions: sort high to low or low to high, graphing function, statistical functions including percentiles and standard deviation
Student grades can be sorted from highest to lowest score with one command
Table of student scores can be easily graphed with one command
Excel will easily calculate %, but this is probably not necessary
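For anyone working outside Excel, the same operations are available in Python's standard `statistics` module. This is a sketch using the 23 gradebook scores from earlier in the deck. The choice of the sample standard deviation and of quartiles as the percentile cut points are assumptions, not something the handout specifies.

```python
import statistics

# The 23 gradebook scores from the earlier slide.
scores = [90, 76, 92, 72, 80, 98, 91, 75, 60, 52, 76, 77,
          96, 61, 63, 78, 79, 95, 80, 85, 86, 84, 65]

# Sort from highest to lowest in one step.
high_to_low = sorted(scores, reverse=True)

# "Most are" (mean) and the middle score (median).
mean = statistics.mean(scores)
median = statistics.median(scores)

# "Give or take" (standard deviation). ASSUMPTION: sample SD is used here.
sd = statistics.stdev(scores)

# The 25th, 50th, and 75th percentile cut points (requires Python 3.8+).
quartiles = statistics.quantiles(scores, n=4)

print(high_to_low)
print(round(mean, 1), median, round(sd, 1))
print(quartiles)
```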
“Common Sense”
The purpose of DDMs is to assess Teacher Impact
The student scores, the Low, Moderate, and High growth rankings are totally internal
DESE (in two years) will see MEPIDs and an L, M, or H next to each MEPID
The important part of this process needs to be the focus:
Your discussions about student learning with colleagues
Your discussions about student learning with your evaluator
An ongoing process