Borrowing the Strength of Unidimensional Scaling to Produce Multidimensional Educational Effectiveness Profiles

Presentation at the 12th Annual Maryland Assessment Conference
College Park, MD, October 18, 2012

Joseph A. Martineau and Ji Zeng
Michigan Department of Education

Background 2

• Prior research showing that using unidimensional measures of multidimensional achievement constructs can distort value-added
  ◦ Martineau, J. A. (2006). Distorting Value Added: The Use of Longitudinal, Vertically Scaled Student Achievement Data for Value-Added Accountability. Journal of Educational and Behavioral Statistics, 31(1), 35-62.
  ◦ Construct-irrelevant variance can become considerable in value-added measures when a construct is multidimensional but is modeled in value-added as unidimensional.
  ◦ Common misunderstanding: if the multiple constructs are highly correlated, value-added should not be distorted.
  ◦ Correct understanding: if value-added on the multiple constructs is highly correlated, value-added should not be distorted.
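The distinction between the two statements can be illustrated with a small simulation. This is only a sketch under invented assumptions, not the analysis in Martineau (2006): every student has a general ability that drives both dimensions, each school has independent effects on the two dimensions, and all variances, sample sizes and names (`simulate`, `pearson`) are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simulate(n_schools=200, n_students=30, seed=1):
    rng = random.Random(seed)
    y1, y2 = [], []        # student scores on dimensions 1 and 2
    va1, va2 = [], []      # estimated school value-added on each dimension
    for _ in range(n_schools):
        # Assumption: school effects on the two dimensions are independent.
        v1, v2 = rng.gauss(0, 0.3), rng.gauss(0, 0.3)
        gain1 = gain2 = 0.0
        for _ in range(n_students):
            g = rng.gauss(0, 1)                  # general ability shared by both dimensions
            s1 = g + v1 + rng.gauss(0, 0.3)
            s2 = g + v2 + rng.gauss(0, 0.3)
            y1.append(s1)
            y2.append(s2)
            gain1 += s1 - g                      # prior achievement treated as known
            gain2 += s2 - g
        va1.append(gain1 / n_students)
        va2.append(gain2 / n_students)
    composite = [(a + b) / 2 for a, b in zip(va1, va2)]   # "unidimensional" value-added
    return pearson(y1, y2), pearson(va1, va2), pearson(va1, composite)

student_r, va_r, composite_r = simulate()
print(f"student-level correlation between dimensions: {student_r:.2f}")
print(f"school-level correlation between value-added: {va_r:.2f}")
print(f"dimension-1 value-added vs. composite value-added: {composite_r:.2f}")
```

Under these assumptions the two scores correlate highly across students while value-added on the two dimensions is close to uncorrelated, so a composite value-added measure matches neither dimension well.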





Background 3

• Prior research showing that the choice of dimension/domain within a construct changes value-added significantly
  ◦ Lockwood, J. R., et al. (2007). The Sensitivity of Value-Added Teacher Effect Estimates to Different Mathematics Achievement Measures. Journal of Educational Measurement, 44(1), 47-67.
  ◦ Depending on choices made in value-added modeling, the correlation between teacher value-added on Procedures and Problem Solving ranged from 0.01 to 0.46.
  ◦ This surprisingly low correlation in value-added indicates that, at least in this situation, one needs to be concerned about modeling value-added in both dimensions rather than unidimensionally.
  ◦ Only work I am aware of to date that has inspected inter-construct value-added correlations.





Background 4

• Prior research showing that commonly used factor analytic techniques underestimate the number of dimensions in a multidimensional construct
  ◦ Zeng, J. (2010). Development of a Hybrid Method for Dimensionality Identification Incorporating an Angle-Based Approach. Unpublished doctoral dissertation, University of Michigan.
  ◦ Common dimensionality identification procedures make the unwarranted assumption that all shared variance among indicator variables arises because the indicator variables measure the same construct (shared variance can also arise because the indicator variables are influenced by a common exogenous variable).
  ◦ Because of this unwarranted assumption, commonly used dimensionality identification techniques underestimate the number of dimensions in a data set.



Background 5

• Scaling constructs as multidimensional is a difficult task
  ◦ Multidimensional Item Response Theory (MIRT) is time-consuming and costly to run
  ◦ Replicating MIRT analyses can be challenging (there are multiple subjective decision points along the way)
  ◦ Identifying the number of dimensions in MIRT can be challenging
  ◦ Once the number of dimensions is identified, identifying which items load on which dimensions in MIRT can also be challenging
  ◦ The factor analysis techniques underlying MIRT are techniques for data reduction, not dimension identification

Background 6

• Short of resolving the considerable difficulties in analytically identifying dimensions within a construct (and replicating such analyses), can another approach be used?
• Propose using/trusting content experts' identifications of dimensions within constructs (e.g., the divisions agreed upon by the writers of content standards) as the best currently available identification of dimensions, for example…
  ◦ Within English language proficiency, producing reading, writing, listening, and speaking scales.
  ◦ Within Mathematics, producing number & operations, algebra, geometry, measurement, and data analysis/statistics scales.

Background 7

• However, separately scaling each dimension can also be difficult and costly compared to running a traditional unidimensional IRT calibration
  ◦ Confirmatory MIRT
  ◦ Bi-factor IRT model
  ◦ Separate unidimensional calibration and year-to-year equating of each dimension score
• Another option:
  ◦ Unidimensionally calibrate the total score
  ◦ Unidimensionally equate the total score from year to year
  ◦ Use (fixed) item parameters from the unidimensional calibration to create the multiple dimension scores as specified by content experts
  ◦ Use of this method needs to be investigated
• Practical necessity for Smarter Balanced Assessment Consortium
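The "another option" above can be sketched in a few lines. This is a minimal illustration, not the authors' scoring procedure: it assumes dichotomous items, a 3-PL response function without the 1.7 scaling constant, and EAP scoring on a grid under a standard normal prior. The item bank, its parameter values and the function names are all hypothetical.

```python
from math import exp

def p_correct(theta, a, b, c=0.0):
    """3-PL item response function; a = 1 and c = 0 give the 1-PL form."""
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))

def eap_score(responses, items):
    """EAP ability estimate on a grid, standard normal prior, item parameters held fixed."""
    grid = [-4.0 + 0.1 * k for k in range(81)]
    num = den = 0.0
    for theta in grid:
        weight = exp(-0.5 * theta * theta)       # unnormalized N(0, 1) prior
        for x, (a, b, c) in zip(responses, items):
            p = p_correct(theta, a, b, c)
            weight *= p if x == 1 else 1.0 - p
        num += theta * weight
        den += weight
    return num / den

# Hypothetical bank: (a, b, c) come from ONE unidimensional calibration of the total
# score; the domain label is the content experts' assignment of each item.
bank = [
    {"domain": "Algebra", "params": (1.0, -0.5, 0.2)},
    {"domain": "Algebra", "params": (1.3, 0.0, 0.2)},
    {"domain": "Algebra", "params": (0.8, 0.6, 0.2)},
    {"domain": "Number & Operations", "params": (1.1, -0.8, 0.2)},
    {"domain": "Number & Operations", "params": (0.9, 0.1, 0.2)},
    {"domain": "Number & Operations", "params": (1.4, 0.7, 0.2)},
]

def domain_scores(responses, bank):
    """Total score plus one score per domain, all using the same fixed item parameters."""
    scores = {"total": eap_score(responses, [it["params"] for it in bank])}
    for domain in sorted({it["domain"] for it in bank}):
        idx = [i for i, it in enumerate(bank) if it["domain"] == domain]
        scores[domain] = eap_score([responses[i] for i in idx],
                                   [bank[i]["params"] for i in idx])
    return scores

responses = [1, 1, 1, 0, 0, 0]   # all Algebra items right, all Number & Operations items wrong
scores = domain_scores(responses, bank)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

The point of the design is that no domain is ever calibrated on its own: only the subset of items entering the likelihood changes from one score to the next.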

Purpose 8

• Investigate the feasibility and validity of relying on unidimensional total score calibration as a basis for creating multidimensional profile scores…
  ◦ For reporting multidimensional student achievement scores
  ◦ For reporting multidimensional value-added measures
• Investigate the impact of separate versus fixed calibration of multidimensional achievement scores in terms of impact on…
  ◦ Student achievement scores
  ◦ Value-added scores
• …as compared to the impact of other common decisions in scaling, outcome selection, and value-added modeling

Methods 9

• Decisions modeled in the analyses
  ◦ Psychometric decisions
    ▪ Choice of psychometric model: 1-PL vs. 3-PL; PCM vs. GPCM
    ▪ Estimation of sub-scores: separate calibration for each dimension vs. fixed calibration based on unidimensional parameters
  ◦ Choice of outcome metric
    ▪ Which sub-score is modeled
  ◦ Value-added modeling decisions
    ▪ Inclusion of demographics in models
    ▪ Number of pre-test covariates (for covariate adjustment models)

Methods 10

• Outcomes
  ◦ Correlations in student achievement metrics compared across each psychometric choice and outcome choice
  ◦ Correlations in value-added modeling compared across each choice
  ◦ Classification consistency in value-added compared across each choice for
    ▪ Three-category classification decisions: based on confidence intervals around point estimates, placing programs/schools into three categories: (1) above average, (2) statistically indistinguishable from the average, and (3) below average
    ▪ Four-category classification decisions: based on sorting programs'/schools' point estimates into quartiles, representing arbitrary cut points for classification
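The two classification rules and the consistency measure can be sketched as follows. The 95% confidence level, the centering of value-added at zero, the common standard error, and the eight program estimates are all assumptions made for illustration; none of these numbers comes from the study.

```python
def three_category(est, se, z=1.96):
    """Classify by whether the confidence interval excludes the average (assumed to be 0)."""
    if est - z * se > 0:
        return "above"
    if est + z * se < 0:
        return "below"
    return "average"

def quartile_labels(estimates):
    """Label each unit 1..4 by the quartile of its point estimate."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    labels = [0] * len(estimates)
    for rank, i in enumerate(order):
        labels[i] = rank * 4 // len(estimates) + 1
    return labels

def consistency(a, b):
    """Share of units placed in the same category under two conditions."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical value-added estimates for eight programs under two calibrations
fixed    = [-0.40, -0.22, -0.10, -0.03,  0.02, 0.11, 0.25, 0.41]
separate = [-0.38, -0.20, -0.09,  0.04, -0.01, 0.10, 0.27, 0.40]
se = 0.08   # assumed common standard error

three_fixed = [three_category(e, se) for e in fixed]
three_sep = [three_category(e, se) for e in separate]
three_cat = consistency(three_fixed, three_sep)
four_cat = consistency(quartile_labels(fixed), quartile_labels(separate))
print(three_cat, four_cat)
```

In this toy data two programs near the median trade places, which moves both across a quartile boundary while leaving their confidence-interval categories unchanged.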

Methods 11

• Data
  ◦ Michigan English Language Proficiency Assessment (ELPA)
  ◦ Level III (Grades 3-5)
  ◦ 3,391 students, each with 3 measurement occasions (10,173 total scores)
  ◦ Measures
    ▪ Total
    ▪ Reading (domain)
    ▪ Writing (domain)
    ▪ Listening (domain)
    ▪ Speaking (domain)
  ◦ Calibrated the ELPA as a unidimensional measure using both 1-PL/Partial Credit Model and 3-PL/Generalized Partial Credit Model
  ◦ Created domain scores both from fixed parameters from unidimensional calibration and in separate calibrations for each domain

Methods 12

• Data
  ◦ Michigan Educational Assessment Program (MEAP) Mathematics
  ◦ Grades 7 and 8 (not on a vertical scale)
  ◦ Over 110,000 students per grade
  ◦ Measures
    ▪ Total (using items from the two domains)
    ▪ Number & Operations (domain)
    ▪ Algebra (domain)
  ◦ Calibrated the MEAP Math tests as unidimensional measures using both 1-PL and 3-PL models
  ◦ Created domain scores both from fixed parameters from unidimensional calibration and in separate calibrations for each domain

Methods 13

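• Value-added modeling the ELPA
  ◦ 3-level HLM nesting test occasion within student within English language learner program to obtain program value-added

\[
\begin{aligned}
y_{ijk} &= \pi_{0jk} + \pi_{1jk}\,\mathit{time}_{ijk} + e_{ijk} \\
\pi_{0jk} &= \beta_{00k} + \boldsymbol{\beta}_0'\mathbf{X}_{jk} + r_{0jk} \\
\pi_{1jk} &= \beta_{10k} + \boldsymbol{\beta}_1'\mathbf{X}_{jk} + r_{1jk} \\
\beta_{00k} &= \gamma_{000} + \boldsymbol{\gamma}_{00}'\mathbf{W}_k + u_{00k} \\
\beta_{10k} &= \gamma_{100} + \boldsymbol{\gamma}_{10}'\mathbf{W}_k + u_{10k}
\end{aligned}
\]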
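Substituting the student-level and program-level equations into the occasion-level equation gives the combined form of the model. This is a routine algebraic step that the slide does not show, and it uses only the symbols defined there:

\[
\begin{aligned}
y_{ijk} ={}& \gamma_{000} + \boldsymbol{\gamma}_{00}'\mathbf{W}_k + \boldsymbol{\beta}_0'\mathbf{X}_{jk} \\
&+ \left(\gamma_{100} + \boldsymbol{\gamma}_{10}'\mathbf{W}_k + \boldsymbol{\beta}_1'\mathbf{X}_{jk}\right)\mathit{time}_{ijk} \\
&+ u_{00k} + u_{10k}\,\mathit{time}_{ijk} + r_{0jk} + r_{1jk}\,\mathit{time}_{ijk} + e_{ijk}
\end{aligned}
\]

The program-level residuals are \(u_{00k}\) (initial status) and \(u_{10k}\) (growth rate). The slide does not say which of them is reported as program value-added, so reading \(u_{10k}\) as the value-added term is an assumption.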

Methods 14

• Value-added modeling the ELPA
  ◦ VAMs were run in a fully-crossed design with…
    ▪ All outcomes (R, W, L, S)
    ▪ PCM- and GPCM-calibrated outcomes
    ▪ Fixed and separately calibrated outcomes
    ▪ With and without demographics in the VAMs
  ◦ 32 real-data applications across design factors

Methods 15

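• Value-added modeling MEAP mathematics
  ◦ 2-level HLM covarying grade-8 outcomes on grade-7 outcomes with students nested within schools

\[
\begin{aligned}
y_{ijk} &= \beta_{0k} + \beta_{1k}\,y_{(i-1)jk} + \beta_{2k}\,z_{(i-1)jk} + \boldsymbol{\beta}'\mathbf{X}_{jk} + e_{jk} \\
\beta_{0k} &= \gamma_{00} + \boldsymbol{\gamma}_0'\mathbf{W}_k + u_{0k} \\
\beta_{1k} &= \gamma_{10} + u_{1k} \\
\beta_{2k} &= \gamma_{20} + u_{2k}
\end{aligned}
\]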

Methods 16

• Value-added modeling MEAP mathematics
  ◦ VAMs were run in a fully-crossed design with…
    ▪ Both outcomes (algebra and number & operations)
    ▪ 1-PL and 3-PL calibrated outcomes
    ▪ Fixed and separately calibrated outcomes
    ▪ With and without demographics
    ▪ With either one or two pre-test covariates
  ◦ 32 real-data applications across design factors
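The count of 32 applications follows directly from crossing the design factors, for this MEAP design and for the ELPA design described earlier in the deck. A short enumeration makes the arithmetic explicit; the dictionary keys and the helper name `crossed` are illustrative labels, not the authors' variable names.

```python
from itertools import product

# Design factors as listed on the two "fully-crossed design" slides
elpa_factors = {
    "outcome": ["Reading", "Writing", "Listening", "Speaking"],
    "model": ["PCM", "GPCM"],
    "calibration": ["fixed", "separate"],
    "demographics": [False, True],
}
meap_factors = {
    "outcome": ["Algebra", "Number & Operations"],
    "model": ["1-PL", "3-PL"],
    "calibration": ["fixed", "separate"],
    "demographics": [False, True],
    "n_pretests": [1, 2],
}

def crossed(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

elpa_runs = crossed(elpa_factors)   # 4 x 2 x 2 x 2
meap_runs = crossed(meap_factors)   # 2 x 2 x 2 x 2 x 2
print(len(elpa_runs), len(meap_runs))
```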

Results 17

ELPA

Results: ELPA Student-Level Outcomes 18

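• Correlations across fixed vs. separate calibrations

| Model choice | Content area | Correlation |
| --- | --- | --- |
| PCM | Reading | 0.997 |
| PCM | Writing | 0.995 |
| PCM | Listening | 0.997 |
| PCM | Speaking | 1.000 |
| GPCM | Reading | 0.997 |
| GPCM | Writing | 0.997 |
| GPCM | Listening | 0.994 |
| GPCM | Speaking | 1.000 |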

Results: ELPA Student-Level Outcomes 19

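• Correlations across model choice (PCM vs. GPCM)

| Calibration choice | Content area | Correlation |
| --- | --- | --- |
| Fixed | Reading | 0.972 |
| Fixed | Writing | 0.983 |
| Fixed | Listening | 0.967 |
| Fixed | Speaking | 0.982 |
| Separate | Reading | 0.978 |
| Separate | Writing | 0.983 |
| Separate | Listening | 0.977 |
| Separate | Speaking | 0.982 |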

Results: ELPA Student-Level Outcomes 20

• Correlations across content areas

| Model choice | Calibration choice | Reading-Writing | Reading-Listening | Reading-Speaking | Writing-Listening | Writing-Speaking | Listening-Speaking |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PCM | Fixed | 0.636 | 0.627 | 0.371 | 0.537 | 0.385 | 0.368 |
| PCM | Separate | 0.622 | 0.626 | 0.373 | 0.519 | 0.375 | 0.365 |
| GPCM | Fixed | 0.655 | 0.662 | 0.402 | 0.559 | 0.407 | 0.405 |
| GPCM | Separate | 0.639 | 0.648 | 0.395 | 0.543 | 0.400 | 0.394 |

• Low to moderate inter-dimension correlations
• However, Rasch dimensionality analysis from WINSTEPS identified the total score as a unidimensional score

Results: ELPA Program District-Level Value-Added Outcomes 21

• Impact of fixed versus separate calibration

Correlations

| Content area | No Demos, PCM | No Demos, GPCM | Demos, PCM | Demos, GPCM |
| --- | --- | --- | --- | --- |
| Reading | 1.000 | 0.987 | 1.000 | 0.992 |
| Writing | 1.000 | 0.997 | 1.000 | 0.997 |
| Listening | 1.000 | 0.987 | 1.000 | 0.987 |
| Speaking | 1.000 | 1.000 | 1.000 | 1.000 |

Min = 0.987, Max = 1.000, Mean = 0.997, SD = 0.005

3-Category Consistency

| Content area | No Demos, PCM | No Demos, GPCM | Demos, PCM | Demos, GPCM |
| --- | --- | --- | --- | --- |
| Reading | 0.996 | 0.996 | 1.000 | 0.991 |
| Writing | 1.000 | 0.996 | 1.000 | 0.991 |
| Listening | 1.000 | 1.000 | 1.000 | 0.996 |
| Speaking | 1.000 | 1.000 | 1.000 | 1.000 |

Min = 0.991, Max = 1.000, Mean = 0.998, SD = 0.003

4-Category Consistency

| Content area | No Demos, PCM | No Demos, GPCM | Demos, PCM | Demos, GPCM |
| --- | --- | --- | --- | --- |
| Reading | 0.982 | 0.875 | 0.982 | 0.902 |
| Writing | 0.973 | 0.946 | 0.982 | 0.946 |
| Listening | 0.991 | 0.897 | 0.991 | 0.906 |
| Speaking | 1.000 | 1.000 | 1.000 | 1.000 |

Min = 0.875, Max = 1.000, Mean = 0.961, SD = 0.043

Results: ELPA Program District-Level Value-Added Outcomes 22

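• Correlations between Listening and Reading VA (rows: Listening; columns: Reading; ND = no demographics, D = demographics; Fixed/Sep = fixed or separate calibration)

| Listening \ Reading | ND PCM Fixed | ND PCM Sep | ND GPCM Fixed | ND GPCM Sep | D PCM Fixed | D PCM Sep | D GPCM Fixed | D GPCM Sep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ND PCM Fixed | 0.371 | 0.371 | 0.301 | 0.327 | 0.303 | 0.302 | 0.228 | 0.245 |
| ND PCM Sep | 0.372 | 0.371 | 0.303 | 0.328 | 0.304 | 0.303 | 0.230 | 0.247 |
| ND GPCM Fixed | 0.360 | 0.361 | 0.387 | 0.392 | 0.301 | 0.302 | 0.316 | 0.321 |
| ND GPCM Sep | 0.376 | 0.377 | 0.389 | 0.397 | 0.327 | 0.328 | 0.320 | 0.329 |
| D PCM Fixed | 0.330 | 0.330 | 0.292 | 0.308 | 0.318 | 0.317 | 0.261 | 0.275 |
| D PCM Sep | 0.329 | 0.330 | 0.294 | 0.309 | 0.318 | 0.318 | 0.263 | 0.277 |
| D GPCM Fixed | 0.304 | 0.305 | 0.341 | 0.342 | 0.307 | 0.309 | 0.329 | 0.332 |
| D GPCM Sep | 0.328 | 0.329 | 0.346 | 0.350 | 0.333 | 0.335 | 0.332 | 0.339 |

• Min = 0.228, Max = 0.397, Mean = 0.322, SD = 0.037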

Results: ELPA Program District-Level Value-Added Outcomes 23

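• Correlations between Listening and Writing VA (rows: Listening; columns: Writing; ND = no demographics, D = demographics; Fixed/Sep = fixed or separate calibration)

| Listening \ Writing | ND PCM Fixed | ND PCM Sep | ND GPCM Fixed | ND GPCM Sep | D PCM Fixed | D PCM Sep | D GPCM Fixed | D GPCM Sep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ND PCM Fixed | 0.358 | 0.359 | 0.369 | 0.366 | 0.342 | 0.343 | 0.353 | 0.353 |
| ND PCM Sep | 0.359 | 0.360 | 0.370 | 0.367 | 0.343 | 0.344 | 0.354 | 0.354 |
| ND GPCM Fixed | 0.403 | 0.403 | 0.420 | 0.412 | 0.385 | 0.385 | 0.401 | 0.396 |
| ND GPCM Sep | 0.368 | 0.368 | 0.383 | 0.376 | 0.354 | 0.355 | 0.370 | 0.364 |
| D PCM Fixed | 0.362 | 0.362 | 0.373 | 0.371 | 0.361 | 0.362 | 0.372 | 0.371 |
| D PCM Sep | 0.363 | 0.364 | 0.374 | 0.372 | 0.362 | 0.363 | 0.374 | 0.372 |
| D GPCM Fixed | 0.395 | 0.395 | 0.410 | 0.405 | 0.397 | 0.397 | 0.412 | 0.407 |
| D GPCM Sep | 0.364 | 0.364 | 0.378 | 0.373 | 0.365 | 0.365 | 0.379 | 0.374 |

• Min = 0.342, Max = 0.420, Mean = 0.373, SD = 0.019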

Results: ELPA Program District-Level Value-Added Outcomes 24

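• Correlations between Listening and Speaking VA (rows: Listening; columns: Speaking; ND = no demographics, D = demographics; Fixed/Sep = fixed or separate calibration)

| Listening \ Speaking | ND PCM Fixed | ND PCM Sep | ND GPCM Fixed | ND GPCM Sep | D PCM Fixed | D PCM Sep | D GPCM Fixed | D GPCM Sep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ND PCM Fixed | 0.002 | 0.002 | 0.026 | 0.026 | 0.005 | 0.005 | 0.032 | 0.032 |
| ND PCM Sep | 0.004 | 0.004 | 0.028 | 0.028 | 0.007 | 0.007 | 0.033 | 0.033 |
| ND GPCM Fixed | 0.068 | 0.068 | 0.102 | 0.102 | 0.081 | 0.081 | 0.108 | 0.108 |
| ND GPCM Sep | 0.051 | 0.051 | 0.080 | 0.080 | 0.061 | 0.061 | 0.086 | 0.086 |
| D PCM Fixed | -0.005 | -0.005 | 0.025 | 0.025 | 0.001 | 0.001 | 0.028 | 0.028 |
| D PCM Sep | -0.004 | -0.004 | 0.027 | 0.027 | 0.002 | 0.002 | 0.029 | 0.029 |
| D GPCM Fixed | 0.065 | 0.065 | 0.097 | 0.097 | 0.075 | 0.075 | 0.101 | 0.101 |
| D GPCM Sep | 0.047 | 0.047 | 0.076 | 0.076 | 0.056 | 0.056 | 0.080 | 0.080 |

• Min = -0.005, Max = 0.108, Mean = 0.046, SD = 0.035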

Results: ELPA Program District-Level Value-Added Outcomes 25

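• Correlations between Reading and Writing VA (rows: Reading; columns: Writing; ND = no demographics, D = demographics; Fixed/Sep = fixed or separate calibration)

| Reading \ Writing | ND PCM Fixed | ND PCM Sep | ND GPCM Fixed | ND GPCM Sep | D PCM Fixed | D PCM Sep | D GPCM Fixed | D GPCM Sep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ND PCM Fixed | 0.389 | 0.390 | 0.393 | 0.386 | 0.335 | 0.336 | 0.341 | 0.338 |
| ND PCM Sep | 0.392 | 0.393 | 0.396 | 0.389 | 0.338 | 0.339 | 0.344 | 0.341 |
| ND GPCM Fixed | 0.466 | 0.464 | 0.480 | 0.466 | 0.442 | 0.440 | 0.455 | 0.443 |
| ND GPCM Sep | 0.455 | 0.454 | 0.468 | 0.455 | 0.420 | 0.419 | 0.432 | 0.422 |
| D PCM Fixed | 0.365 | 0.365 | 0.370 | 0.365 | 0.374 | 0.374 | 0.379 | 0.372 |
| D PCM Sep | 0.369 | 0.369 | 0.374 | 0.369 | 0.379 | 0.379 | 0.384 | 0.377 |
| D GPCM Fixed | 0.453 | 0.450 | 0.465 | 0.454 | 0.478 | 0.476 | 0.491 | 0.477 |
| D GPCM Sep | 0.440 | 0.438 | 0.452 | 0.442 | 0.464 | 0.462 | 0.476 | 0.461 |

• Min = 0.335, Max = 0.491, Mean = 0.412, SD = 0.047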

Results: ELPA Program District-Level Value-Added Outcomes 26

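• Correlations between Reading and Speaking VA (rows: Reading; columns: Speaking; ND = no demographics, D = demographics; Fixed/Sep = fixed or separate calibration)

| Reading \ Speaking | ND PCM Fixed | ND PCM Sep | ND GPCM Fixed | ND GPCM Sep | D PCM Fixed | D PCM Sep | D GPCM Fixed | D GPCM Sep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ND PCM Fixed | 0.121 | 0.121 | 0.132 | 0.132 | 0.131 | 0.131 | 0.136 | 0.136 |
| ND PCM Sep | 0.122 | 0.122 | 0.134 | 0.134 | 0.132 | 0.132 | 0.138 | 0.138 |
| ND GPCM Fixed | 0.129 | 0.129 | 0.174 | 0.174 | 0.152 | 0.152 | 0.179 | 0.179 |
| ND GPCM Sep | 0.134 | 0.134 | 0.172 | 0.172 | 0.154 | 0.154 | 0.177 | 0.177 |
| D PCM Fixed | 0.122 | 0.122 | 0.136 | 0.136 | 0.125 | 0.125 | 0.134 | 0.134 |
| D PCM Sep | 0.125 | 0.125 | 0.139 | 0.139 | 0.128 | 0.128 | 0.138 | 0.138 |
| D GPCM Fixed | 0.163 | 0.163 | 0.205 | 0.205 | 0.171 | 0.171 | 0.203 | 0.203 |
| D GPCM Sep | 0.162 | 0.162 | 0.199 | 0.199 | 0.168 | 0.168 | 0.197 | 0.197 |

• Min = 0.121, Max = 0.205, Mean = 0.151, SD = 0.026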

Results: ELPA Program District-Level Value-Added Outcomes 27

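• Correlations between Speaking and Writing VA (rows: Speaking; columns: Writing; ND = no demographics, D = demographics; Fixed/Sep = fixed or separate calibration)

| Speaking \ Writing | ND PCM Fixed | ND PCM Sep | ND GPCM Fixed | ND GPCM Sep | D PCM Fixed | D PCM Sep | D GPCM Fixed | D GPCM Sep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ND PCM Fixed | 0.151 | 0.150 | 0.169 | 0.180 | 0.158 | 0.157 | 0.180 | 0.189 |
| ND PCM Sep | 0.151 | 0.150 | 0.169 | 0.180 | 0.158 | 0.157 | 0.180 | 0.189 |
| ND GPCM Fixed | 0.207 | 0.205 | 0.225 | 0.236 | 0.209 | 0.208 | 0.231 | 0.240 |
| ND GPCM Sep | 0.207 | 0.205 | 0.225 | 0.236 | 0.209 | 0.208 | 0.231 | 0.240 |
| D PCM Fixed | 0.173 | 0.172 | 0.192 | 0.202 | 0.167 | 0.165 | 0.189 | 0.197 |
| D PCM Sep | 0.173 | 0.172 | 0.192 | 0.202 | 0.167 | 0.165 | 0.189 | 0.197 |
| D GPCM Fixed | 0.216 | 0.215 | 0.235 | 0.246 | 0.212 | 0.210 | 0.233 | 0.243 |
| D GPCM Sep | 0.216 | 0.215 | 0.235 | 0.246 | 0.212 | 0.210 | 0.233 | 0.243 |

• Min = 0.150, Max = 0.246, Mean = 0.199, SD = 0.029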

Results: ELPA Program District-Level Value-Added Outcomes 28

• Impact of choice of psychometric model

Correlations

| Content area | No Demos, Fixed | No Demos, Sep | Demos, Fixed | Demos, Sep |
| --- | --- | --- | --- | --- |
| Reading | 0.837 | 0.900 | 0.834 | 0.887 |
| Writing | 0.988 | 0.987 | 0.988 | 0.986 |
| Listening | 0.929 | 0.945 | 0.942 | 0.955 |
| Speaking | 0.975 | 0.975 | 0.980 | 0.980 |

Min = 0.834, Max = 0.988, Mean = 0.943, SD = 0.052

3-Category Consistency

| Content area | No Demos, Fixed | No Demos, Sep | Demos, Fixed | Demos, Sep |
| --- | --- | --- | --- | --- |
| Reading | 0.973 | 0.982 | 0.978 | 0.987 |
| Writing | 0.996 | 0.991 | 0.996 | 0.996 |
| Listening | 0.987 | 0.987 | 0.982 | 0.987 |
| Speaking | 0.964 | 0.964 | 0.969 | 0.969 |

Min = 0.964, Max = 0.996, Mean = 0.982, SD = 0.011

4-Category Consistency

| Content area | No Demos, Fixed | No Demos, Sep | Demos, Fixed | Demos, Sep |
| --- | --- | --- | --- | --- |
| Reading | 0.567 | 0.634 | 0.580 | 0.634 |
| Writing | 0.902 | 0.866 | 0.920 | 0.893 |
| Listening | 0.728 | 0.728 | 0.768 | 0.754 |
| Speaking | 0.795 | 0.795 | 0.839 | 0.839 |

Min = 0.567, Max = 0.920, Mean = 0.765, SD = 0.113

Results: ELPA Program District-Level Value-Added Outcomes 29

• Impact of Including/Not Including Demographics

Correlations

| Content area | PCM, Fixed | PCM, Sep | GPCM, Fixed | GPCM, Sep |
| --- | --- | --- | --- | --- |
| Reading | 0.915 | 0.915 | 0.931 | 0.922 |
| Writing | 0.978 | 0.978 | 0.979 | 0.982 |
| Listening | 0.982 | 0.982 | 0.980 | 0.981 |
| Speaking | 0.993 | 0.993 | 0.997 | 0.997 |

Min = 0.915, Max = 0.997, Mean = 0.969, SD = 0.030

3-Category Consistency

| Content area | PCM, Fixed | PCM, Sep | GPCM, Fixed | GPCM, Sep |
| --- | --- | --- | --- | --- |
| Reading | 0.991 | 0.987 | 0.987 | 0.982 |
| Writing | 0.987 | 0.987 | 0.987 | 0.973 |
| Listening | 0.991 | 0.991 | 0.987 | 0.982 |
| Speaking | 0.991 | 0.991 | 0.996 | 0.996 |

Min = 0.973, Max = 0.996, Mean = 0.988, SD = 0.006

4-Category Consistency

| Content area | PCM, Fixed | PCM, Sep | GPCM, Fixed | GPCM, Sep |
| --- | --- | --- | --- | --- |
| Reading | 0.808 | 0.817 | 0.750 | 0.741 |
| Writing | 0.830 | 0.821 | 0.848 | 0.839 |
| Listening | 0.924 | 0.911 | 0.911 | 0.915 |
| Speaking | 0.902 | 0.902 | 0.911 | 0.911 |

Min = 0.741, Max = 0.924, Mean = 0.859, SD = 0.060

Results 30

MEAP Mathematics

Results: MEAP Math Student-Level Outcomes 31

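• Correlations among variables based on psychometric decisions (Grade 7 above the diagonal, Grade 8 below; Alg = Algebra, N&O = Number & Operations; Fixed/Sep = fixed or separate calibration)

| | Alg 1-PL Fixed | Alg 1-PL Sep | Alg 3-PL Fixed | Alg 3-PL Sep | N&O 1-PL Fixed | N&O 1-PL Sep | N&O 3-PL Fixed | N&O 3-PL Sep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Alg 1-PL Fixed | - | 1.000 | 0.943 | 0.941 | 0.775 | 0.775 | 0.775 | 0.743 |
| Alg 1-PL Sep | 1.000 | - | 0.943 | 0.941 | 0.775 | 0.775 | 0.775 | 0.742 |
| Alg 3-PL Fixed | 0.900 | 0.901 | - | 0.996 | 0.748 | 0.748 | 0.748 | 0.751 |
| Alg 3-PL Sep | 0.891 | 0.893 | 0.984 | - | 0.746 | 0.745 | 0.746 | 0.748 |
| N&O 1-PL Fixed | 0.684 | 0.685 | 0.677 | 0.666 | - | 1.000 | 1.000 | 0.941 |
| N&O 1-PL Sep | 0.684 | 0.685 | 0.676 | 0.665 | 1.000 | - | 1.000 | 0.941 |
| N&O 3-PL Fixed | 0.670 | 0.671 | 0.691 | 0.682 | 0.936 | 0.935 | - | 0.941 |
| N&O 3-PL Sep | 0.667 | 0.668 | 0.688 | 0.679 | 0.935 | 0.934 | 0.998 | - |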

Results: MEAP Math Student-Level Outcomes 32

• Very high correlations based on fixed versus separate calibrations (same correlation matrix as on slide 31)

Results: MEAP Math Student-Level Outcomes 33

• Very high correlations based on fixed versus separate calibrations (same correlation matrix as on slide 31)

Results: MEAP Math Student-Level Outcomes 34

• Not as high correlations based on 1-PL versus 3-PL calibrations (same correlation matrix as on slide 31)

Results: MEAP Math Student-Level Outcomes 35

• Moderate to high correlations across dimensions (same correlation matrix as on slide 31)

Results: MEAP Mathematics School-Level Value-Added Outcomes 36

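• Impact of fixed versus separate calibration

Correlations

| Content area | 1 pre-test, No Demos, 1-PL | 1 pre-test, No Demos, 3-PL | 1 pre-test, Demos, 1-PL | 1 pre-test, Demos, 3-PL | 2 pre-tests, No Demos, 1-PL | 2 pre-tests, No Demos, 3-PL | 2 pre-tests, Demos, 1-PL | 2 pre-tests, Demos, 3-PL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Algebra | 1.000 | 0.995 | 1.000 | 0.992 | 1.000 | 0.985 | 1.000 | 0.985 |
| Number & Operations | 1.000 | 0.977 | 1.000 | 0.956 | 1.000 | 0.988 | 1.000 | 0.983 |

3-Cat Consistency

| Content area | 1 pre-test, No Demos, 1-PL | 1 pre-test, No Demos, 3-PL | 1 pre-test, Demos, 1-PL | 1 pre-test, Demos, 3-PL | 2 pre-tests, No Demos, 1-PL | 2 pre-tests, No Demos, 3-PL | 2 pre-tests, Demos, 1-PL | 2 pre-tests, Demos, 3-PL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Algebra | 0.989 | 0.968 | 0.987 | 0.973 | 0.987 | 0.935 | 0.989 | 0.960 |
| Number & Operations | 0.989 | 0.923 | 0.994 | 0.935 | 0.990 | 0.946 | 0.989 | 0.966 |

4-Cat Consistency

| Content area | 1 pre-test, No Demos, 1-PL | 1 pre-test, No Demos, 3-PL | 1 pre-test, Demos, 1-PL | 1 pre-test, Demos, 3-PL | 2 pre-tests, No Demos, 1-PL | 2 pre-tests, No Demos, 3-PL | 2 pre-tests, Demos, 1-PL | 2 pre-tests, Demos, 3-PL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Algebra | 0.995 | 0.926 | 0.993 | 0.883 | 0.992 | 0.856 | 0.986 | 0.848 |
| Number & Operations | 0.989 | 0.827 | 0.984 | 0.712 | 0.993 | 0.875 | 0.983 | 0.817 |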

Results: MEAP Mathematics School-Level Value-Added Outcomes 37

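• Impact of choice of outcome (Algebra vs. Number)

Correlations

| Multidimensional calibration type | 1 pre-test, No Demos, 1-PL | 1 pre-test, No Demos, 3-PL | 1 pre-test, Demos, 1-PL | 1 pre-test, Demos, 3-PL | 2 pre-tests, No Demos, 1-PL | 2 pre-tests, No Demos, 3-PL | 2 pre-tests, Demos, 1-PL | 2 pre-tests, Demos, 3-PL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.548 | 0.608 | 0.361 | 0.391 | 0.652 | 0.697 | 0.574 | 0.609 |
| Separate | 0.549 | 0.649 | 0.366 | 0.436 | 0.653 | 0.711 | 0.576 | 0.614 |

3-Cat Consistency

| Multidimensional calibration type | 1 pre-test, No Demos, 1-PL | 1 pre-test, No Demos, 3-PL | 1 pre-test, Demos, 1-PL | 1 pre-test, Demos, 3-PL | 2 pre-tests, No Demos, 1-PL | 2 pre-tests, No Demos, 3-PL | 2 pre-tests, Demos, 1-PL | 2 pre-tests, Demos, 3-PL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.637 | 0.667 | 0.649 | 0.703 | 0.703 | 0.751 | 0.716 | 0.774 |
| Separate | 0.637 | 0.691 | 0.650 | 0.726 | 0.705 | 0.749 | 0.713 | 0.784 |

4-Cat Consistency

| Multidimensional calibration type | 1 pre-test, No Demos, 1-PL | 1 pre-test, No Demos, 3-PL | 1 pre-test, Demos, 1-PL | 1 pre-test, Demos, 3-PL | 2 pre-tests, No Demos, 1-PL | 2 pre-tests, No Demos, 3-PL | 2 pre-tests, Demos, 1-PL | 2 pre-tests, Demos, 3-PL |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.399 | 0.424 | 0.322 | 0.337 | 0.447 | 0.475 | 0.404 | 0.412 |
| Separate | 0.397 | 0.429 | 0.322 | 0.350 | 0.444 | 0.484 | 0.405 | 0.436 |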

Results: MEAP Mathematics School-Level Value-Added Outcomes 38

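• Impact of choice of psychometric model

Correlations

| Multidimensional calibration type | 1 pre-test, No Demos, Alg | 1 pre-test, No Demos, Num | 1 pre-test, Demos, Alg | 1 pre-test, Demos, Num | 2 pre-tests, No Demos, Alg | 2 pre-tests, No Demos, Num | 2 pre-tests, Demos, Alg | 2 pre-tests, Demos, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.939 | 0.963 | 0.883 | 0.934 | 0.918 | 0.961 | 0.925 | 0.962 |
| Separate | 0.938 | 0.962 | 0.876 | 0.937 | 0.925 | 0.962 | 0.873 | 0.938 |

3-Cat Consistency

| Multidimensional calibration type | 1 pre-test, No Demos, Alg | 1 pre-test, No Demos, Num | 1 pre-test, Demos, Alg | 1 pre-test, Demos, Num | 2 pre-tests, No Demos, Alg | 2 pre-tests, No Demos, Num | 2 pre-tests, Demos, Alg | 2 pre-tests, Demos, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.890 | 0.901 | 0.851 | 0.912 | 0.867 | 0.921 | 0.837 | 0.915 |
| Separate | 0.886 | 0.907 | 0.841 | 0.918 | 0.876 | 0.918 | 0.839 | 0.915 |

4-Cat Consistency

| Multidimensional calibration type | 1 pre-test, No Demos, Alg | 1 pre-test, No Demos, Num | 1 pre-test, Demos, Alg | 1 pre-test, Demos, Num | 2 pre-tests, No Demos, Alg | 2 pre-tests, No Demos, Num | 2 pre-tests, Demos, Alg | 2 pre-tests, Demos, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.732 | 0.763 | 0.611 | 0.673 | 0.679 | 0.773 | 0.602 | 0.677 |
| Separate | 0.717 | 0.775 | 0.604 | 0.685 | 0.701 | 0.770 | 0.610 | 0.670 |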

Results: MEAP Mathematics School-Level Value-Added Outcomes 39

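• Impact of Including/Not Including Demographics

Correlations

| Multidimensional calibration type | 1 pre-test, 1-PL, Alg | 1 pre-test, 1-PL, Num | 1 pre-test, 3-PL, Alg | 1 pre-test, 3-PL, Num | 2 pre-tests, 1-PL, Alg | 2 pre-tests, 1-PL, Num | 2 pre-tests, 3-PL, Alg | 2 pre-tests, 3-PL, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.964 | 0.815 | 0.813 | 0.717 | 0.984 | 0.822 | 0.895 | 0.775 |
| Separate | 0.962 | 0.819 | 0.806 | 0.780 | 0.983 | 0.825 | 0.877 | 0.793 |

3-Cat Consistency

| Multidimensional calibration type | 1 pre-test, 1-PL, Alg | 1 pre-test, 1-PL, Num | 1 pre-test, 3-PL, Alg | 1 pre-test, 3-PL, Num | 2 pre-tests, 1-PL, Alg | 2 pre-tests, 1-PL, Num | 2 pre-tests, 3-PL, Alg | 2 pre-tests, 3-PL, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.880 | 0.772 | 0.771 | 0.713 | 0.928 | 0.774 | 0.841 | 0.771 |
| Separate | 0.875 | 0.767 | 0.774 | 0.724 | 0.927 | 0.775 | 0.831 | 0.756 |

4-Cat Consistency

| Multidimensional calibration type | 1 pre-test, 1-PL, Alg | 1 pre-test, 1-PL, Num | 1 pre-test, 3-PL, Alg | 1 pre-test, 3-PL, Num | 2 pre-tests, 1-PL, Alg | 2 pre-tests, 1-PL, Num | 2 pre-tests, 3-PL, Alg | 2 pre-tests, 3-PL, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.775 | 0.551 | 0.572 | 0.464 | 0.864 | 0.557 | 0.646 | 0.508 |
| Separate | 0.774 | 0.556 | 0.544 | 0.522 | 0.858 | 0.552 | 0.635 | 0.547 |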

Results: MEAP Mathematics School-Level Value-Added Outcomes 40

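• Impact of covarying on one vs. two pre-test scores

Correlations

| Multidimensional calibration type | No Demographics, 1-PL, Alg | No Demographics, 1-PL, Num | No Demographics, 3-PL, Alg | No Demographics, 3-PL, Num | Includes Demographics, 1-PL, Alg | Includes Demographics, 1-PL, Num | Includes Demographics, 3-PL, Alg | Includes Demographics, 3-PL, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.937 | 0.965 | 0.923 | 0.964 | 0.941 | 0.947 | 0.930 | 0.951 |
| Separate | 0.937 | 0.965 | 0.937 | 0.962 | 0.941 | 0.948 | 0.941 | 0.942 |

3-Cat Consistency

| Multidimensional calibration type | No Demographics, 1-PL, Alg | No Demographics, 1-PL, Num | No Demographics, 3-PL, Alg | No Demographics, 3-PL, Num | Includes Demographics, 1-PL, Alg | Includes Demographics, 1-PL, Num | Includes Demographics, 3-PL, Alg | Includes Demographics, 3-PL, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.855 | 0.884 | 0.851 | 0.889 | 0.889 | 0.918 | 0.872 | 0.744 |
| Separate | 0.859 | 0.889 | 0.878 | 0.883 | 0.885 | 0.922 | 0.885 | 0.755 |

4-Cat Consistency

| Multidimensional calibration type | No Demographics, 1-PL, Alg | No Demographics, 1-PL, Num | No Demographics, 3-PL, Alg | No Demographics, 3-PL, Num | Includes Demographics, 1-PL, Alg | Includes Demographics, 1-PL, Num | Includes Demographics, 3-PL, Alg | Includes Demographics, 3-PL, Num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Parameter | 0.734 | 0.764 | 0.696 | 0.753 | 0.715 | 0.687 | 0.704 | 0.713 |
| Separate | 0.729 | 0.768 | 0.727 | 0.754 | 0.716 | 0.693 | 0.714 | 0.698 |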

Conclusions 41

• Practically important impacts on value-added metrics and value-added classifications
  ◦ Choice of psychometric model
  ◦ Including/not including demographics
  ◦ Including/not including multiple pre-test values
• Prohibitive impacts on value-added metrics and value-added classifications
  ◦ Choice of outcome (i.e., domain within construct)
• Practically negligible impacts on value-added metrics and value-added classifications
  ◦ Separate versus fixed calibrations of domains within construct

Conclusions, continued… 42

• Need to pay attention to modeling domains within constructs if constructs can reasonably be considered multidimensional
  ◦ Of the common psychometric and statistical modeling decisions one can make, the choice of which subscore to use as an outcome is the most influential
  ◦ Because subscores give different profiles of both student achievement and program/school value-added, each subscore should be modeled to the degree possible
• 4-category (i.e., quartile) classifications on value-added are appreciably impacted by every psychometric and statistical modeling choice evaluated here, but 3-category classifications are not
  ◦ Discourage more than three categories
  ◦ RTTT requires at least four categories

Conclusions, continued… 43

• 3- vs. 4-category distinction is actually a proxy for
  ◦ Statistical decision categories (3 categories)
  ◦ Arbitrary cut point categories (4 categories)
• Can leverage unidimensional calibrations of multidimensional achievement scales to create multidimensional profiles of value-added
  ◦ Except where using four categories of classification

Limitations 44

• Inductive reasoning
  ◦ Results are likely to hold in similar circumstances
  ◦ Still will need to investigate feasibility of using fixed parameters from unidimensional calibration for specific circumstances if those circumstances are high stakes
  ◦ This is a proof of concept
• PCM and GPCM models were run using different software (WINSTEPS vs. PARSCALE)

Contact Information 45

• Joseph A. Martineau, Ph.D.
  ◦ Executive Director
  ◦ Bureau of Assessment & Accountability
  ◦ Michigan Department of Education
• Ji Zeng, Ph.D.
  ◦ Psychometrician
  ◦ Bureau of Assessment & Accountability
  ◦ Michigan Department of Education
