# Statistics: A PowerPoint Presentation (Willis)

January 16, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Statistics

#### Description

Rivier University Education Division Specialist in Assessment of Intellectual Functioning (SAIF) Program ED 656, 657, 658, & 659 John O. Willis, Ed.D., SAIF 3.11.13 Rivier Univ.

SAIF

Statistics

John O. Willis


Statistics: Test Scores


One measurement is worth a thousand expert opinions.

— Donald Sutherland


We can measure the same thing with many different units.


We measure the same distances with many different units.


[Map: Disability Rights Center (Low Avenue), NH State House, Phenix Avenue, Main Street. The distance shown is 0.1 miles = 528 feet = 176 yards = 6,336 inches = 161 meters = 8 chains = 32 rods.]

We measure the same temperatures with many different units.


| °C | °F | K |
|---|---|---|
| 100 | 212 | 373.15 |
| 37 | 98.6 | 310.15 |
| 0 | 32 | 273.15 |
| -17.8 | 0 | 255.35 |
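All three temperature scales are linear re-expressions of the same quantity; they differ only in where zero sits and how large one degree is. Test score scales work the same way. A minimal Python sketch (the function names are ours, for illustration) reproduces the first three rows of the temperature equivalents on this slide:

```python
def celsius_to_fahrenheit(c: float) -> float:
    """A Fahrenheit degree is 5/9 the size of a Celsius degree; water freezes at 32."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    """Kelvin keeps the Celsius degree but moves zero to absolute zero."""
    return c + 273.15


for c in (100, 37, 0):
    print(f"{c:>3} °C = {celsius_to_fahrenheit(c):5.1f} °F = {celsius_to_kelvin(c):6.2f} K")
```

Going the other way, 0 °F works out to about -17.8 °C.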

Test authors and publishers feel compelled to do the same thing with test scores.


[Figure: normal curve with equivalent score scales aligned beneath it: z scores (-4 to +4), standard scores (40 to 160), scaled scores, v-scale scores, T scores (10 to 90), NCEs (1 to 99), and percentile ranks (0.1 to 99.9).]

SCORES USED WITH THE TESTS

When a new test is developed, it is normed on a sample of hundreds or thousands of people. The sample should be like that for a good opinion poll: female and male, urban and rural, different parts of the country, different income levels, etc.

The scores from that norming sample are used as a yardstick for measuring the performance of people who then take the test. This human yardstick allows for the difficulty levels of different tests. The student is being compared to other students on both difficult and easy tasks.

You can see from the illustration below that there are more scores in the middle than at the very high and low ends. Many different scoring systems are used, just as you can measure the same distance as 1 yard, 3 feet, 36 inches, 91.4 centimeters, 0.91 meter, or 1/1760 mile.

[Chart: normal curve drawn with 200 &s (each && = 1%), showing more scores in the middle than at the ends.]

| Percent in each | Standard Scores | Scaled Scores | T Scores | Percentile Ranks | Woodcock-Johnson Classification |
|---|---|---|---|---|---|
| 2.2% | ≤ 69 | 1–3 | ≤ 29 | ≤ 02 | Very Low |
| 6.7% | 70–79 | 4–5 | 30–36 | 03–08 | Low |
| 16.1% | 80–89 | 6–7 | 37–42 | 09–24 | Low Average |
| 50% | 90–110 | 8–12 | 43–56 | 25–75 | Average |
| 16.1% | 111–120 | 13–14 | 57–63 | 77–91 | High Average |
| 6.7% | 121–130 | 15–16 | 64–70 | 91–98 | Superior |
| 2.2% | ≥ 131 | 17–19 | ≥ 71 | ≥ 98 | Very Superior |

Stanine classifications (standard-score ranges): Very Low ≤ 73; Low 74–81; Below Average 82–88; Low Average 89–96; Average 97–103; High Average 104–111; Above Average 112–118; High 119–126; Very High ≥ 127.

Adapted from Willis, J. O. & Dumont, R. P., Guide to identification of learning disabilities (1998 New York State ed.) (Acton, MA: Copley Custom Publishing, 1998, p. 27). Also available at http://alpha.fdu.edu/psychology/test_score_descriptions.htm.


PERCENTILE RANKS (PR) simply state the percent of persons in the norming sample who scored the same as or lower than the student. A percentile rank of 63 would be high average: as high as or higher than 63% and lower than the other 37% of the norming sample. It would be in Stanine 6. The middle 50% of examinees' scores fall between percentile ranks of 25 and 75.
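The definition of a percentile rank translates directly into code. This minimal sketch assumes a small, invented norming sample and simply counts the scores at or below the student's; real publishers derive percentile ranks from much larger, smoothed norms, so treat it as an illustration of the definition only.

```python
from bisect import bisect_right


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percent of the norming sample scoring the same as or lower than `score`."""
    ordered = sorted(norm_scores)
    at_or_below = bisect_right(ordered, score)  # number of norm scores <= score
    return 100 * at_or_below / len(ordered)


# Hypothetical norming sample of 20 raw scores, invented for illustration.
norms = [3, 5, 6, 7, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 14, 15, 16, 17, 19, 22]
print(percentile_rank(12, norms))  # 13 of the 20 scores are <= 12, so 65.0
```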

A percentile rank of 63 would mean that you scored as high as or higher than 63 percent of the people in the test’s norming sample and lower than the other 37 percent. Never use the abbreviations “%ile” or “%.” Those abbreviations guarantee your reader will think you mean “percent correct,” which is an entirely different matter.

Percentile ranks (PR) are not equal units. They are all scrunched up in the middle and spread out at the two ends. Therefore, percentile ranks cannot be added, subtracted, multiplied, or divided, and so they cannot be averaged either (except for finding the median, if you are into that sort of thing).
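The unequal spacing is easy to demonstrate, assuming normally distributed Wechsler-type standard scores (mean 100, SD 15). This sketch uses Python's `statistics.NormalDist`; the helper name is ours. The same 10-point step in PR covers a few standard-score points near the middle but many more near the top, and the average of two PRs does not match the PR of the average score.

```python
from statistics import NormalDist

standard = NormalDist(mu=100, sigma=15)  # Wechsler-type standard scores


def pr_to_standard_score(pr: float) -> float:
    """Standard score that sits at percentile rank `pr` on a normal curve."""
    return standard.inv_cdf(pr / 100)


# Equal 10-point steps in PR cover very different distances on the score scale.
for low, high in [(50, 60), (89, 99)]:
    gap = pr_to_standard_score(high) - pr_to_standard_score(low)
    print(f"PR {low} to {high}: {gap:.1f} standard-score points")

# Averaging two PRs is not the same as averaging the two scores.
prs = [50, 99]
mean_of_prs = sum(prs) / len(prs)
mean_score = sum(pr_to_standard_score(p) for p in prs) / len(prs)
pr_of_mean_score = 100 * standard.cdf(mean_score)
print(f"mean of PRs: {mean_of_prs}, PR of the mean score: {pr_of_mean_score:.1f}")
```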

NORMAL CURVE EQUIVALENTS (NCE) were – like so many clear, simple, understandable things – invented by the government. NCEs are equal-interval standard scores cleverly designed to look like percentile ranks. With a mean of 50 and standard deviation of 21.06, they line up with percentile ranks at 1, 50, and 99, but nowhere else, because percentile ranks are not equal intervals.
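Because NCEs are normal-curve scores with a mean of 50 and a standard deviation of 21.06, converting between NCEs and percentile ranks takes one line each way. This sketch assumes normally distributed scores (the function names are ours); it reproduces the agreement at 1, 50, and 99 and the divergence everywhere else.

```python
from statistics import NormalDist

NCE_SD = 21.06              # SD chosen so NCE 1, 50, 99 match PR 1, 50, 99
UNIT_NORMAL = NormalDist()  # z scores: mean 0, SD 1


def pr_to_nce(pr: float) -> float:
    """Percentile rank -> NCE, assuming normally distributed scores."""
    return 50 + NCE_SD * UNIT_NORMAL.inv_cdf(pr / 100)


def nce_to_pr(nce: float) -> float:
    """NCE -> percentile rank, assuming normally distributed scores."""
    return 100 * UNIT_NORMAL.cdf((nce - 50) / NCE_SD)


for pr in (1, 10, 25, 50, 75, 90, 99):
    print(f"PR {pr:>2} = NCE {pr_to_nce(pr):.1f}")
```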

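Percentile Ranks and Normal Curve Equivalents

| PR | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NCE | 1 | 23 | 32 | 39 | 45 | 50 | 55 | 61 | 68 | 77 | 99 |

| NCE | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PR | 1 | 3 | 8 | 17 | 32 | 50 | 68 | 83 | 92 | 97 | 99 |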

[Figure: percentile ranks drawn as a stretched rubber band and NCEs as an evenly marked stick, both labeled 1 through 99, against a 0–100 scale.]

A Normal Curve Equivalent of 57 would be in the 63rd percentile rank (Stanine 6). The middle 50% of examinees' Normal Curve Equivalent scores fall between approximately 36 and 64.

Because they are equal units, Normal Curve Equivalents can be added and subtracted, and most statisticians would probably let you multiply, divide, and average them.

Z SCORES are the fundamental standard score. One z score equals one standard deviation. Although only a few tests (favored mostly by occupational therapists) report them, z scores are the basis for all other standard scores.

Z SCORES have an average (mean) of 0.00 and a standard deviation of 1.00. A z score of +0.33 would be in the 63rd percentile rank, and it would be in Stanine 6. The middle 50% of examinees' z scores fall between -0.67 and +0.67.
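One way to see why z scores underlie all the others: convert any set of raw scores to z scores using the sample's mean and standard deviation, and the result always has a mean of 0 and an SD of 1, whatever the raw units were. A minimal sketch with invented raw scores (using the population SD from `statistics.pstdev`; an actual norming procedure is far more elaborate):

```python
from statistics import mean, pstdev


def z_scores(raw_scores: list[float]) -> list[float]:
    """Express each raw score as its distance from the mean in SD units."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD of this (hypothetical) sample
    return [(x - m) / sd for x in raw_scores]


raw = [12, 15, 18, 21, 24]  # invented raw scores
zs = z_scores(raw)
standard_scores = [100 + 15 * z for z in zs]  # same positions on a 100/15 scale
print([round(z, 2) for z in zs])
print([round(s) for s in standard_scores])
```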

Wechsler-type STANDARD SCORES ("quotients" on some tests) have an average (mean) of 100 and a standard deviation of 15. A standard score of 105 would be in the 63rd percentile rank and in Stanine 6. The middle 50% of examinees' standard scores fall between 90 and 110.
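Because every one of these scores is defined by a mean and a standard deviation, converting between them is a linear transformation through z. A minimal sketch, assuming the means and SDs stated in these slides (the `Scale` type and `convert` function are our own illustration, not any publisher's method); published conversion tables may differ by a point because of rounding or non-normal norms.

```python
from typing import NamedTuple


class Scale(NamedTuple):
    mean: float
    sd: float


# Means and standard deviations as given in these slides.
Z = Scale(0, 1)
STANDARD = Scale(100, 15)
SCALED = Scale(10, 3)
T_SCORE = Scale(50, 10)
CEEB = Scale(500, 100)


def convert(score: float, source: Scale, target: Scale) -> float:
    """Re-express a score on another scale by passing through its z score."""
    z = (score - source.mean) / source.sd
    return target.mean + z * target.sd


for name, target in [("z", Z), ("scaled", SCALED), ("T", T_SCORE), ("CEEB", CEEB)]:
    print(f"standard score 105 = {name} {convert(105, STANDARD, target):.2f}")
```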

[Technically, any score defined by its mean and standard deviation is a “standard score,” but we usually (except, until recently, with tests published by Pro-Ed) use “standard score” for standard scores with mean = 100 and s.d. = 15.]

Wechsler-type SCALED SCORES (“standard scores” [which they are] on some Pro-Ed tests) are standard scores with an average (mean) of 10 and a standard deviation of 3. A scaled score of 11 would be in the 63rd percentile rank and in Stanine 6. The middle 50% of examinees' scaled scores fall between 8 and 12.

V-SCALE SCORES have a mean of 15 and a standard deviation of 3 (the same standard deviation as Scaled Scores). A v-scale score of 16 would be in the 63rd percentile rank and in Stanine 6. The middle 50% of examinees' v-scale scores fall between 13 and 17. V-Scale Scores simply extend the Scaled Score range downward for the Vineland Adaptive Behavior Scales.

T SCORES have an average (mean) of 50 and a standard deviation of 10. A T score of 53 would be in the 62nd percentile rank, Stanine 6. The middle 50% of examinees' T scores fall between approximately 43 and 57. [Remember: T scores, Scaled Scores, NCEs, and z scores are actually all standard scores.]

CEEB SCORES for the SATs, GREs, and other Educational Testing Service tests used to have an average (mean) of 500 and a standard deviation of 100. A CEEB score of 533 would have been in the 63rd percentile rank, Stanine 6. The middle 50% of examinees' CEEB scores used to fall between approximately 433 and 567.

BRUININKS-OSERETSKY SUBTEST SCORES have an average (mean) of 15 and a standard deviation of 5. A Bruininks-Oseretsky score of 17 would be in the 66th percentile rank, Stanine 6. The middle 50% of examinees' scores fall between approximately 12 and 18.
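The percentile ranks and middle-50% ranges quoted for each score type follow from the normal curve. This sketch, assuming perfectly normal scores, recomputes them with Python's `statistics.NormalDist` (the `describe` helper is ours); published norms tables can differ by a point because of rounding and imperfectly normal data.

```python
from statistics import NormalDist


def describe(mean: float, sd: float, score: float) -> tuple[float, float, float]:
    """Return (percentile rank of score, lower bound, upper bound of the middle 50%)."""
    dist = NormalDist(mean, sd)
    return 100 * dist.cdf(score), dist.inv_cdf(0.25), dist.inv_cdf(0.75)


# (label, mean, SD, example score) for example scores used in these slides.
examples = [
    ("z score", 0, 1, 0.33),
    ("standard score", 100, 15, 105),
    ("scaled score", 10, 3, 11),
    ("T score", 50, 10, 53),
    ("CEEB score", 500, 100, 533),
    ("Bruininks-Oseretsky", 15, 5, 17),
]

for label, mean, sd, score in examples:
    pr, q1, q3 = describe(mean, sd, score)
    print(f"{label:20} {score:>6}: PR {pr:4.1f}; middle 50% from {q1:.2f} to {q3:.2f}")
```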

QUARTILES ordinarily divide scores into the lowest, antepenultimate, penultimate, and ultimate quarters (25%) of scores. However, they are sometimes modified in odd ways. DECILES divide scores into ten groups, each containing 10% of the scores.

STANINES (standard nines) are a nine-point scoring system. Stanines 4, 5, and 6 are approximately the middle half (54%)* of scores, or average range. Stanines 1, 2, and 3 are approximately the lowest one fourth (23%). Stanines 7, 8, and 9 are approximately the highest one fourth (23%). _________________________

* But who’s counting?
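The stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4) accumulate to the percentile-rank cut points, so a percentile rank can be assigned to a stanine with a simple lookup. A minimal sketch; sending a PR that falls exactly on a cut point to the lower stanine is our own convention, and tables may treat boundary values differently.

```python
from bisect import bisect_left
from itertools import accumulate

# Percent of scores in stanines 1 through 9.
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]

# Cumulative percent at the top of stanines 1-8: [4, 11, 23, 40, 60, 77, 89, 96].
CUTS = list(accumulate(STANINE_PERCENTS[:-1]))


def stanine(percentile_rank: float) -> int:
    """Stanine for a percentile rank (a PR exactly on a cut goes to the lower stanine)."""
    return bisect_left(CUTS, percentile_rank) + 1


print(CUTS)
print(stanine(63))                    # PR 63 falls between 60 and 77
print(sum(STANINE_PERCENTS[3:6]))     # stanines 4-6, the "middle half"
```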


Why do authors and publishers create and select all these different scores?

• Immortality. We still talk about “Wechsler-type standard scores” with a mean of 100 and standard deviation (s.d.) of 15. [Of course, Dr. Wechsler’s name has also gained some prominence from all the tests he published before and after his death in 1981.]

• Retaliation? I have always fantasized that the 1960 conversion of Stanford-Binet IQ scores to a mean of 100 and s.d. of 16 resulted from Wechsler’s grabbing market share from the 1937 Stanford-Binet with his 1939 Wechsler-Bellevue and 1949 WISC and other tests.

My personal hypothesis was that when Wechsler’s deviation IQ (M = 100, s.d. = 15) proved to be such a popular improvement over the Binet ratio IQ (mental age / chronological age × 100, or MA/CA × 100), there was no way the next Binet edition was going to use that score. [This idea is probably nonsense, but I like it.]

[Wechsler went with a deviation IQ based on the mean and s.d. because the old ratio IQ (MA/CA x 100) did not mean the same thing at different ages. For instance, an IQ of 110 might be at the 90th percentile at age 12, the 80th at age 10, and the 95th at age 14. A deviation IQ means the same thing at all ages.]
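The two definitions can be written side by side. A ratio IQ depends only on how far the child's mental age is ahead of or behind the chronological age, so its percentile meaning shifts with how spread out mental ages are at each age; a deviation IQ is computed within each age group, so 110 always sits at the same percentile (about the 75th). A minimal sketch; the age-group mean and SD below are invented for illustration.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Old Binet ratio IQ: MA / CA x 100."""
    return 100 * mental_age / chronological_age


def deviation_iq(raw_score: float, age_mean: float, age_sd: float) -> float:
    """Wechsler deviation IQ: position within one's own age group on a 100/15 scale."""
    return 100 + 15 * (raw_score - age_mean) / age_sd


print(ratio_iq(11, 10))  # mental age 11 at chronological age 10
# Hypothetical age-group norms: mean raw score 26, SD 6.
print(deviation_iq(30, 26, 6))
```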

[The raw data from the Binet ratio IQ scores did show a mean of about 100 (mental age = chronological age) and a standard deviation, varying considerably from age to age, of something like 16 points, so both the Binet and the Wechsler choices were reasonable. However, picking just one would have made life a lot easier for evaluators from 1960 to 2003.]

In any case, the subtle difference between s.d. 15 and 16 (WISC 115 = Binet 116, WISC 85 = Binet 84, WISC 145 = Binet 148, etc.) plagued evaluators with the 1960/1972 and 1986 editions of the Binet. The 2003 edition finally switched to s.d. 15.
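The WISC-to-Binet equivalences follow from keeping the mean at 100 and stretching the distance from the mean by 16/15. A minimal sketch (the function name is ours):

```python
def rescale_sd(score: float, old_sd: float = 15, new_sd: float = 16, mean: float = 100) -> float:
    """Move a score to a scale with the same mean but a different SD."""
    return mean + (score - mean) * new_sd / old_sd


for wisc in (85, 115, 145):
    print(f"WISC {wisc} = Binet {rescale_sd(wisc):.0f}")
print(f"Binet 130 = WISC {rescale_sd(130, old_sd=16, new_sd=15):.1f}")
```

The gap grows with distance from the mean: one point at one SD, three points at three SDs.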

• Matching the precision of the score to the precision of the measurement. Total or composite scores based on several subtests are usually sufficiently reliable and based on sufficient items to permit a fine-grained 15-point subdivision of each standard deviation (standard score).

It can be argued that a subtest with less reliability and fewer items should not be sliced so thin. There might be fewer than 15 items! A scaled score dividing each standard deviation into only 3 points would seem more appropriate, but there are consequently big jumps between scores on such scales.

The Vineland Adaptive Behavior Scales v-scale extends the scaled score measurement downward another 5 points to differentiate among persons with very low ratings, because the Vineland is often used with persons who obtain extremely low ratings. The v-scale helpfully subdivides the lowest 0.1% of ratings.

T scores, dividing each standard deviation into 10 slices, are finer grained than scaled scores (3 slices), but not quite as narrow as standard scores (15). The Differential Ability Scales, Reynolds Intellectual Assessment Scales, and many personality and neuropsychological tests and inventories use T scores.

Dr. Bill Lothrop often quotes Prof. Charles P. “Phil” Fogg: “Gathering data with a rake and examining them under a microscope.” Test scores may give the illusion of greater precision than the test actually provides.

However, Kevin McGrew (http://www.iapsych.com/iapap101/iap101brief5.pdf) warns us that wide-band scores, such as scaled scores, can be dangerously imprecise. For example, a scaled score of 4 might be equivalent to a standard score of 68, 69, or 70 (the range usually associated with intellectual disability) or 71 or 72 (above that range).
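The width of the band follows from the arithmetic of the two scales. Assuming each integer scaled score stands for everything within half a point of it (an assumption for illustration; publishers' tables may draw the lines differently), one scaled-score point spans five standard-score points:

```python
def standard_score_band(scaled: int) -> tuple[float, float]:
    """Standard-score interval covered by one integer scaled score (M 10, SD 3)."""

    def to_standard(x: float) -> float:
        return 100 + 15 * (x - 10) / 3  # scaled -> z -> standard (M 100, SD 15)

    return to_standard(scaled - 0.5), to_standard(scaled + 0.5)


for s in (3, 4, 5):
    low, high = standard_score_band(s)
    print(f"scaled score {s}: standard scores {low} to {high}")
```

For a scaled score of 4 the interval runs from 67.5 to 72.5, straddling the cutoff of 70.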

That lack of precision can have severe consequences when comparing scores, tracking progress, and deciding whether a student is eligible for special education or a defendant for the death penalty (http://www.atkinsmrdeathpenalty.com/).

The WJ III, KTEA-II, and WIAT-III, for example, use standard scores with Mean 100 & SD 15 for both (sub)tests and composites. This practice does not seem to have caused any harm, even if it is unsettling to those of us who trained on the 1949 WISC and 1955 WAIS.

• Sometimes test scores offer a special utility. The 1986 Stanford-Binet Fourth Ed. (Thorndike, Hagen, & Sattler) used composite scores with M = 100 and s.d. = 16 and subtest scores with M = 50 and s.d. = 8.

With that clever system, you could convert subtest scores to composite scores simply by doubling the subtest score. It was very handy for evaluators. Mentally converting 43 to 86 was much easier than mentally converting scaled score 7 or T score 40 to standard score 85.
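Doubling works exactly because both the mean and the SD of the composite metric are twice those of the subtest metric. This sketch (the function name is ours) goes the long way through z and confirms it always lands on twice the subtest score:

```python
def to_sb4_composite_metric(subtest: float) -> float:
    """Re-express an SB4 subtest score (M 50, SD 8) on the composite metric (M 100, SD 16)."""
    z = (subtest - 50) / 8
    return 100 + 16 * z


# Because 100 = 2 x 50 and 16 = 2 x 8, the z-score route always equals simple doubling.
assert all(to_sb4_composite_metric(x) == 2 * x for x in range(20, 81))
print(to_sb4_composite_metric(43))
```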

Sample Explanation for Evaluators Choosing to Translate all Test Scores into a Single, Rosetta Stone Classification Scheme [In addition to writing the following note in the report, remind the reader again in at least two subsequent footnotes. Readers will forget.]

“Throughout this report, for all of the tests, I am using the stanine labels shown below (Very Low, Low, Below Average, Low Average, Average, High Average, Above Average, High, and Very High), even if the particular test may have a different labeling system in its manual.”

Stanines

[Chart: normal curve drawn with 200 &s (each && = 1%), divided into the nine stanines.]

| Stanine | Classification | Percent | Percentile | Standard Score | Scaled Score | v-Scale Score | T Score |
|---|---|---|---|---|---|---|---|
| 1 | Very Low | 4% | 1–4 | ≤ 73 | 1–4 | 1–9 | ≤ 32 |
| 2 | Low | 7% | 4–11 | 74–81 | 5–6 | 10–11 | 33–37 |
| 3 | Below Average | 12% | 11–23 | 82–88 | 7 | 12 | 38–42 |
| 4 | Low Average | 17% | 23–40 | 89–96 | 8–9 | 13–14 | 43–47 |
| 5 | Average | 20% | 40–60 | 97–103 | 10 | 15 | 48–52 |
| 6 | High Average | 17% | 60–77 | 104–111 | 11–12 | 16–17 | 53–57 |
| 7 | Above Average | 12% | 77–89 | 112–118 | 13 | 18 | 58–62 |
| 8 | High | 7% | 89–96 | 119–126 | 14–15 | 19–20 | 63–67 |
| 9 | Very High | 4% | 96–99 | ≥ 127 | 16–19 | 21–24 | ≥ 68 |

Adapted from Willis, J. O. & Dumont, R. P., Guide to identification of learning disabilities (1998 New York State ed.) (Acton, MA: Copley Custom Publishing, 1998, p. 26). Also available at http://alpha.fdu.edu/psychology/test_score_descriptions.htm.


Obviously, that explanation is for translating all scores into stanines. You would modify the explanation if you elected to translate all scores into a different classification scheme, such as that used with the Woodcock-Johnson III/NU.

Sample Explanation for Evaluators Using the Rich Variety of Score Classifications Offered by the Several Publishers of the Tests Inflicted on the Innocent Examinee.


“Throughout this report, for the various tests, I am using a variety of different statistics and different classification labels (e.g., Poor, Below Average, and High Average) provided by the test publishers. Please see p. i of the Appendix to this report for an explanation of the various classification schemes.”

[Chart: normal curve drawn with 200 &s (each && = 1%), divided into seven score bands.]

| Percent in each | Standard Scores | Scaled Scores | V-Scale Scores | T Scores | z Scores | Percentile Ranks |
|---|---|---|---|---|---|---|
| 2.2% | ≤ 69 | 1–3 | 1–8 | ≤ 29 | < -2.00 | ≤ 02 |
| 6.7% | 70–79 | 4–5 | 9–10 | 30–36 | -2.00 to -1.34 | 03–08 |
| 16.1% | 80–89 | 6–7 | 11–12 | 37–42 | -1.33 to -0.68 | 09–24 |
| 50% | 90–109 | 8–11 | 13–16 | 43–56 | -0.67 to 0.66 | 25–74 |
| 16.1% | 110–119 | 12–13 | 17–18 | 57–62 | 0.67 to 1.32 | 75–90 |
| 6.7% | 120–129 | 14–15 | 19–20 | 63–69 | 1.33 to 1.99 | 91–97 |
| 2.2% | ≥ 130 | 16–19 | 21–24 | ≥ 70 | ≥ 2.00 | ≥ 98 |

| Standard Scores | Wechsler Classification | DAS Classification | Woodcock-Johnson Classification | Pro-Ed Classification |
|---|---|---|---|---|
| ≤ 69 | Extremely Low | Very Low | Very Low | Very Poor |
| 70–79 | Borderline | Low | Low | Poor |
| 80–89 | Low Average | Below Average | Low Average | Below Average |
| 90–109 | Average | Average | Average (90–110) | Average |
| 110–119 | High Average | Above Average | High Average (111–120) | Above Average |
| 120–129 | Superior | High | Superior (121–130) | Superior |
| ≥ 130 | Very Superior | Very High | Very Superior (131– ) | Very Superior |

KTEA II Classification: Lower Extreme; Below Average (70–84); Average (85–115); Above Average (116–130); Upper Extreme.

Vineland Adaptive Levels: Low (≤ 70); Moderately Low (71–85); Adequate (86–114); Moderately High (115–129); High (≥ 130).

Adapted from Willis, J. O. & Dumont, R. P., Guide to identification of learning disabilities (1998 New York State ed.) (Acton, MA: Copley Custom Publishing, 1998, p. 27). Also available at http://alpha.fdu.edu/psychology/test_score_descriptions.htm.

My score is 110! I am adequate, average, high average, or above average. I’m glad that much is clear!


[Chart: the same normal curve and score bands as the preceding chart, with columns added for Bruininks-Oseretsky subtest scores (0–28) and for the RIAS, Stanford-Binet, and Leiter classification schemes, alongside the Wechsler, DAS, Woodcock-Johnson, Pro-Ed, KTEA II, and Vineland schemes.]

| Standard Scores | Percentile Ranks | RIAS Classification |
|---|---|---|
| ≤ 69 | ≤ 02 | Significantly Below Average |
| 70–79 | 03–08 | Moderately Below Average |
| 80–89 | 09–24 | Below Average |
| 90–109 | 25–74 | Average |
| 110–119 | 75–90 | Above Average |
| 120–129 | 91–97 | Moderately Above Average |
| ≥ 130 | ≥ 98 | Significantly Above Average |

Stanford-Binet classifications in the chart include Moderately Impaired (40–54), Mildly Impaired (55–69), Borderline, Low Average, Average, High Average, Superior, and Gifted (130–144). Leiter classifications at the low end are Severe Delay (30–39), Moderate Delay (40–54), and Mild Delay (55–69).

Adapted from Willis, J. O. & Dumont, R. P., Guide to identification of learning disabilities (1998 New York State ed.) (Acton, MA: Copley Custom Publishing, 1998, p. 27). Also available at http://alpha.fdu.edu/psychology/test_score_descriptions.htm.

PUBLISHER'S SCORING SYSTEM FOR THE WECHSLER SCALES

[These are not the student’s own scores, just the scoring systems for the tests.]

When a new test is developed, it is normed on a sample of hundreds or thousands of people. The sample should be like that for a good opinion poll: female and male, urban and rural, different parts of the country, different income levels, etc. The scores from that norming sample are used as a yardstick for measuring the performance of people who then take the test. This human yardstick allows for the difficulty levels of different tests. The student is being compared to other students on both difficult and easy tasks.

You can see from the illustration below that there are more scores in the middle than at the very high and low ends. Many different scoring systems are used, just as you can measure the same distance as 1 yard, 3 feet, 36 inches, 91.4 centimeters, 0.91 meter, or 1/1760 mile.

PERCENTILE RANKS (PR) simply state the percent of persons in the norming sample who scored the same as or lower than the student. A percentile rank of 50 would be Average: as high as or higher than 50% and lower than the other 50% of the norming sample. The middle half of scores falls between percentile ranks of 25 and 75.

STANDARD SCORES (called "quotients" on Pro-Ed tests) have an average (mean) of 100 and a standard deviation of 15. A standard score of 100 would also be at the 50th percentile rank. The middle half of these standard scores falls between 90 and 110.

SCALED SCORES (called "standard scores" by Pro-Ed) are standard scores with an average (mean) of 10 and a standard deviation of 3. A scaled score of 10 would also be at the 50th percentile rank. The middle half of these scaled scores falls between 8 and 12.

QUARTILES ordinarily divide scores into the lowest, second-lowest, second-highest, and highest quarters (25%) of scores. However, they are sometimes modified as shown below. It is essential to know what kind of quartile is being reported.

DECILES divide scores into ten groups, each containing 10% of the scores.

[Chart: normal curve drawn with 200 &s (each && = 1%), with the score bands used for the Wechsler scales.]

| Percent in each | Standard Scores | Scaled Scores | Percentile Ranks | Wechsler IQ Classification |
|---|---|---|---|---|
| 2% | ≤ 69 | 1–3 | ≤ 02 | Extremely Low |
| 7% | 70–79 | 4–5 | 03–08 | Borderline |
| 16% | 80–89 | 6–7 | 09–24 | Low Average |
| 50% | 90–109 | 8–11 | 25–74 | Average |
| 16% | 110–119 | 12–13 | 75–90 | High Average |
| 7% | 120–129 | 14–15 | 91–97 | Superior |
| 2% | ≥ 130 | 16–19 | ≥ 98 | Very Superior |

WIAT-III Classifications: Very Low (≤ 54); Low (55–69); Below Average (70–84); Average (85–115); Above Average (116–130); Superior (131–145); Very Superior (≥ 146).

Quartiles: 1 = lowest 25%; 2 = next 25%; 3 = next 25%; 4 = highest 25%.

Modified Quartiles: 0 = lowest 5%; 1 = next 20%; 2 = next 25%; 3 = next 25%; 4 = highest 25%.

Modified Quartile-Based Scores: 0 = lowest 25%; 1 = next 25%; 2 = next 25%; 3 = highest 25% with 1 or more errors; 4 = zero errors.

Deciles: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100.

It is essential that the reader know (and be reminded) precisely what classification scheme(s) we are using with the scores, whether we use all the different ones provided with the various tests or translate everything into a common language.

I usually put all my test scores in an appendix to the narrative report. The right-most column is usually a verbal label for each score (e.g., “Above Average”). I use footnotes to explain the test scores, confidence bands, and percentile ranks in at least the first table in the appendix.

The last column gets a footnote in every table so I can keep reminding the reader that I am either using one set of verbal labels (not necessarily the publisher’s) for scores or that I am using various publishers’ different sets of labels, so the same score may have different names.

Ralph's Test Scores in Standard Scores, Percentile Ranks, and Stanines for his Age

| Test | Standard Score | 90% Confidence¹ | Percentile² | Classification³ |
|---|---|---|---|---|
| WISC-IV Full Scale IQ | 110 | 106–114 | 75 | High Average |
| DAS-II General Conceptual Ability | 110 | 106–114 | 75 | Above Average |
| Woodcock-Johnson III General Intellectual Ability | 110 | 106–114 | 75 | Average |
| Stanford-Binet 5 Full Scale IQ | 85 | 81–89 | 16 | Low Average |
| KABC-II Mental Processing Index | 85 | 81–89 | 16 | Average |
| RIAS Composite Intelligence Index | 85 | 81–89 | 16 | Below Average |
| WIAT-III Reading Comprehension | 85 | 81–89 | 16 | Average |
| Gray Oral Reading Test Oral Reading Quotient | 85 | 81–89 | 16 | Below Average |

1. Even on the best tests, scores can never be perfectly accurate. This range shows how much scores are likely to vary 90% of the time just by pure chance.
2. Percentile ranks tell the percentage of students the same age who scored the same as Ralph or lower. For example, a percentile rank of 67 would mean that Ralph scored as high as or higher than 67 percent of students his age and lower than the remaining 33 percent.
3. Each test uses its own particular scheme for classifying scores. The same score may be called different names on different tests. Please see the explanation on p. i of the Appendix to this report.

Because of the dramatic discrepancy between Ralph's Average General Intellectual Ability on the WJ III and his Average Reading Comprehension on the WIAT-III, the team should consider the possibility that he might have a specific learning disability in reading comprehension.

1. These are the standard, scaled, or T scores used with the various tests. Please see p. i of the Appendix to this report for an explanation of these scores.
2. Even on the best tests, scores can never be perfectly accurate. This range shows how much scores are likely to vary 90% of the time just by pure chance.
3. Percentile ranks tell the percentage of students the same age who scored the same as Ralph or lower. For example, a percentile rank of 67 would mean that Ralph scored as high as or higher than 67 percent of students his age and lower than the remaining 33 percent.
4. Each test uses its own particular scheme for classifying scores. The same score may be called different names on different tests. Please see the explanation on p. i of the Appendix to this report.

– or –

4. Each test uses its own particular scheme for classifying scores. The classification schemes for the various tests taken by Ecomodine are explained on p. ii. I have taken the liberty of substituting "stanine" classifications, as explained on p. i, for the publishers' classifications. These are NOT the classification labels used by the various test publishers. Please see p. ii.

If, as I usually do, I copy and paste parts of tables into my narrative (perhaps deleting some rows and columns), I again footnote all columns in the first table and footnote the verbal label column in all tables.

• No matter what you do, you will confuse some readers, annoy others, and enrage a few.

• Explain what you are doing in at least three places in the narrative, in a footnote on every table, and in a few score citations in the text.

However, bear in mind that all such classification schemes are arbitrary (not, as attorneys say, “arbitrary and capricious,” just arbitrary).


“It is customary to break down the continuum of IQ test scores into categories. . . . other reasonable systems for dividing scores into qualitative levels do exist, and the choice of the dividing points between different categories is fairly arbitrary. . . . It is also unreasonable to place too much importance on the particular label (e.g., 'borderline impaired') used by different tests that measure the same construct (intelligence, verbal ability, and so on).” [Roid, G. H. (2003). Stanford-Binet Intelligence Scales, Fifth Edition, Examiner's Manual. Itasca, IL: Riverside Publishing, p. 150.]

Life becomes more complicated when scores are not normally distributed, as is often the case with neuropsychological tests and behavioral checklists, and sometimes with visual-motor and language measures.

It is easy to check. In a normal distribution (or one that has been brutally forced into the Procrustean bed of a normal distribution), the following scores should be equivalent.

If the standard scores do not match these percentile ranks in the norms tables, the score distribution is not normal and the standard scores and percentile ranks must be interpreted separately. See the test manual and other books by the test author(s).

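| PR | SS (M 100, SD 15) | ss (M 10, SD 3) | v (M 15, SD 3) | T (M 50, SD 10) | B-O (M 15, SD 5) | z | PR |
|---|---|---|---|---|---|---|---|
| 99.9 | 145 | 19 | 24 | 80 | 30 | +3.0 | 99.9 |
| 98 | 130 | 16 | 21 | 70 | 25 | +2.0 | 98 |
| 84 | 115 | 13 | 18 | 60 | 20 | +1.0 | 84 |
| 50 | 100 | 10 | 15 | 50 | 15 | 0 | 50 |
| 16 | 85 | 7 | 12 | 40 | 10 | –1.0 | 16 |
| 2 | 70 | 4 | 9 | 30 | 5 | –2.0 | 2 |
| 0.1 | 55 | 1 | 6 | 20 | 0 | –3.0 | 0.1 |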
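Every column in this table is a linear rescaling of the z score (score = mean + SD × z), and the percentile rank is the area of the normal curve below z. The Python sketch below reproduces the table under the assumption of a perfectly normal distribution; the means and SDs are read off the table itself, and `METRICS`, `normal_cdf`, and `equivalents` are hypothetical names chosen for illustration.

```python
import math

# Mean and SD of each score type, read off the table's z = 0 and z = +1 rows.
METRICS = {
    "SS": (100, 15),  # standard score
    "ss": (10, 3),    # scaled score
    "v": (15, 3),     # v-scale score
    "T": (50, 10),    # T score
    "B-O": (15, 5),
}


def normal_cdf(z: float) -> float:
    """Proportion of a normal distribution falling below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def equivalents(z: float) -> dict:
    """Express one z score in every metric, plus its percentile rank (PR)."""
    row = {name: mean + sd * z for name, (mean, sd) in METRICS.items()}
    row["z"] = z
    row["PR"] = round(100 * normal_cdf(z), 1)
    return row


for z in (3, 2, 1, 0, -1, -2, -3):
    print(equivalents(z))
```

Printed to one decimal, the PR values come out as 99.9, 97.7, 84.1, 50.0, 15.9, 2.3, and 0.1; the table rounds the middle values to whole numbers.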

77
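The check described on this slide (do a norms table's percentile ranks match the ones a normal curve predicts for its standard scores?) can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from any test manual: the one-point tolerance is an arbitrary assumption (printed norms tables round their percentile ranks), the norms rows are hypothetical, and `normal_pr` and `non_normal_rows` are illustrative names.

```python
import math


def normal_pr(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentile rank a score would have if scores were normally distributed."""
    z = (score - mean) / sd
    return 100 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def non_normal_rows(norms, mean=100.0, sd=15.0, tolerance=1.0):
    """Return (score, printed PR, expected PR) for rows that disagree with the normal curve.

    `norms` is a list of (standard score, percentile rank) pairs copied from a
    norms table; `tolerance` is in percentile-rank points (an assumed cutoff).
    """
    flagged = []
    for score, printed_pr in norms:
        expected = normal_pr(score, mean, sd)
        if abs(printed_pr - expected) > tolerance:
            flagged.append((score, printed_pr, round(expected, 1)))
    return flagged


# Hypothetical norms rows: the first four match the normal curve; the last does not.
norms = [(130, 98), (115, 84), (100, 50), (85, 16), (70, 9)]
print(non_normal_rows(norms))
```

Any flagged row is a signal to interpret the standard score and the percentile rank separately, as the slide advises.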

78

http://myweb.stedwards.edu/brianws/3328fa09/sec1/lecture11.htm Brian William Smith

Dumont/Willis Extra Easy Evaluation Battery (DWEEEB) http://alpha.fdu.edu/~dumont/psychology/DWEEBTOC.html

80

SCORES IN THE AVERAGE RANGE

There are 200 &s. Each && = 1%.

[Figure: normal curve drawn in ampersands, with scaled scores (s.s.) 1–19 marked along the baseline.]

| Classification | Low Average | Average | High Average |
|---|---|---|---|
| Percent | .1% | 99.8% | .1% |
| S.S. | 55 and below | 56–144 | 145 and above |
| T | 20 and below | 21–79 | 80 and above |
| PR | 0.1 and below | 0.2–99.8 | 99.9 and above |

There are 200 &s. Each && = 1%.

[Figure: the same ampersand curve, with scaled scores (s.s.) 1–19 marked along the baseline.]

| Classification | Below Average | Average | Above Average |
|---|---|---|---|
| Percent | 49% | 2% | 49% |
| S.S. | below 100 | 100 | above 100 |
| T | below 50 | 50 | above 50 |
| PR | 48 and below | 49–51 | 52 and above |

81
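The percentages on these charts follow from the area of the normal curve between the cutoffs. A minimal Python sketch, assuming normally distributed standard scores (mean 100, SD 15) and treating each whole-number score as covering half a point on either side (a continuity correction, which the slides do not specify); `normal_cdf` and `three_way_split` are hypothetical names.

```python
import math


def normal_cdf(x: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Proportion of a normal distribution of scores falling below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def three_way_split(low: int, high: int, mean: float = 100.0, sd: float = 15.0):
    """Percent below, inside, and above an inclusive band of whole-number scores.

    Each whole score is treated as covering half a point on either side
    (a continuity correction).
    """
    below = normal_cdf(low - 0.5, mean, sd)
    inside = normal_cdf(high + 0.5, mean, sd) - below
    above = 1.0 - below - inside
    return tuple(round(100 * p, 1) for p in (below, inside, above))


print(three_way_split(56, 144))   # "Average" stretched to 56-144
print(three_way_split(100, 100))  # "Average" shrunk to exactly 100
```

Under these assumptions the results come out close to, though not identical with, the charts' rounded percentages; the exact figures depend on rounding and on how the band edges are treated.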

A publisher calling a score “average” does not make the student’s performance average. If a student earned a Low Average reading score of 85 on the KTEA or WIAT-II and is then classified as Average for precisely the same score on the KTEA-II or WIAT-III, the student is still in the bottom 16% of the population!

82

HAND ME THAT GLUE GUN Byron Preston, 15, hasn't gone to school for four months. . . . He . . . was expelled for possession of a "weapon" -- a tattoo gun, which he took to school to practice tattooing on fruit. "It doesn't shoot anything," complains his father, James. "It just happens to have the word 'gun'." But school officials wouldn't listen, saying a student having a "gun" at school calls for automatic expulsion according to their zero tolerance policy. A Prince George's County Public Schools spokesman says the policy is "under review" by the school board. The Prestons have been told verbally that they won the appeal of the expulsion, but somehow the paperwork to reinstate Byron into school has never shown up. (RC/WTTG-TV)

83

I call 90–109 “Average.”

There are 200 &s. Each && = 1%.

[Figure: normal curve drawn in ampersands, with scaled scores 1–19 along the baseline and two classification schemes for standard scores beneath it.]

| Classification | Standard scores |
|---|---|
| Extremely Low | 69 and below |
| Borderline | 70–79 |
| Low Average | 80–89 |
| Average | 90–109 |
| High Average | 110–119 |
| Superior | 120–129 |
| Very Superior | 130 and above |

| Classification | Standard scores |
|---|---|
| Very Low; Low | 69 and below (divided at 55) |
| Below Average | 70–84 |
| Average | 85–115 |
| Above Average | 116–130 |
| Superior | 131–145 |
| Very Superior | 146 and above |

84

I call 85–115 “Average.”

There are 200 &s. Each && = 1%.

[Figure: the same ampersand curve and the same two classification schemes as the previous slide.]

85

I call 80–119 “Average.”

There are 200 &s. Each && = 1%.

[Figure: the same ampersand curve and the same two classification schemes as the previous slide.]

86

I call him “Nice Kitty.”

87