Criterion-referenced test administration designs and analyses by Takaaki Kumazawa (Kanto Gakuin University)

Keywords: criterion-referenced tests, test analyses, intervention construct validity study

"If no diagnostic test is administered, teachers have no information on what students can do before instruction."

"If students know that a test administered as a diagnostic test is also going to be used as an achievement test, they may only study the parts of the class content that are on the test."

A design with different pretest and posttest forms can minimize pretest reactivity effects, and it lets teachers test a wide range of class content across the two CRT forms. However, this design also has a pitfall. If the two CRT forms differ in difficulty, students' achievement can no longer be estimated simply by subtracting their scores on the diagnostic test from their scores on the achievement test.

Research questions
1. A linguist studied how parents talked to their young children.

| Test form (group) | n | Minimum | Maximum | M | SD | Skewness | Kurtosis | KR-20 | φ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pre Form A (Group A) | 44 | 6 | 14 | 10.14 | 2.42 | 0.16 | -1.04 | .06 | .05 |
| Pre Form B (Group B) | 37 | 7 | 20 | 13.19 | 2.98 | 0.31 | -0.11 | .40 | .37 |
| Pre Forms A & B (Groups A & B) | 81 | 6 | 20 | 11.53 | 3.08 | 0.42 | -0.06 | | |
| Post Form A (Group B) | 36 | 11 | 25 | 17.75 | 2.91 | 0.05 | 0.53 | .49 | .46 |
| Post Form B (Group A) | 44 | 6 | 20 | 12.48 | 3.47 | 0.16 | -0.17 | .57 | .53 |
| Post Forms A & B (Groups A & B) | 80 | 6 | 25 | 14.85 | 4.16 | -0.09 | -0.42 | | |
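The combined rows of the table can be reproduced as n-weighted means of the two groups, which gives a quick sanity check on the figures. The Python sketch below does that and also computes each group's raw posttest minus pretest difference. The variable and function names are illustrative and do not come from the article. Because each group took a different form at each administration, these raw differences mix form difficulty with learning and should be read with caution.

```python
# (n, mean) per administration, copied from the descriptive statistics table.
# Keys are the test forms; the comments note which group took each form.
pre = {"A": (44, 10.14),   # Form A, taken by Group A
       "B": (37, 13.19)}   # Form B, taken by Group B
post = {"A": (36, 17.75),  # Form A, taken by Group B
        "B": (44, 12.48)}  # Form B, taken by Group A


def pooled_mean(cells):
    """Return the n-weighted mean across (n, mean) cells."""
    cells = list(cells)
    total_n = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total_n


pre_all = pooled_mean(pre.values())
post_all = pooled_mean(post.values())

# Raw differences per group. Each one compares two different forms.
gain_group_a = post["B"][1] - pre["A"][1]  # Group A: Form A, then Form B
gain_group_b = post["A"][1] - pre["B"][1]  # Group B: Form B, then Form A

print(f"Pooled pretest mean:  {pre_all:.2f}")
print(f"Pooled posttest mean: {post_all:.2f}")
print(f"Group A raw difference: {gain_group_a:.2f}")
print(f"Group B raw difference: {gain_group_b:.2f}")
```

The pooled means round to 11.53 and 14.85, matching the combined rows of the table.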

According to the phi dependability indexes, every test form except the one Group A took as a pretest had moderate dependability, with values from .37 to .53. Because most students in Group A scored low on the pretest, that form worked well as a diagnostic test: it showed that most students had not yet learned the items. However, its low dependability probably reflects the small variance in its scores. Statistics can indicate the quality of items; nevertheless, especially when both the sample and the number of criterion-referenced items are small, teachers should examine item content carefully to decide whether the items really measure the target objectives of the class.
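This section does not show how the consistency estimates were computed. As a rough sketch, the Python below computes KR-20 from dichotomous (0/1) item scores and a short-cut phi estimate from summary statistics. The short-cut formula, the figure of 25 items per form, the treatment of the reported SD as a population SD, and the function names are all assumptions made for illustration, not details confirmed by the text.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 from a list of per-student lists of 0/1 item scores."""
    k = len(responses[0])
    n = len(responses)
    totals = [sum(row) for row in responses]
    # Proportion of students answering each item correctly.
    item_p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(p * (1 - p) for p in item_p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


def phi_shortcut(n, k, mean_raw, sd_raw, kr20_value):
    """Short-cut phi dependability estimate from summary statistics.

    Assumption: sd_raw is the population SD of raw scores and k is the
    number of items; scores are converted to proportions internally.
    """
    mp = mean_raw / k            # mean of proportion scores
    sp2 = (sd_raw / k) ** 2      # variance of proportion scores
    person = n * sp2 / (n - 1) * kr20_value
    error = (mp * (1 - mp) - sp2) / (k - 1)
    return person / (person + error)


# Pre Form B (Group B) row of the table, assuming 25 items on the form.
phi_pre_b = phi_shortcut(n=37, k=25, mean_raw=13.19, sd_raw=2.98,
                         kr20_value=0.40)
print(f"phi (Pre Form B): {phi_pre_b:.2f}")
```

Under these assumptions the short-cut estimate for Pre Form B rounds to .37, the value reported in the table. Note that `kr20` is undefined when every student has the same total score, since the score variance is then zero.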
1. To what extent were the two CRT forms dependable in the two administrations?
The pretest/posttest design with two counterbalanced forms enables teachers to determine, to some degree, the effectiveness of their instruction. Such designs focus on two indicators: the difference index (DI) and score gain. To calculate DI, the same items have to be administered as both pretest and posttest. Recall that DI for the items in the posttest given to Group B had negative values. This was not surprising, because the proficiency levels of Group A and Group B differed. Ideally, the DI statistic should be used when the proficiency levels of the two groups are almost equal. To resolve the problem in this study, each class should have been split into halves, so that each form was taken by half of each class.
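As a minimal sketch of the DI calculation, the Python below takes the difference index for an item to be its item facility (proportion correct) on the posttest minus its item facility on the pretest. The response data and function names are hypothetical. In a counterbalanced design the pretest and posttest responses to a given form come from different groups, which is why unequal proficiency can push DI below zero.

```python
def item_facility(responses):
    """Proportion of students answering each item correctly (0/1 scores)."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]


def difference_index(pre_responses, post_responses):
    """Per-item DI: posttest item facility minus pretest item facility."""
    pre_if = item_facility(pre_responses)
    post_if = item_facility(post_responses)
    return [post - pre for pre, post in zip(pre_if, post_if)]


# Hypothetical 0/1 responses to the same three items on one form.
# Rows are students; the two administrations involve different groups.
pre_form = [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
post_form = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]]

di = difference_index(pre_form, post_form)
print(di)
```

In this made-up data the first two items show positive DI values and the third shows a negative one, the pattern a teacher would want to inspect before concluding that an item is faulty.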
2. To what extent did the students master the vocabulary items on the two forms of the CRT?
"It is recommended that teachers make CRTs before instruction so that successful teach-to-test instruction can be accomplished. It is also recommended that two forms of any CRT be developed in order to test a wider range of content in a counterbalanced design."
References