Teacher development and assessment literacy
by Tim Newfields (Toyo University)

After defining the concept of assessment literacy and possible operationalizations of this concept for three different populations, the rationale for developing an assessment literacy scale is explained. Using a modified Angoff procedure, the suitability of 100 possible assessment literacy items for three target populations was evaluated by a small panel of experts. Sample items are described and 70 items concerning assessment-related issues that may be appropriate for high school foreign language teachers are outlined. This paper concludes by considering possible uses and limitations of the Assessment Literacy for High School Foreign Language Teachers Inventory and by calling for further research on assessment literacy.
Keywords: assessment standards, evaluation skills, test competence, statistical literacy, test development

This paper examines the notion of assessment literacy and some of its possible components. After mentioning why assessment literacy is important for teachers, let's briefly conceptualize this term, then attempt to operationalize it, and finally examine some screening items that might begin to express what this notion represents for three different groups. Those who are hoping to find a single, cogent definition of "assessment literacy" that works for all groups will be disappointed, because I believe the construct represents a wide matrix of skills which vary significantly from population to population. What might be called "assessment literacy" from the viewpoint of a university student, a high school teacher, and a professional test developer probably involves vastly different skills.
"Instead of conceptualizing assessment literacy solely as a set of given skills, perhaps we should also focus on the conditions needed to foster such skills." 
When I ask university students in Japan about assessment literacy in their native language, very few are able to articulate anything. Is this because they lack the metalanguage needed to describe the concept? Or is it perhaps because the term satei nouryoku (perhaps the best translation of "assessment literacy") is not a widely used word in Japanese? Such questions are fascinating, but beyond the scope of this paper.

"Often, the biggest challenge in promoting assessment literacy seems to be convincing end-users that the topic is actually worth learning: when many people encounter the arcane jargon and complex statistical formulas sometimes used in assessment, a frequent response is numbness."

Figure 1. Procedure adopted in this assessment literacy research
Part I: Terminology
question # | response format(s) | sample task(s) | sample topic(s)
Q1-Q15 | matching | match testing terms with appropriate symbols | sample variance, null hypothesis, mean
Q16-Q29 | multiple choice | select the correct term for a concept described | exam types, variable types, error types
Q30-Q35 | open response | explain or contrast various statistical terms | explain the central limit theorem
Part II: Procedures
Q36-Q40 | short completion | specify the M and SD for 5 types of test scores | quartile/percentile/stanine/T/z-score
Q41-Q45 | short completion | calculate basic statistics; interpret basic statistics | calculate M, SD for a test; pinpoint strong correlation(s)
Q46-Q50 | short completion | calculate advanced statistics; decide an appropriate statistic | determine effect size for two groups; decide which type of ANOVA to use
Q51-Q55 | short completion | calculate five correlation statistics | determine the Pearson correlation index
Q56-Q59 | mostly open response | interpret pretest/posttest results | decide what classroom "progress" occurred
Part III: Test Interpretation
Q60-Q74 | mostly open response | interpret published research | construct validity, accommodation
Part IV: Assessment Ethics
Q75-Q100 | multiple choice | select the most appropriate sentence response for each question | grading procedures, reporting test scores, handling ethical violations
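Several of the procedural items listed above ask for the M and SD of standard score scales. As a minimal sketch, and not something taken from the inventory itself, the following Python snippet converts raw scores to z-scores and T-scores, assuming the conventional scales of M = 0, SD = 1 for z and M = 50, SD = 10 for T. The function name `standard_scores`, the sample data, and the choice of the population SD are illustrative assumptions.

```python
from statistics import mean, pstdev


def standard_scores(raw_scores):
    """Return a list of (raw, z, T) tuples for a set of test scores.

    Assumption: the population SD (pstdev) is used; a sample SD would
    give slightly different values for small classes.
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("all scores are identical, so z-scores are undefined")
    results = []
    for x in raw_scores:
        z = (x - m) / sd   # z-scale: M = 0, SD = 1
        t = 50 + 10 * z    # T-scale: M = 50, SD = 10
        results.append((x, z, t))
    return results


# Hypothetical scores on a short quiz (these happen to have M = 5, SD = 2)
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for raw, z, t in standard_scores(scores):
    print(f"raw={raw}  z={z:+.2f}  T={t:.1f}")
```

With these hypothetical scores, a raw score of 9 lies two standard deviations above the mean, so it maps to z = +2.00 and T = 70.0.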
4 recommendations in favor | Q3, Q5, Q11, Q13-14, Q20, Q22-24, Q32, Q36, Q41-42, Q56, Q71, Q76-82, Q84-86, Q88-90, Q92, Q94-100 | 36 items total
3 recommendations in favor | Q16, Q28, Q32, Q37, Q43, Q72-74, Q75, Q83, Q87, Q91 | 12 items total
2 recommendations in favor | Q17, Q39-40, Q45, Q47, Q50 | 6 items total
1 recommendation in favor | Q1, Q4, Q6, Q8, Q15, Q21, Q26, Q38, Q44, Q57-59, Q66 | 13 items total
No recommendations in favor | Q2, Q7, Q9-10, Q12, Q18-19, Q25, Q27, Q29-31, Q33-35, Q46, Q48-49, Q51-55, Q60-65, Q67-70, Q93 | 34 items total
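The paragraph that follows reports how the panel's recommendations were turned into adoption decisions. As a rough sketch only, the tallies in the table above can be sorted with a simple threshold rule: three or four recommendations adopt an item, two send it to case-by-case review, and none or one reject it. The panel size of four, the thresholds as coded, and the function names are assumptions inferred from the reported totals. The snippet illustrates only this vote-counting step, not the modified Angoff procedure itself.

```python
from collections import Counter

PANEL_SIZE = 4  # assumption: inferred from the maximum of four recommendations


def classify_item(votes_in_favor):
    """Map a number of favorable recommendations to a decision label."""
    if not 0 <= votes_in_favor <= PANEL_SIZE:
        raise ValueError("vote count must be between 0 and PANEL_SIZE")
    if votes_in_favor >= 3:
        return "adopt"
    if votes_in_favor == 2:
        return "review"  # considered case by case
    return "reject"


def summarize(items_per_vote_count):
    """Total the number of items falling under each decision label."""
    summary = Counter()
    for votes, n_items in items_per_vote_count.items():
        summary[classify_item(votes)] += n_items
    return dict(summary)


# Column totals reported in the table: vote count -> number of items
reported_totals = {4: 36, 3: 12, 2: 6, 1: 13, 0: 34}
print(summarize(reported_totals))
# prints {'adopt': 48, 'review': 6, 'reject': 47}
```

Under these assumed thresholds, the column totals reproduce the 48 adopted, 6 case-by-case, and 47 rejected items reported in the text.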
Based on this procedure, 48 items from Appendix A were adopted into the first version of the Assessment Literacy Test for High School Foreign Language Teachers in Appendix C and 47 were rejected. The remaining six items that had two votes were considered on a case-by-case basis.
Part I: Terminology
question # | response format(s) | sample task(s) | sample topic(s)
Q1-Q9 | matching | match testing terms with appropriate symbols | sample variance, null hypothesis, mean
Q10-Q16 | multiple choice | select the correct term for a concept described | exam types, variable types, cutoff points
Q17-Q20 | open response | explain or contrast various statistical terms | distinguishing masters and non-masters
Part II: Procedures
Q21-Q25 | short completion | calculate basic descriptive statistics; interpret basic statistics | calculate mean & S.D. for a test; identify points of significance
Q26-Q29 | open response | interpret pretest/posttest gains | assess whether classroom "progress" occurred
Q30-Q33 | short completion | calculate three descriptive statistics | describe a boxplot and bell curve
Q34-Q36 | open response | think of three ways to increase validity | validity & reliability issues; the reliability of a writing test item
Part III: Test Interpretation
Q37-Q44 | mostly open response | interpret tests and research | invalid test items, sloppy statistics; interpreting error of measurement
Part IV: Assessment Ethics
Q45-Q56 | multiple choice | select the most appropriate sentence response for each question | grading procedures, reporting test scores, handling ethical violations
Q57-Q70 | mostly open response | identify an ethical problem and/or suggest a solution to a problem | grading procedures, confidentiality issues, dealing with test anxiety
" . . . many aspects of assessment are interrelated: ethics often impinge upon interpretation and statistical procedure use." 
Another point clear from the inventory in Appendix C1 is that many items tend to focus on those aspects of assessment literacy which are easily measurable. As a result, the Assessment Literacy Test for High School Foreign Language Teachers Inventory has a strong quantitative orientation and perhaps too many questions about statistics. These aspects can be measured in vitro through writing, but perhaps the most important forms of classroom assessment happen in vivo and informally. Moreover, if we look at the tentative operationalization of assessment literacy for teachers suggested in Table 2, it is clear that some areas are underrepresented in the test in Appendix C. Specifically, items #6 and #11 are not sufficiently covered. This suggests that the test needs to be augmented in some areas (quite likely), or that the operationalization of the concept needs to be worked out more (also likely), or both.
If we take the optimistic view that, given the time and resources, most teachers will be motivated to improve their own assessment literacy skills, then several suggestions are in order. Table 7 lists some ways that ordinary teachers can become more literate about assessment.

Acknowledgement: I am grateful to Kristie Sage and Peter Ross for their feedback on this article. The limitations of this paper, however, are my responsibility. 