Shiken: JALT Testing & Evaluation SIG Newsletter
Vol. 12 No. 1. January 2008. (p. 15 - 17) [ISSN 1881-5537]

Book Review

Language Testing: The Social Dimension
(Language Learning Monograph Series)
Tim McNamara & Carsten Roever. (2007)
Malden, MA & Oxford, UK: Blackwell Publishing
ISBN: 978-1-4051-5543-4 (Pbk)
Pbk: ¥4,793 JPY,   £19.99 GBP,   $38.95 USD

The best way to regard this work is as a persuasive essay stressing the need for more socially-responsible foreign language testing. The authors maintain language testing has been overly swayed by psychometric traditions and suggest more focus be given to the social aspects of language use. Designed for applied linguists and educational analysts already familiar with standard works on language assessment, this 291-page volume suggests why the moment may be ripe for a paradigm shift in language testing.
The book starts off by demonstrating how psychometrically-oriented and socially-oriented testing goals are often disparate. Conventional language tests tend to ignore many socio-cultural factors shaping test performance. Moreover, the ways in which formal tests impact communities and the societal costs of inappropriate testing policies are also frequently overlooked. To create tests which are not only psychometrically valid but also socially justified, McNamara and Roever stress the need for more comprehensive theories and for socially-oriented research throughout the test development and revision process.
The authors then highlight how various theories of test validation have dealt with the social dimensions of language performance, and they explore the strengths and weaknesses of the validity theories of Cronbach, Messick, Mislevy, and Kane. Remarking on how factors such as task, rater, and interlocutor impact test performance, the authors suggest that new theoretical foundations and far more empirical research are needed before language tests even begin to approach the same level of social precision as they have achieved in the psychological sphere. Although they argue that no widely accepted current theory of test validation does an adequate job of elucidating the social function of tests, McNamara and Roever stop short of suggesting what theory might be up to that task.
The authors then critique the limitations of conventional tests in describing language performance. In particular, concerns about oral interviews and tests of interlanguage pragmatics are raised. McNamara and Roever devote considerable space to pointing out the weaknesses in discourse completion tests (DCTs). To gain better insight into the socio-pragmatic knowledge of respondents, they suggest that role plays and think-aloud protocols might be more promising approaches.
Various ways of dealing with test bias and differential item functioning (DIF) are then explored. Disambiguating these terms, McNamara and Roever declare:

[ p. 15 ]

Differential item functioning is a necessary condition but not sufficient condition for bias because a test item that functions differently for two groups might do so because it advantages one group in a construct-irrelevant way, but there might also be a legitimate reason for differential functioning. (p. 83)
A wide range of methods for detecting DIF is then contrasted. The strengths and weaknesses of various parametric, non-parametric, and IRT approaches are duly weighed. Special focus is given to considering the merits and limitations of generalizability theory and FACETS-based multifaceted analyses in detecting DIF. The authors recommend multifaceted analysis for fine-tuned investigations, but generalizability studies when overall impressions of test dependability are sought. Noting how different DIF detection procedures tend to yield different results, the authors again stress the need for caution when interpreting test results. To some extent, we also need to question the agenda of the researchers: few researchers are immune from political, economic, or social pressures to present themes that they research in a certain light (Marco & Larkin, 2000).

Midway through the book, fairness reviews and codes of ethics and practice are discussed. The authors note how fairness reviews are not only a way of reducing some types of test DIF, but also a way of protecting test developers from costly litigation. Attention then shifts to codes of ethics and practice in the language testing profession. Here McNamara and Roever dismiss efforts to change testing practices through such codes as ineffectual by noting:
In a weak profession, like language testing, no professional association regulates the right to practice: membership in a professional organization is voluntary, it is not a precondition for practice, and, consequently, there are no serious sanctions against members who violate codes of ethics. The association might exclude them, but they cannot be stopped from continuing to practice, ethically or unethically. (p. 139)
Observing how language tests often serve as identity markers, the authors then show how language test performance is often used to indicate membership in a specific group. Citing examples of test use in intercultural conflict, McNamara and Roever concur with Foucault (1975/1977) in suggesting that language tests are often forms of surveillance and – in some cases – persecution. Ways in which language tests designed to ascertain the identity of persons are fraught with reliability and validity problems (not to mention moral quandaries) are underscored. Political uses and misuses of tests illustrate how frequently language testing is a form of social engineering. Indeed, the formal and seemingly scientific nature of most language tests serves as a mask for their preeminently social agenda. Stressing the need for more critical awareness about test use, the authors state:
. . . we cannot afford to be merely naive players in the discursively constructed world in which language tests are located. Appropriate intellectual and analytical tools enable us to recognize the roles that tests will play in the operation of power and systems of social control. We will be less inclined to seek shelter in the impersonality and purely technical aspects of our work. We need critical self-awareness in order for us to first recognize and then to decide whether or not to accept or to resist our own subject position in the system of social control in which tests play such a part. (p. 198)

[ p. 16 ]

Language teachers will be particularly interested in Chapter 7 of this work since it considers how tests are used and abused in schools. Ways that political mandates often shape testing in various countries are described. In many cases, politicians create some sort of standard by which students, teachers, and entire school systems are judged, and that standard is based not on social research but on a political agenda. The impact of such standards is often far-reaching and unforeseen. For example, some schools in the USA now encourage low-performing students to drop out to avoid lowering the overall school scores (Balfanz & Legters, 2004). The authors remind us that political motives are generally more influential in shaping test constructs than any formal academic research.
Readers in Japan may find the authors' reference to Akiyama's (2003) research on the resistance to implementing an EFL speaking test for high schools in Tokyo to be of particular interest. It is a good example of how competing agendas and cultural values can stymie innovation. Despite MEXT pronouncements to the contrary, the authors suggest that oral fluency is not a highly esteemed trait in Japan. Hence:
the underlying construct [of current high school EFL tests in Japan] is not communicative proficiency in English . . . but, rather, diligence and hard work - attributes highly valued in Japanese society . . . the actual content of the test and its validity in terms of conformity to the curriculum guidelines . . . are not the central issue; what matters is that the test be difficult and play the role of selecting the character attributes of diligence and effort. (p. 208)
The authors go on to assert that entrance exams in Japan are essentially a measure of character and/or intelligence rather than communicative ability. Though lip service is given to the need to improve oral EFL proficiency among Japanese, most Japanese administrators believe tests should be a "properly noncommunicative" (pp. 208, 209) means of demonstrating the ability to memorize complex rules and exhibit evidence of academic skills. In other words, testing is a sort of ritualized performance that has little to do with authentic communication, a point McVeigh (2002) discusses at length.
This book concludes by considering future themes for applied linguistic research and obliquely hinting at future training directions. The authors acknowledge that the "paradigmatic one-sidedness of conventional approaches to assessment" (p. 247) has blinded researchers to many social and cultural facets of language use and suggest that:
the relatively narrow intellectual climate of language testing will need to be broadened, with openness to input from such diverse fields as sociology, policy analysis, philosophy, cultural theory, social theory, and the like, in addition to the traditional source fields. (p. 254)
Specifically, McNamara and Roever call for more investigations of test bias and the learning potential of individuals in unfamiliar environments. They also feel more research into alternatives to existing native speaker norms is warranted, as well as further studies in discourse analysis.


This book is clearly an attempt to broaden the scope of language assessment. Though many of the socially-oriented research studies cited in this text are still underdeveloped, the authors do succeed in offering persuasive reasons why the somewhat narrow field of language testing needs to expand its scope. "Language testing is not simply applied psychometrics," the authors conclude, "but, rather, a central area of applied linguistics."
This book is likely to appeal to language testers and educators seeking a rationale for policy changes or fresh areas for research. Its value for EFL classroom teachers may be somewhat limited since not much space is devoted to considering how the ideas presented in the text could be realized in terms of classroom practice.
Even though readers might find themselves disagreeing with a few of the statements made by the authors, I believe most will welcome the chance to reflect on the field of language testing from a wider perspective.

– Reviewed by Tim Newfields
Toyo University


Akiyama, T. (2003). Assessing speaking: issues in school-based assessment and the introduction of speaking tests into the Japanese senior high school entrance examination. JALT Journal, 25 (2), 117-142.

Balfanz, R., & Legters, N. (2004, June). Locating the dropout crisis. Retrieved November 14, 2007 from

Foucault, M. (1977). Discipline and punish: The birth of the prison. (A. Sheridan, Trans.). London: Allen Lane. (Original work published 1975)

Marco, C. A., & Larkin, G. L. (2000). Ethical issues of data reporting and the quest for authenticity. Academic Emergency Medicine, 7 (6), 691-694.

McVeigh, B. (2002). Japanese higher education as myth. Armonk, N.Y.: M.E. Sharpe Inc.


[ p. 17 ]