Lifelong Learning: Proceedings of the 4th Annual JALT Pan-SIG Conference.
May 14-15, 2005. Tokyo, Japan: Tokyo Keizai University.

An evaluation system for communicative learning activities
コミュニケーション学習活動の評価システム (Japanese Title)

by Richard Blight (Ehime University)


キーワード: 評価システム、コミュニカティブラーニング(コミュニケーション重視型学習) クラスルームアクティビテ


This paper describes an evaluation system for measuring the effectiveness of communicative learning activities in university freshmen courses. A set of evaluation criteria is developed based on what are judged to be the general qualities of successful communicative activities. Teachers use the criteria to rate five classroom activities. The activities are first considered in terms of the mean rating scores against each criterion. The general performance of the five activities is also compared in terms of mean scores on the evaluation system. In the final stage, the activities are developed for future classes based on directions provided by the evaluation system.

Keywords: evaluation system, communicative learning, ESOL classroom activities

How can teachers know whether communicative activities are effective in a given classroom context? How can they determine whether the learning outcomes provided by an activity are consistent with the goals and objectives of a language course? When developing an evaluation system to measure the effectiveness of classroom activities, there is a broad range of factors to consider. Does the activity facilitate the learning process in the designated learning context? What are the essential aspects of communicative theory that should be incorporated into the activity? Is the classroom practice “designed to engage learners in the pragmatic, authentic, functional use of language for meaningful purposes” (Brown, 2001, p. 43)? Do the materials “activate the learners in such a way as to get them to engage with the [language] material to be practiced” (Ur, 1988, p. 17)? What other objectives can be identified for an activity and how do the learning outcomes correlate with those objectives?
A range of different forms of evaluation systems are used in contemporary language teaching programs. Most commonly, evaluations concentrate on broad program and curriculum goals, often for purposes of accountability (Richards, 2001). However, while broad-scale evaluations are invaluable to program administrators, teachers are more generally concerned with “whether they are accomplishing their goals and whether they need to make changes . . . . their attention is likely to focus . . . more on whether specific activities and techniques appear to 'work' in the context of a particular lesson” (Ellis, 1998, p. 218). Ellis consequently suggests that teachers should use forms of “micro-evaluation . . . characterized by a narrow focus on some specific aspect of the curriculum” (pp. 218-9).

This paper describes a micro-evaluation process developed for measuring the effectiveness of communicative activities in university freshmen courses at Ehime University. The evaluation system is designed as a practical instrument to provide feedback to teachers and for the purpose of producing improved teaching materials. The approach taken is to initially consider the qualities of communicative activities which lead to successful learning outcomes in a range of classroom situations. The qualities which tend to differentiate between successful and unsuccessful activities are identified and formulated as a set of evaluation criteria. The evaluation criteria used in this study have been developed in relation to the objectives of oral communication courses at Ehime University, and will be used to demonstrate the evaluation system in this context. It is hoped that teachers will be able to develop similar evaluation systems to meet the needs of their own learning contexts.
Teachers rated the performance of five communicative activities against the evaluation criteria using a 5-point Likert rating scale. A consensus of low scores against specific criteria would be indicative of a mismatch between instructional objectives and learning outcomes, which suggests that the materials require further development (Genesee & Upshur, 1996). The results of the evaluation are considered first in terms of the ratings of specific activities against the evaluation criteria, with two activities being examined in detail. Since the strong and weak areas of the activities are identified, the evaluation results suggest directions for developing the activities to achieve improved learning outcomes in future language courses (Rea-Dickins & Germaine, 1992). Another dimension of analysis is subsequently considered involving mean score comparisons for the five activities. This type of analysis provides a comparative perspective on the performance of the activities which also assists with the materials development process.

Materials evaluations at Ehime University

First-year students at Ehime University take a compulsory oral communication course as part of the university's General Education program. The course aims to develop students' ability to communicate in spoken English by involving them in frequent conversation practice. While students need “knowledge of the linguistic forms, meanings, and functions” to communicate effectively, they also need to be exposed to a variety of purposes for language use and to appropriately “manage the process of negotiating meaning” (Larsen-Freeman, 2000) in different contexts. The classes are taught entirely in English and are based on contemporary methods for communicative language teaching, similar to many other oral communication courses currently being provided at the first-year university level in Japan.
An in-house textbook was produced for the oral communication course and used in all classes on the General Education program (comprising 84 classes and 1,680 students in total). The classroom activities follow a standard communicative approach, “focussed on all of the components (grammatical, discourse, functional, sociolinguistic, and strategic) of communicative competence” (Brown, 2001, p. 43). Conversations on specific topics are developed through a series of structured activities which focus on establishing and extending the target language expressions. Students are provided with “a purpose for communicating” and are “focused on the content of what they are saying” (Harmer, 2001, p. 85). Language development is integrated into pair and group activities “that encourage communication between students or between the instructor and students” (Comeau, 1987, p. 58). While the university textbook is subject to a general evaluation process, the purpose of the present research project is to develop an alternative evaluation system which can be used to measure the effectiveness of specific activities on the course. The two evaluation systems should subsequently complement one another to provide beneficial directions for developing the materials for future editions of the textbook (Brown, 1995).

Developing the evaluation system

The first stage in developing an evaluation system is to determine a set of criteria which serve to relate the teaching objectives to the learning outcomes occurring in classroom situations (Rea-Dickins & Germaine, 1992). The present evaluation system was designed to meet the objectives of the first-year course at Ehime University and to be applied to any activities in the textbook. It was consequently necessary to maintain a focus on features which could be regarded as common to all activities, to use general wordings in the evaluation criteria, and to omit questions related to specific types of activities. A set of evaluation criteria was subsequently developed with these design features in mind (see Table 1).

Table 1. Evaluation criteria for communicative learning activities.

No.   Evaluation criteria
C1    clear learning objective
C2    learning purpose is useful / beneficial
C3    involves meaningful communication
C4    provides practice / repetition of target language forms
C5    level of learner activation / active participation
C6    motivation factor / interesting, enjoyable
C7    personalization / personal experiences, opinions, feelings
C8    appropriate learning challenge / tension
C9    volume of language production
C10   appropriate difficulty level
C11   appropriate pace / rate of progression

Some of the criteria refer to properties associated with the objective or learning purpose, since this is fundamental to any activity (e.g., C1, C2). Several other criteria refer to communicative teaching objectives that are regarded as essential to all activities in the course (e.g., C3, C5, C7). Another group of criteria refer to general properties which are necessary for activities to be regarded as successful (e.g., C6, C8, C10, C11). The evaluation criteria cannot, however, all be categorized into separate performance areas, since some degree of overlap is common. For example, 'repetition of target language forms' (C4) can be associated with the achievement of learning objectives, as well as being a necessary property of activities. The evaluation criteria also tend to have different types of focus; some refer to properties inherent in the activities (e.g., C3, C4), while others refer to the response of the students to the activity (e.g., C6, C8), which depends on the particular group of students and could vary substantially between different classes at Ehime University.

Since the evaluation system is in an early stage of development, the degree of confidence that can be associated with the interpretation of the evaluation results has not yet been determined. It is possible that practical limitations may have seriously affected the survey results. For example, a type of averaging effect was produced when, rather than providing ratings for specific classes, the teachers gave general ratings for the effectiveness of each activity. This stage amounts to a type of impressionistic process or approximate measure (Hughes, 2003), with the raters themselves determining the average responses to each activity in relation to different classroom experiences, rather than producing statistical averages based on specific classroom contexts. Validity and reliability also need to be more firmly established in order to interpret the evaluation results with some degree of certainty. Initial steps have been taken in these directions but have not yet provided a basis for theoretical verification (McNamara, 2000). In order to establish construct validity, for example, detailed explanations of the theoretical concepts underlying each of the evaluation criteria were provided to teachers prior to rating the activities, but variations in the interpretations that were actually applied during the rating process have not been examined. An area scheduled for future development consequently relates to “Quality control for raters” or inter-rater reliability (McNamara, 2000, p. 56), which should be investigated to determine the degree of rater consistency in the evaluation process.
Informal feedback from the teachers who rated the activities suggests that the evaluation system has reasonable face validity, although there were some differences of opinion concerning the formulation of the evaluation criteria. While establishing construct validity and reliability remains prominent in terms of the future development of the evaluation system, the initial lack of empirical verification needs to be balanced against the purpose of a low stakes micro-evaluation instrument. However, it is recognized that the potential impact of limitations inherent in any evaluation process should be monitored with a view to incorporating validation procedures in subsequent stages of development. A more extensive evaluation system could, for example, be developed to provide improved validation, although this direction would require the allocation of additional resources to this project.
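The rater-consistency check mentioned above could take several forms, and the paper does not name a specific coefficient. As one possible sketch, the block below computes Cronbach's alpha with each rater treated as an "item" and each criterion as a rated target. The choice of alpha, the function name `cronbach_alpha`, and all scores are assumptions for illustration only; they are not the study's data or method.

```python
from statistics import variance

# Hypothetical scores: one row per rater, one column per criterion.
# Invented for illustration; NOT the ratings collected in the study.
scores_by_rater = [
    [5, 4, 3, 4, 2],  # rater 1
    [4, 4, 3, 5, 2],  # rater 2
    [5, 3, 3, 4, 1],  # rater 3
]

def cronbach_alpha(rows):
    """Cronbach's alpha with raters as items and columns as rated targets."""
    k = len(rows)
    # Variance of each rater's scores across the targets (sample variance).
    rater_variances = [variance(row) for row in rows]
    # Variance of the summed score each target received from all raters.
    totals = [sum(column) for column in zip(*rows)]
    return (k / (k - 1)) * (1 - sum(rater_variances) / variance(totals))

alpha = cronbach_alpha(scores_by_rater)
print(f"alpha = {alpha:.3f}")
```

Values closer to 1 indicate that raters ordered the criteria in a similar way; what counts as an acceptable level would still need to be decided for a low-stakes instrument of this kind.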

Evaluation results

Fifteen teachers rated five activities from the university textbook against the eleven evaluation criteria, using a five-point Likert scale (5 = Very Good, 4 = Good, 3 = Satisfactory, 2 = Poor, 1 = Very Poor) to score each activity. The evaluation results are first provided here for two activities to demonstrate the type of feedback which is provided by the evaluation system. Activity #2 and Activity #3 have been selected for this purpose since they are different types of activities which should provide contrasting evaluation results. Activity #2 is a role play activity which is expected to rate well in terms of providing a good motivation factor and high levels of active participation (C6, C5). However, it could have been difficult for some students, and as a consequence may not have produced sufficient language volume (C9, C10). By contrast, Activity #3 is an information gap activity which is expected to rate well in terms of providing repetition of target language forms (C4), although it is possibly weak in terms of providing opportunities for personalization and meaningful communication between the students (C7, C3).

Activity #2

Activity #2 occurs as the first exercise in a lesson on giving advice and suggestions in a range of common everyday situations. The target language, involving expressions for asking advice, giving advice, accepting advice, and rejecting advice, is first introduced. Six problem situations are described and the students asked to consider what they would do in each situation and to write some suggestions and advice. Students role play the situations in pairs, with one student describing the situation and asking for advice, and the other student considering the situation and providing helpful suggestions. The students are subsequently asked to extend the role play several steps beyond the one-sentence responses they initially provided on paper by discussing whether the advice is helpful and either accepting or rejecting the suggestions. The activity hence progresses from the initial controlled practice occurring in written form to the free practice stage involving the expanded role play discussion. The teacher monitors to ensure that the students are correctly using the target language expressions throughout each stage of the activity. A full description of the learning objective, target language, student instructions, and classroom procedure for Activity #2 is provided for reference purposes (see Appendix 1).

Figure 1. Evaluation results for Activity #2.

The evaluation results for Activity #2 are provided as measures of central tendency and dispersion (see Appendix 2). The mean scores were subsequently rounded to one decimal place and represented graphically (see Figure 1). The ratings are consistently high across the eleven criteria, with mean scores ranging from a high score of 4.5 points to a low score of 3.5 points. C1 (clear learning objective) is rated very highly, with the majority of the other criteria clustered around the 4 point level (equivalent to a 'Good' rating on the Likert scale). The materials could possibly be developed to increase performance in areas of C8 (appropriate learning challenge), C9 (volume of language production), and C10 (appropriate difficulty level). The lower ratings on these three criteria suggest that some students experienced difficulty producing language during the extension stage, when they were asked to provide additional questions and to discuss the suggestions in some detail. The activity could hence be improved by providing an additional level of support in these areas; for example, language boxes could be provided with examples of extended discussions or featuring sample expressions that could be used during the expansion process. This type of materials revision would provide support for the students experiencing difficulty, while allowing other students to proceed independently. In summary, the evaluation results for this activity are generally very positive, with the consistently high ratings suggesting that the activity is effective in producing learning outcomes related to the teaching objectives. Furthermore, since the evaluation results represent the average responses of fifteen teachers, the results can be generalised beyond a single classroom situation at the university.
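The per-criterion summary described above (a mean and a standard deviation for each criterion, with the mean rounded to one decimal place for the graph) can be sketched as follows. The ratings, the helper name `summarise`, and the use of the sample standard deviation are assumptions made for illustration; the actual figures are those reported in Appendix 2.

```python
from statistics import mean, stdev

# Hypothetical 1-5 Likert ratings, one score per teacher per criterion.
# Invented for illustration; NOT the data reported in Appendix 2.
ratings = {
    "C1": [5, 4, 5, 4, 5],
    "C8": [4, 3, 4, 3, 4],
    "C9": [3, 4, 3, 4, 4],
}

def summarise(scores):
    """Return (mean, sample standard deviation), each to two decimals."""
    return round(mean(scores), 2), round(stdev(scores), 2)

summary = {criterion: summarise(scores) for criterion, scores in ratings.items()}

for criterion, (m, sd) in summary.items():
    # Two decimals for the table, one decimal for the plotted value.
    print(f"{criterion}: mean={m:.2f} sd={sd:.2f} plotted={m:.1f}")
```

A large standard deviation on a criterion would suggest that the teachers disagreed about it, which is worth checking before acting on the mean alone.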

Activity #3

The purpose of Activity #3 is for students to practise the names of countries, nationalities, and capital cities in the European Union. The teacher commences by introducing the target language expressions. A straightforward information gap activity follows, involving the “transfer of given information from one person to another” (Nunan, 1989, p. 66), in which students use the target language to ask for / provide the missing information on their different versions of the table. Since each partner commences with incomplete information, frequent usage of the target language is necessary to complete the tables. The teacher continues to monitor for correct language usage throughout the activity. A more detailed description of the activity is again provided for reference purposes (see Appendix 1).

Figure 2. Evaluation results for Activity #3.

The mean scores and standard deviations were calculated to two decimal places (see Appendix 2), but again subsequently rounded to one decimal place for the purpose of graphical representation (see Figure 2). As anticipated, Activity #3 provides a different pattern of evaluation results to the preceding activity. Rather than being broadly consistent, the scores are spread across a large range of values. The high scores (for C4: repetition of target language forms; and C1: clear learning objective) occur at the 4 point level, which is equivalent to a 'Good' rating on the Likert scale. The majority of criteria are, however, rated considerably lower than this at the 3 point level (i.e., 'Satisfactory'), and one criterion (C7: personalization) is rated well below the others at 2.0 points or 'Poor' on the Likert scale. The conclusion to be drawn from these results is that Activity #3 is substantially less effective than Activity #2 at achieving the pedagogical objectives represented in the evaluation criteria. This conclusion can also be drawn with some degree of confidence, since the ratings represent the responses provided by fifteen teachers for this activity.
The weakness in many evaluation criteria suggests that Activity #3 should either be dropped from future courses, or that it should be subjected to major revisions. As a minimum requirement for inclusion in future courses, the five criteria rated as below 'Satisfactory' (C3: meaningful communication; C6: motivation factor; C7: personalization; C9: volume of language production; C10: appropriate difficulty level) clearly require further consideration. For example, the scores on several criteria could be increased by changing the classroom procedure to include a brief personal exchange after completing each line of the table. After writing the country / nationality / capital city, students could next discuss what they know about that part of Europe. A language box could be included on the worksheet, providing questions to generate discussion (e.g., “Have you been to . . . ?”, “Would you go to . . . ?”, “What do you know about . . . ?”, “What food do they eat in . . . ?”). Some locations would generate interesting exchanges, while others could be brief. However, in both cases, the meaningfulness of the interaction has been improved by personalizing the content, and the students would most likely respond with increased interest and better motivation during the activity.
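The screening step used above, picking out the criteria whose mean rating falls below 'Satisfactory', can be sketched in a few lines. The cut-off of 3.0 and the helper name `needs_revision` are assumptions; the C1 and C7 values echo figures quoted in the text, while the remaining values are invented placeholders rather than the Appendix 2 results.

```python
SATISFACTORY = 3.0  # assumed cut-off: the 'Satisfactory' point on the scale

# Mean score per criterion. C1 and C7 echo values quoted in the text;
# the other values are hypothetical placeholders.
mean_scores = {"C1": 3.8, "C4": 4.1, "C5": 3.2, "C7": 2.0, "C9": 2.7}

def needs_revision(means, threshold=SATISFACTORY):
    """Return (criterion, mean) pairs below the threshold, weakest first."""
    weak = [(c, m) for c, m in means.items() if m < threshold]
    return sorted(weak, key=lambda item: item[1])

flagged = needs_revision(mean_scores)
for criterion, score in flagged:
    print(f"{criterion} needs attention (mean {score:.1f})")
```

Listing the weakest criterion first helps direct revision effort, for instance towards personalization before the other flagged areas.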
Direct comparisons between the two sets of evaluation results also raise questions about how dependably we can interpret the ratings. For example, although the same fifteen teachers have rated the two activities, it is not apparent why Activity #3 has been rated substantially lower on C1 (clear learning objective) than Activity #2 (i.e., 3.8 points vs. 4.5 points). Rather, there seems to be a general sense of dissatisfaction with Activity #3, which appears to have reduced the ratings on many criteria. If this supposition is correct, the issue of how objectively the raters are measuring the criteria becomes a major concern. This observation also tends to indicate that the construct validity established during the training process requires further investigation in subsequent stages of the project.

Summary results

The evaluation results from the fifteen teachers for each of the five textbook activities can also be averaged across the eleven criteria to produce average results for each activity (see Appendix 2). This type of analysis is useful for comparing the general effectiveness of activities based on the pedagogical values represented by the evaluation criteria. The mean values for the five activities evaluated in the present study are represented in Figure 3. The comparative effectiveness of the activities is clearly evident, with Activities #1, #2, and #4 receiving mean values at the 4 point level, which equates to a 'Good' rating on the Likert scale. The other two activities (Activities #3 and #5) have been rated substantially lower at the 3 point level, equivalent to a 'Satisfactory' rating. The contrast between the effectiveness of the two activities considered in the preceding section of this paper (Activities #2 and #3) is also clearly evident.

Figure 3. Mean scores for the activities.

This type of analysis is useful for identifying which activities should be prioritized for materials development work. Since fifteen teachers have rated Activities #3 and #5 as being substantially less effective than the other activities, these two activities should be considered first for revisions. By studying the results for these activities against the specific assessment criteria, it can be determined whether the activities are weak in a few areas or generally weak across the range of evaluation criteria. Course developers ultimately need to determine whether to continue using the activity or to drop it from future language courses. If they decide to continue with the activity, modifications should be made to the procedures and materials. The type of analysis relevant to this process has been demonstrated in preceding sections of this paper and suggestions were provided for improving the activities. The next stage in the development process involves trialling the revised materials to determine whether satisfactory improvements have been achieved. Repeating the evaluation process on the revised materials would provide useful feedback on the effectiveness of the revisions; in this way the activities used in a language course can be systematically developed to produce a highly effective language curriculum (Brown, 1995; Tomlinson, 1998).
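The prioritization described above can be sketched by averaging each activity's criterion means and sorting from weakest to strongest. The activity labels and all numbers below are hypothetical and use only four criteria for brevity; they do not reproduce the results in Figure 3 or Appendix 2.

```python
from statistics import mean

# Hypothetical per-criterion means for three activities (illustrative only).
criterion_means = {
    "Activity A": [4.5, 4.0, 4.0, 3.5],
    "Activity B": [3.5, 3.0, 2.0, 3.0],
    "Activity C": [4.0, 4.5, 4.0, 4.3],
}

# Overall score: the mean across criteria, rounded to one decimal place.
overall = {name: round(mean(scores), 1) for name, scores in criterion_means.items()}

# Activities with the lowest overall score are considered first for revision.
revision_order = sorted(overall, key=overall.get)

for name in revision_order:
    print(f"{name}: {overall[name]:.1f}")
```

An unweighted mean treats all criteria as equally important; a course team that values some criteria more highly could substitute a weighted average.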


Conclusion

This paper has demonstrated the process of developing an evaluation system for communicative activities which can be used to substantially improve the effectiveness of teaching materials. Five activities from the Ehime University first-year English course were evaluated by fifteen teachers based on a set of eleven evaluation criteria. The rating process determined that Activities #3 and #5 were generally less effective for the first-year courses than Activities #1, #2, and #4. The evaluations of the activities against the eleven evaluation criteria also indicated the strengths and weaknesses of each activity so as to provide directions for future materials revisions. The analysis of two activities has been demonstrated, with Activity #2 receiving a generally strong evaluation profile, although some suggestions were made to improve ratings for the learning challenge, the volume of language production, and the difficulty level. By contrast, Activity #3 received mixed ratings, which included some strong areas but were predominantly weak. It was consequently recommended that the activity be subjected to major revisions to address issues in five evaluation criteria, and suggestions were provided for improving the learning outcomes achieved in future language courses.

It is anticipated that teachers working in different educational contexts will be able to devise similar evaluation systems to improve the teaching materials used in their own language courses. While the formulation of the criteria and the results may differ between learning contexts, the evaluation process demonstrated in this paper remains fundamentally unchanged. The evaluation system provides a unique profile for each activity based on pedagogical objectives and the specific learning context. Each activity is analysed in terms of its strengths and weaknesses, and an overall performance rating is also determined. Materials can subsequently be included in a curriculum on the basis of providing enhanced performance in specific pedagogical areas. Activities with different profiles could also be sequenced in a language course to complement one another in their arrangement and delivery.
The present type of micro-evaluation system provides a useful and practical instrument for teachers to gauge the effectiveness of communicative activities in their courses. It simply and effectively provides an “appraisal of the value of materials in relation to their objectives and to the objectives of the learners using them” (Tomlinson, 1998, p. xi). The evaluations provide directions for developing activities for future language courses (Graves, 2000), as well as general advantages in areas of professional development: “Classroom-based evaluation under the active management of teachers can also serve important professional development purposes since information resulting from such evaluations provides teachers with valuable feedback about their instructional effectiveness that they can use to hone their professional skills” (Genesee, 2001, p. 147). In subsequent stages of this research project, it is intended to develop more effective validation procedures which will provide an increased measure of confidence in the evaluation results, including gathering feedback from raters concerning the evaluation criteria, and determining reliability coefficients to ascertain whether acceptable levels of agreement have been achieved between raters. These procedures would clearly enhance the effectiveness of the present system and lead to more accurate evaluation results.


References

Brown, J. D. (1995). The elements of language curriculum: A systematic approach to program development. Boston: Heinle & Heinle.

Brown, H. D. (2001). Teaching by principles: An interactive approach to language pedagogy. New York: Pearson Education.

Comeau, R. F. (1987). Interactive oral grammar exercises. In W. M. Rivers (Ed.), Interactive language teaching (pp. 57-69). Cambridge: Cambridge University Press.

Ellis, R. (1998). The evaluation of communicative tasks. In B. Tomlinson (Ed.), Materials development in language teaching (pp. 217-238). Cambridge: Cambridge University Press.

Genesee, F. (2001). Evaluation. In R. Carter & D. Nunan (Eds.), The Cambridge guide to teaching English to speakers of other languages (pp. 144-150). Cambridge: Cambridge University Press.

Genesee, F., & Upshur, J. A. (1996). Classroom-based evaluation in second language education. Cambridge: Cambridge University Press.

Graves, K. (2000). Designing language courses: A guide for teachers. Boston: Heinle & Heinle.

Harmer, J. (2001). The practice of English language teaching. Harlow: Pearson Education.

Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.

Larsen-Freeman, D. (2000). Techniques and principles in language teaching. Oxford: Oxford University Press.

McNamara, T. (2000). Language testing. Oxford: Oxford University Press.

Nunan, D. (1989). Designing tasks for the communicative classroom. Cambridge: Cambridge University Press.

Rea-Dickins, P., & Germaine, K. (1992). Evaluation. Oxford: Oxford University Press.

Richards, J. C. (2001). Curriculum development in language teaching. Cambridge: Cambridge University Press.

Tomlinson, B. (1998). Materials development in language teaching. Cambridge: Cambridge University Press.

Ur, P. (1988). Grammar practice activities: A practical guide for teachers. Cambridge: Cambridge University Press.
