Activity 3.3

Performance Criteria— 
Keys to Success


Purposes:

  1. To learn the characteristics of sound performance criteria—what do good performance criteria look like?

  2. To learn the additional features of performance criteria that make them most useful as instructional tools in the classroom

  3. To summarize previous points about the relationship of assessment purpose to design of performance criteria

  4. To build skill in reviewing and critiquing performance criteria

  5. To practice one method of teaching criteria to students

Uses:

This is an advanced-level activity that can be used in Chapter 3 to explore the characteristics of quality performance criteria (rubrics) and discuss the advantages and disadvantages of different kinds of performance criteria. It can also be used to follow up Activity 1.9—Going to School and Activity 2.1—Sorting Student Work.

Prerequisites might include (a) an activity on the rationale for alternative assessment (e.g., Activities 1.1—Changing Assessment Practices..., 1.6—Comparing Multiple-Choice and Alternative Assessment, or 1.12—Assessment Principles); (b) activities that build knowledge of the role of performance criteria in a performance assessment, how to use criteria to score student work, and how to use criteria as instructional tools in the classroom, such as the Chapter 2 text and Activities 2.1—Sorting Student Work and 2.5—How Knowledge of Performance Criteria Affect Performance; and (c) one of the activities providing a gentle introduction to quality issues, such as 1.5—Clapping Hands or 1.11—Assessment Principles.

Rationale:

A variety of performance criteria types are being developed. The type to use depends on (1) the type of student outcome being assessed; (2) whether the assessment is large-scale or classroom; and (3) the vision one has of the role of assessment in the educational process. These issues are discussed in the text in Chapter 3. The purpose of this exercise is to demonstrate, with specific examples, the characteristics of high-quality performance criteria if the purpose is to maximize classroom instructional usefulness. At the end of the activity we want participants to feel confident that they would know how to reply to a colleague who comes up to them and asks, "I'm thinking about using this set of performance criteria in my classroom. What do you think about them?"

The activity also models the process for teaching criteria to students. When teachers teach criteria they need to define each trait, give some examples, have students practice identifying strong and weak examples of products/performances using the criteria, and have students practice making weak products/performances stronger. While students might be working on, say, criteria for quality oral presentations, participants in this professional development activity are working on criteria for criteria. Thus, in this activity, we define criteria for criteria and identify strong and weak examples of criteria. You can even make this into a little assessment joke because the criteria for criteria have to adhere to their own standards—content/coverage, clarity, and generality.

Materials:

For both options:

Time Required:

60-90 minutes

Facilitator's Notes:

A. (2 minutes) Introduction. Use Overhead A3.3,O1—Performance Criteria Keys Purposes to introduce the activity. The facilitator's notes include two running examples, math and writing. You will likely use only one of the two in any single setting. Directions that relate to both examples are not boxed. Boxed instructions mean that you need to choose either the math or the writing version.

B. (20 minutes) Key 1: Coverage

  1. The first "key to success" (see Overhead A3.3,O2—Performance Criteria, Keys to Success) for performance criteria is: Don't leave out anything of importance. Describe why it is essential to cover everything of importance and give examples. You might say something like: Well-defined performance criteria are a statement of what is valued in student work. They communicate to us as teachers, and to students, what to do and what is important to attend to. If important dimensions of performance are left out, it is likely that they won't be emphasized in instruction or student work because they won't be assessed. An example in writing is a state that left the trait of "voice" out of its rubric because it felt it would be too difficult to get good rater agreement. The result: no systematic attention to voice in writing. Another example is in music. What if you were a fiddle player and bowing wasn't rated as part of performance? What message does that communicate? Or, if teachers were held accountable for student performance, what effect might leaving bowing out have on instruction? A third example might be the Olympics. What if entry into the water were not taken into account in diving and, instead, the type of bathing suit were scored? What message does this send about what is important? How would it affect coaching?

    Ask participants to think of examples from their own experience about how criteria communicate what counts. Think: driver's test, Olympics, what usually gets rewarded in student writing, etc.

  2. Choose either math or writing.

    Math: Show the strong student performance (A3.3,O3—Number of Teeth). Ask, How well did this student communicate what he or she did? Most teachers agree that the student did a good job on this trait. Now put up the Skimpy Criteria (A3.3,O5) and point out the section on communication. Then cover up the trait of communication and ask, What message would be sent if communication was not scored? How might instruction be affected? How might student work be affected?

    Writing: Show the strong student performance (A3.3,O9—Fox) and give participants the Skimpy Writing Criteria (A3.3,O10). Ask, How well did this student use detail to add clarity? Most teachers agree that the student did a good job on this trait. Then cover up the trait of detail and ask, What message would be sent if detail was not scored? How might instruction be affected? How might student work be affected?

  3. Besides covering all those aspects of a performance that are truly aspects of a strong performance, it is equally important to leave out those aspects of performance that do not truly distinguish quality. A great example is the "two-minute speech." The son of a friend had to make an oral presentation as part of a seventh-grade culminating project for the semester. The speech had to be between two and three minutes long. He practiced at home and was timed at two minutes 15 seconds. When he gave his oral presentation at school he was a little nervous and finished two seconds under two minutes. His grade went from an 'A' to a 'C.' What message does this send to students about what is important?

  4. In order for participants to really understand the importance of good content coverage, it is useful to practice finding and correcting the "holes" in criteria/rubrics. Overhead A3.3,O13—Speaking Criteria to Critique can be used for this. Ask participants to work in groups to identify important aspects of performance that have been left out or not covered completely.

    (What other groups have said: There should be a trait for "content." Under delivery one might add hand gestures and posture. Students also need to understand that an oral presentation is organized and presented differently from a written one: sentences are shorter, there is more repetition, and fewer points can be made in the same amount of time. Where should multi-media and visuals go? Where should personal style go?)

    Now rate A3.3,O13—Speaking Criteria to Critique on a scale of 1-5 where 1 denotes "lots of holes; coverage is weak" and 5 denotes "not many holes; coverage is good." (These speaking criteria usually get a 3-4 rating on the trait of coverage.)

  5. A3.3,O16—Informal Writing Inventory is usually rated very weak on the trait of coverage, so it provides a good example to anchor the low end of the scale. The Informal Writing Inventory (IWI) was developed to screen students into special education classes. The purpose is to identify whether there is a writing disability. Students are shown a picture and write about what is happening in the picture. This is a nice writing task; pictures are frequently used to prompt primary writing. The criteria, however, are these: First, one counts the number of grammatical errors. Then one counts the number of fatal grammatical errors, those that block one's ability to understand what the writer means. Finally, one forms two ratios: the error index, which is the ratio of erroneous writing to well-formed writing, and the communication index, which is the percentage of total errors that disrupt communication.

    Participants usually feel that counting grammatical errors is not a very good criterion for good writing. It leaves out a lot that is important. Therefore, the coverage is rated very low. (It anchors the "1" end of the scale.)
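
    If it helps participants see how narrow these criteria are, the two IWI ratios can be worked through with made-up numbers. The sketch below is only an illustration: the function name, the sample counts, and the choice to measure "erroneous writing" by the error count and "well-formed writing" by a count of error-free words are assumptions, not the IWI's actual scoring procedure.

```python
def iwi_indices(total_errors, fatal_errors, well_formed_units):
    """Return (error_index, communication_index) as described in the text.

    Assumption: "erroneous writing" is measured by the number of errors and
    "well-formed writing" by a count of error-free units (e.g., words).
    """
    if fatal_errors > total_errors:
        raise ValueError("fatal errors cannot exceed total errors")
    if well_formed_units <= 0:
        raise ValueError("need at least one well-formed unit")

    # Ratio of erroneous writing to well-formed writing.
    error_index = total_errors / well_formed_units

    # Percentage of all errors that block understanding (0 if no errors).
    if total_errors:
        communication_index = 100.0 * fatal_errors / total_errors
    else:
        communication_index = 0.0

    return error_index, communication_index


# Hypothetical sample: 12 errors, 3 of them fatal, 48 well-formed words.
error_index, communication_index = iwi_indices(12, 3, 48)
print(error_index, communication_index)  # 0.25 25.0
```

    Note that both numbers depend only on grammatical error counts; nothing about ideas, organization, or voice enters the calculation, which is why coverage is rated so low.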

  6. Be sure to emphasize the main point here—teachers teach to the criteria and students learn to the criteria. Therefore, criteria/rubrics must cover that which most distinguishes quality, and they must leave out those things which are not really indicators of quality. (If things like following directions, having a date on the paper, etc. are important outcomes, score them separately. Don’t mix "following directions" with the quality of the student response.)

  7. If you would like to give participants more practice on critiquing performance criteria for "content/coverage," see the Activity 3.3 Index part of Appendix A—Sampler.

C. (20 minutes) Key 2: Detail

  1. The second key to success (A3.3,O2) is having criteria that are clear, detailed, and illustrated with samples of student work.

    Math: Refer to the Skimpy Criteria (A3.3,O5). Have participants score the strong and/or weak student performance (A3.3,O3 or A3.3,O4) on the trait of communication. Chances are that rater agreement will be pretty good. Now, pose the dilemma: What if you give feedback to a student using this scale and the student says, "I know I communicated well because my score is high, but I don't know why. Why did I get a good score on communication?"

    Writing: Refer to the Skimpy Criteria (A3.3,O10). Ask participants to score the strong and/or weak student performances (A3.3,O8 or A3.3,O9) on the trait of "personal expression." Chances are that rater agreement will be pretty good. (Fox is very high and Redwoods is very low.) Now pose the dilemma: What if you give feedback to a student using this scale and the student says, "I know I exhibited a sense of personal expression because my score is high, but I don't know why. Why did I get a good score on personal expression?"

    The point to make is that students have a harder time generating another strong response if they don't know what made the previous response strong. Or if the previous response is weak, they have a harder time making the next one better if they don't know what made the current one weak. The point of having clarity and detail in criteria is not necessarily to ensure that two different raters would give the student work the same score (rater agreement) although detail will help here too; rather, detail helps to clearly communicate to students what they need to do to produce quality work.

    Also note that criteria can have good content/coverage and still be weak in clarity. For example, either of the Skimpy Criteria covers the important dimensions of performance; it is just not clear what many of the terms mean, or what the difference is between, for example, a '1' and a '2.'

  2. Ask participants to rate the Skimpy Criteria (A3.3,O10 or A3.3,O5) for clarity on a scale of one to five where '1' is low and '5' is high. Remind them to base their rating on two questions: (a) are these clear enough that two different raters are likely to give the same score to the same piece of student work; and (b) would these criteria communicate to a student who is not doing well what to do differently next time? (Both Skimpy Criteria examples usually receive a low score for clarity.) Ask participants to outline what they would do to make the criteria clearer. (Responses generally include: add definitions; describe indicators of 1-5 performance; find work samples that illustrate what you mean.)

    Now ask participants to rate the detailed analytical trait criteria selected from the Appendix A—Sampler on the trait of 'clarity' where '1' is low and '5' is high. (These rubrics get a higher score, usually around '4' for math and '5' for writing.) Ask participants to outline what they would do to make the criteria even clearer.

  3. Make sure that participants understand the relationship of purpose to design. If the criteria are meant to serve as an instructional tool in the classroom, then adequate detail is of paramount importance. If the purpose is merely to put numbers on student work for large-scale assessment, less detail may be needed.

  4. Ask participants to find another example of skimpy criteria in the Sampler (Appendix A) and an example of good detail and moderate detail. (See the Appendix A—Sampler Activity 3.3 Index for suggestions.)

D. (20 minutes) Key 3: General

  1. The third key to success (A3.3,O2) is using generalized performance criteria. Define "generalized" and "task specific" and discuss examples. You could say: "Generalized" means that the criteria can be used to rate a whole class or category of performances and not just the work from a specific task. Task-specific criteria can only be used to rate performance on a single task. For example, in the Olympics the same criteria are used to rate all dives—difficulty, form, entry into the water, etc.; there are not separate criteria for each type of dive. In writing assessment, criteria are often very general—some places use the same criteria to rate all kinds of writing; others have separate rubrics for persuasive, narrative, and expository writing. But, very few are task-specific—what sense would it make to have a separate rubric for each prompt?

    Math: Illustrate task-specific criteria with A3.3,O6—Task Specific Criteria, Version 1: Name the Graph, and A3.3,O7—Task Specific Criteria, Version 2: Name the Graph. Ask participants whether they could use these criteria to judge student performance on another task (take any other math task from the Sampler). (The answer is "no." This is the essence of task-specific scoring; it applies only to a single task.) Then ask whether the general criteria (selected from the Appendix A—Sampler for this purpose) could be used to judge student performance on Name the Graph. (The answer is "yes." This is the essence of generalized criteria; they apply across tasks.)

    Writing: Illustrate task-specific criteria with A3.3,O12—Task Specific Writing Criteria. Ask participants whether they could use these criteria to judge student performance on another task (say, "write a persuasive essay on school uniforms"). (The answer is "no." This is the essence of task-specific scoring; it applies only to a single task.) Then ask whether the general criteria (selected from the Appendix A—Sampler for this purpose) could be used to judge student performance on Describe a Favorite Place. (The answer is "yes." This is the essence of generalized criteria; they apply across tasks.)

  2. To make sure participants know the difference between task-specific and general criteria, ask them to find examples in the Appendix A—Sampler. (The General Index lists which samples illustrate the two kinds of criteria.)

  3. Now discuss the relative advantages and disadvantages of these types of criteria. (Activity 1.9—Going to School also includes such a discussion.) As a guide to or summary of the discussion, use A3.3,O14—Advantages and Disadvantages of Task Specific and General Criteria.

    Advantages of task-specific criteria:

    a. Quicker to train raters.

    b. High rater agreement right from the start.

    c. Therefore, often used in large-scale assessment.

    Disadvantages of task-specific criteria:

    a. Have to develop new criteria for each task.

    b. Makes no sense to show them to students ahead of time because they "give away" the answer.

    c. Can't use them for judging the work in portfolios because the content of each student's portfolio can be very different. (This is especially true at the large-scale level.)

    d. What happens if a student comes up with a perfectly reasonable strategy or solution, but it isn't one included in the task-specific criteria? This happens frequently in large-scale assessment when raters are going fast and not really thinking about what they are doing. In fact a personal communication from a major test publisher indicated that this was a major reason they were moving away from task-specific scoring.

    e. Related to (d) above: task-specific criteria do not make the rater think; the thinking has already been done for the rater. The developer of the rubric already thought through, for example, how good problem solving would look on this problem. Therefore, the rater doesn't have to think. Likewise, such criteria don't make students think. If a major reason for developing criteria is as a tool for learning in the classroom (e.g., students learn standards of quality for work), then generalized criteria do a better job because we want students to think, that is, to be able to generalize what they learned on one task to make performance on the next task better.

    (A sample script for the last point might be: The value of generalized criteria is that they help us to define what "good" looks like so that we can begin to generalize across performances and bring information from past experiences to bear on new experiences. This is especially important for the "hard to define" or "disagreement on what this means" goals such as critical thinking, problem solving, collaborative working, communication skills, etc. If people already agree on the definitions, then you can cut corners. But, people often don't agree on what some skills mean and what it looks like when students do "it" well. We can't hold students accountable for different visions of the same target. It is our moral obligation to define precisely what we mean.)

    Advantages of generalized criteria:

    a. They help students generalize what quality looks like from one task to the next.

    b. It is very beneficial to ask students to help develop them.

    Disadvantages of generalized criteria:

    a. They take longer to learn (because the rater has to learn how to apply them to a variety of tasks).

    b. Rater agreement, at least at first, is lower.

  4. We don't necessarily mean to imply that task-specific criteria should never be used. Remember, rubric design relates to purpose, that is, what the rubric will be used for. It might be, for example, that task-specific criteria are better used when one wants to determine the student's conceptual understanding of specific aspects of a task, or when one wants the student to use specific equations or come up with specific substeps or final answers. Generalized criteria might be best for "big" outcomes such as problem solving, communication in math, group collaboration, and critical thinking (process skills that should be generalized across tasks). They are also, obviously, necessary when you will be looking at work generated by students on different tasks as, for example, in portfolios.

    A compromise might be to "mix and match" generalized criteria and task-specific criteria. For example, one might, on a task, overlay generalized criteria for group collaboration and critical thinking, and also have task-specific criteria for specific substeps. Another compromise might be to have students consciously develop task-specific criteria from a generalized rubric. That way they practice the ability to generalize.

E. (20 minutes) Key 4: Analytical Trait Criteria

  1. We hesitated before putting this on the list of quality criteria because the choice between analytical trait and holistic criteria really depends on purpose. For some uses, such as large-scale summative assessment, where speed of scoring is an issue, holistic scoring may be a fine choice. However, for classroom instructional uses, analytical trait criteria are more useful. Be sure participants are clear about this. We don't want to create analytical trait zealots who write nasty letters to their state departments of education for using holistic rubrics. So...

    If the purpose for criteria is using them with students to promote learning, the fourth key to success (A3.3,O2) is analytical trait rather than holistic criteria. Hopefully you have already done activities that illustrate the difference and why it is important (such as Activity 2.1—Sorting Student Work). In a pinch, you might use the following script. (But, telling is never as effective as showing.) You might say, Analytical trait means that more than one dimension of performance is assessed. Holistic means that you make one overall judgment of the quality of the response.

  2. Have participants score the student performances with the analytical trait and holistic rubrics selected from the Appendix A—Sampler for this activity. (Suggestions are provided in the Activity 3.3 part of the Sampler Index. A sample holistic writing rubric is attached as Handout A3.3,O11—Sample Holistic Scoring Criteria.) Discuss advantages and disadvantages of each method (see A3.3,O15—Advantages and Disadvantages of Holistic and Analytical Trait Criteria).

    The bottom line is that analytical trait systems are not worth the effort in the classroom if all they are to be used for is putting grades on student papers. If, however, they are used as an instructional methodology—to focus instruction, communicate with students, allow for student self-evaluation, and direct instruction on traits—they are very powerful. In short, purpose dictates design.

  3. To consolidate the notion of holistic and analytical trait criteria, ask participants to browse through the Appendix A—Sampler and find examples of each type. (The general index to Appendix A will help you locate relevant samples.)

F. (10 minutes) Wrap-Up. Make a game out of repeating the mantra for quality criteria—coverage, clarity, generality (and maybe analytical trait if the purpose is instructional). Ask participants to repeat this mantra to each other.

Ask participants to explain to each other why these features of criteria are important for the purpose of classroom use in instruction. (And, if the purpose were different, say, large-scale screening of students, what rubric features would be important?)

Then ask participants to write in their own words what it means to have quality criteria. If a colleague were to come up to them and ask, "I'm thinking about using these criteria in my classroom, what do you think?" what would they look for?