Wednesday, June 20, 2012

The Judgment of Writing

For the CUNY Assessment Test in Writing (CATW), a scoring rubric breaks analysis of student writing (responses to prompts that include short texts) into five categories. These are:
  1. Critical Response to the Writing Task and the Text;
  2. Development of Writer's Ideas;
  3. Structure of the Response;
  4. Language Use: Sentences and Word Choice;
  5. Language Use: Grammar, Usage, Mechanics.
Each of these is scored on a scale of 1-6, with the scores of the first three categories doubled for the final result.
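To make the arithmetic concrete, here is a minimal sketch of the weighting just described. The function name and the sample scores are my own illustrations, not part of the CATW materials, and the sketch covers a single reader's total only.

```python
# Weights as described above: the first three categories count double.
# WEIGHTS and weighted_total are hypothetical names used for illustration only.
WEIGHTS = (2, 2, 2, 1, 1)


def weighted_total(scores):
    """Return one reader's weighted total for five category scores (each 1-6)."""
    if len(scores) != len(WEIGHTS):
        raise ValueError("expected five category scores")
    if any(not 1 <= s <= 6 for s in scores):
        raise ValueError("each score must be between 1 and 6")
    return sum(w * s for w, s in zip(WEIGHTS, scores))


# Invented sample scores: 2*4 + 2*3 + 2*4 + 5 + 3 = 30
print(weighted_total([4, 3, 4, 5, 3]))
```

Under these assumptions, a single reader's total can range from 8 (all ones) to 48 (all sixes).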

At the start of scoring sessions, a round of "norming" is standard procedure. This brings scorers together, making sure that each understands the ratings in approximately the same way so that deviation in scores will be at a minimum.

From a listserv I subscribe to, I find that the norming concept is now being taken a step further (though not with CATW but with another scoring rubric). Setting up a bar graph of the results in each of four (rather than five) categories, the researcher hopes to establish a standard grade for the putative essay associated with each particular configuration, doing so through a norming process called "Social Judgment Analysis" (SJA), a process that was developed in the 1970s for policy-conflict resolution and that centers around the work of Kenneth R. Hammond.

In Leonard Adelman, Thomas R. Stewart, and Hammond's “Application of Social Judgment Theory to Policy Formulation” (Policy Sciences 6, 1975, 137-159), the authors state that:
social judgment theorists have developed computer graphics technology as a means of resolving policy differences. Such devices can provide (1) immediate statistical analysis of the judgment process, in terms of weights, function forms, and consistency…, and (2) immediate pictorial description of these parameters. (141)
the primary advantage of the present computer graphics system to policy-makers is that it makes explicit, both statistically and pictorially, where agreement and disagreement lie; or in other words, the cognitive differences that result in disagreement. In short, it serves a clarifying function. (142)
They go on to describe how they put what would come to be SJA to use:
Since the participants had different policies concerning the relative importance of the various functions…, the first step in the study was to describe each participant’s policy, in terms of weights, function forms, and consistency, i.e., do policy-capturing. Such action would permit (1) the pictorial representation of the participants’ policies and thereby aid them in understanding their similarities and differences, and (2) the groups or clustering of the participants in terms of the homogeneity of their individual policies. (147)
Their conclusions are:
specifically, (a) social judgment theory asserts that policy quarrels are often cognitive in origin…. The theory also asserts that (b) computer graphics technology makes explicit, both statistically and pictorially, the cognitive differences that result in disagreement, and (c) that such clarification should result in the understanding and subsequent resolution of such differences. (156)
Applying this as a norming process, rather than as one for problem solving, can have, the researcher implies, certain benefits. An "expert" reader of the bar graphs developed through the process can learn a great deal about the writing and even the writer--and about the opinions of the scorers (which can then be discussed and even adjusted through the analysis). If it were applied to CATW (it won't be: it's not needed there... I use CATW here only as an example of a rubric and as a way into talking about norming), the "expert" might even be able to tell quickly where to place a student on a needs spectrum.

Before I go further, I should say that I have no problem with SJA. In fact, it relates to Robert Leston's excellent chapter "Smart Mobs or Mobs Rule?" in our book Beyond the Blogosphere: Information and Its Children. The problem is that this use of it (as norming does in general, but to a much greater and much more troubling extent) can strip writing of its primary purpose, communication (or effectiveness), and strip writing evaluation of its necessary close link to the act of communication itself. If used as a structure for evaluating writing, it will encourage the type of writing that George Orwell warned against just after the end of World War II in "Politics and the English Language," rewarding use of 'dying metaphors,' 'operators or verbal false limbs,' 'pretentious diction,' and 'meaningless words.' Though proponents might try to argue differently, an SJA norming must completely ignore the rules for writing Orwell offers:
(i) Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.
(ii) Never use a long word where a short one will do.
(iii) If it is possible to cut a word out, always cut it out.
(iv) Never use the passive where you can use the active.
(v) Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.
(vi) Break any of these rules sooner than say anything outright barbarous.
It's almost impossible for any norming process to take these rules into account, for none of them is quantifiable.

Any scoring rubric is problematic, anyway: reducing writing to numbers always removes it from the dynamic of communication. Though the rubric is created in order to remove the subjective element, it also removes the written document from the stimulus/response/reinforcement active paradigm that is real communication--and makes possible the 'barbarisms' Orwell details. Any rubric removes the writing under consideration one step from what should be the real purpose of evaluation, consideration of the effectiveness of the writing. Adding an additional step of creating a visual representation (the bar graph) further removes evaluation from the writing task itself.

A rubric is an artificial device for a specific purpose and should not be considered beyond that purpose. I suspect the researcher knows that and wants to use it for only one specific purpose. The problem is, it probably will be used for others. In CATW, the rubric is used with the exam for placement in First Year Composition. The developers of the exam and the rubric understand that there is limited value in what they have constructed and make no claims for it beyond the single purpose. Even that is troubling, for it does not stop others from imagining that writing can be assessed comprehensively and effectively through rubrics, leading to arguments that 'machine grading' of essays can be effective. Yet that doesn't even work in situations like the CATW: there are just too many judgment calls that a machine can't make. But there are plenty of people who want to believe that it can.

The researcher in question uses SJA to explore the differing attitudes towards written work within different teaching populations by asking teachers to match bar-graph instances with specific letter grades. The idea is to then see what the differences are in the answers and to use that to better understand the needs perceived in the different environments.

In addition to my concern that this can become an assessment tool, especially for 'machine grading,' I worry that the information the researcher will gather is flawed, and for one quite specific reason: Writing teachers like me cannot complete the survey, so the information will be lopsided, at best. And necessarily incomplete.

When I tried to complete the survey, I was faced with a series of bar graphs of four bars each, the bars representing different areas of judgment of writing on a scale of 1 to 4 (similar enough to the CATW rubric for me to understand what was being done). I had to match each configuration to a letter grade.

As I started, I felt a strange sensation, almost a physical vertigo. Something was wrong. As I tried to make a selection, I realized I could not; I felt as though there were a wall between me and what I was trying to evaluate.

Upon reflection, I realized that, indeed, there was: I was being asked to evaluate writing without the ability to see the writing. And I could not do it, not even in the most abstract fashion. I could not withdraw myself from the focus on actual communication that is at the heart of my teaching. What I was being asked to do had nothing to do with actual writing, but it claimed it had. The cognitive dissonance was so great that it paralyzed me. As the researcher said, it would be too much work to actually ask people to read all of the essays. This is more efficient. But I could not do it.

Maybe it is more efficient. But it is not evaluation of writing. Even by participating in such a survey, I would be tacitly agreeing that writing can be evaluated, to some degree at least, through graphic representation of previous evaluations of defined parts of the physical artifact. By participating, I would be abetting those who would like to see essay assessment done by machines, who believe that writing can be stripped of all of its dynamism and reduced to formalized squiggles alone, leaving communication completely out of the picture. I could not.

We are at a dangerous crossroads, already turning towards unwarranted reliance on assessment tools rather than on teachers and readers. For a nation that claims to have faith in the individual, this is peculiar--but it is also changing learning into the mastering of form. Real content gets stripped away, as it does in this particular usage of SJA (it doesn't in others, where the content remains central because the scorers are also the contributors). We cannot afford more of that, so we should be extremely careful about feeding our mania for data, no matter how well intended.
