Dave Treder, Assessment Consultant at Genesee ISD, put this together to help people understand the large changes schools and districts experienced in their writing results this year. A few points on the “extreme” between-year swings noticed by districts and schools on the MEAP writing test:
The change in the percent of students passing across years is much larger for writing than for the other subjects. The average change in a school’s percent proficient from 2002 to 2004 is 36% larger in writing than in math, 54% larger than in reading, and 66% larger than in both science and social studies. (See graph on page 3.)
The year-to-year swings on the grade 5 MEAP writing test are extreme. The 5th grade MEAP writing test had characteristics similar to the current 4th grade MEAP writing test. The average school change between 2003 and 2004 on the 4th grade writing test was 12% (factoring out the “+” or “–” in the State’s change). The across-year changes on the 5th grade MEAP writing test were 11%, 12%, 7%, and 5%. The graph on page 4 shows the extent of the extreme swings.
Results from raters indicate difficulty in differentiating between “2’s” and “3’s” – which is approximately where the cut score is located. The graph on page 5 indicates that the scores raters assign most often are 2s and 3s. Further, more than 25% of the time (1,163 out of 4,302 papers), raters disagree on whether a paper deserves a 2 or a 3. And this is the point where the pass-fail decision is made: two 3’s will award a student a passing grade, while a 2 and a 3 will not.
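The pass-fail rule described above can be sketched as follows. The summed-score cut (two ratings totaling at least 6) is an assumption consistent with the 3-and-3 versus 2-and-3 example, not a documented MEAP rule; the disagreement figures are the ones given above (1,163 of 4,302 papers):

```python
# Sketch of the pass/fail decision: each paper receives two ratings,
# and two 3's pass while a 2 and a 3 do not.  The summed-score rule
# (total >= 6) is an assumption consistent with that example.

def passes(rating1, rating2, cut=6):
    return rating1 + rating2 >= cut

print(passes(3, 3))  # True: two 3's pass
print(passes(2, 3))  # False: a 2 and a 3 fall below the cut

# Share of papers where the two raters split between a 2 and a 3
# (figures from the memo: 1,163 of 4,302 papers) -- about 27%.
print(round(1163 / 4302, 3))
```

The arithmetic makes the problem concrete: the score range where raters disagree most often is exactly the one-point gap that separates passing from failing.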
The use of a “cut score” exaggerates the differences between schools, and it also heightens the changes in schools’ scores across years. As noted on the graph on page 6, the vast majority of districts received about 40–50% of the possible writing points (a 10 percentage point range), while the percent passing for these same districts ranged from 30% to 66% (a 36 percentage point range).
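The exaggerating effect of a cut score can be illustrated with two hypothetical schools: identical average percent correct, yet very different percent passing, because one school’s students cluster just above the cut and the other’s just below. The student scores and the cut of 6 out of 12 points are illustrative assumptions, not MEAP figures:

```python
# How dichotomizing at a cut score magnifies small differences.
# Scores are out of 12 possible points; all data hypothetical.

def pct_passing(scores, cut=6):
    """Percent of students at or above the cut score."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

def pct_correct(scores, max_points=12):
    """Average percent of possible points earned."""
    return 100 * sum(scores) / (max_points * len(scores))

school_a = [5, 5, 5, 6, 6, 6, 6, 6, 7, 8]   # most students just clear the cut
school_b = [4, 5, 5, 5, 5, 5, 6, 7, 8, 10]  # most students just miss it

print(round(pct_correct(school_a), 1), pct_passing(school_a))  # 50.0 70.0
print(round(pct_correct(school_b), 1), pct_passing(school_b))  # 50.0 40.0
```

Both schools earn exactly 50% of the possible points, yet their “percent passing” differs by 30 points, the same pattern as the 10-point versus 36-point ranges noted above.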
The MEAP writing assessment consists of a single prompt, which is not sufficient for determining a student’s writing ability or a school’s ability to teach writing effectively. Students’ knowledge of and interest in a topic will vary from prompt to prompt. Providing only one prompt leaves much to chance as to whether the score a student receives actually represents the student’s true ability. As noted by Richard Shavelson, Dean of the School of Education at Stanford University, “You probably need between six and ten tasks to get a reliable measure of performance.” (Quoted in Education Week, May 17, 2000.)
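Shavelson’s six-to-ten-task estimate is consistent with the standard Spearman-Brown prophecy formula, which projects the reliability of a lengthened assessment from the reliability of a single task. The single-prompt reliability of 0.45 below is an illustrative assumption, not a published MEAP figure:

```python
# Spearman-Brown prophecy formula: expected reliability when an
# assessment is lengthened from one task to k comparable tasks.

def spearman_brown(r_single, k):
    """Projected reliability of k parallel tasks, given one task's reliability."""
    return k * r_single / (1 + (k - 1) * r_single)

r1 = 0.45  # assumed reliability of a single writing prompt (illustrative)
for k in (1, 2, 6, 10):
    print(k, round(spearman_brown(r1, k), 2))
```

Under this assumption, one prompt yields a reliability of .45, while six prompts project to about .83 and ten to about .89, in line with the six-to-ten-task range Shavelson describes.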
While the raters are appropriately qualified and trained, and the scoring process is well validated and properly monitored, inter-rater issues and the scoring process will invariably add unreliability to individual and school-level scores. The company responsible for hiring and training raters and conducting the scoring does 1) employ educators who are college graduates; 2) conduct rigorous, well-validated training; and 3) periodically check raters to ensure a continued connection to the rubric.
However, without getting into it too deeply, the method utilized by the MEAP office to report inter-rater agreement – the “percent of ratings within one” – is not considered an acceptable practice. A Kappa statistic or an intraclass correlation coefficient would be more appropriate. By both of these measures, the results (.43 and .81, respectively, for 2004 4th grade writing) would be considered marginally reliable at best.* And the inter-rater agreement issue is exacerbated by the issue discussed above: the point where raters show the most difficulty in differentiating between papers – between a score of 2 and a 3 – is the very point where pass and fail are differentiated.
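For readers unfamiliar with the statistic, a Kappa coefficient can be computed directly from the two raters’ confusion matrix: it is the observed agreement corrected for the agreement expected by chance. The rating counts below are hypothetical; the .43 reported above comes from the actual 2004 grade 4 writing ratings:

```python
# Cohen's kappa for two raters: agreement corrected for chance.
# The confusion matrix below is hypothetical, not MEAP data.

def cohens_kappa(matrix):
    """matrix[i][j] = number of papers rater 1 scored i and rater 2 scored j."""
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of papers on the diagonal.
    p_obs = sum(matrix[i][i] for i in range(len(matrix))) / n
    # Chance agreement from the marginal totals of each rater.
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(len(matrix)))
               for j in range(len(matrix))]
    p_chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical counts for ratings 1-4 (rows: rater 1, columns: rater 2).
# Most mass sits in the 2/3 cells, mirroring the disagreement pattern above.
table = [
    [ 50,  30,   5,   0],
    [ 30, 700, 500,  20],
    [  5, 500, 800,  60],
    [  0,  20,  60, 100],
]
print(round(cohens_kappa(table), 2))
```

Note how heavy off-diagonal traffic between the 2 and 3 cells drags Kappa well below the raw percent-agreement figure, which is precisely why “percent of ratings within one” paints too rosy a picture.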
Further, all the papers for a teacher group are rated by the same raters, to make it easier to identify “irregular administration practices.” It is not possible to compute the additional “error” that this adds to the scores because of differences between raters – but it seems relatively certain that the error is larger than it would be if the papers were randomly assigned to raters.
* The apparently disparate facts – that 1) raters are sufficiently skilled and trained, and 2) inter-rater agreement is only marginal – result, I would surmise, from the fine distinctions that raters are required to make (see the table on page 7, which delineates the differences raters must detect between a ‘2’ and a ‘3’ paper: differentiating between vocabulary that is limited vs. basic, and writing that is only occasionally clear and focused vs. somewhat clear and focused).
[Graph, page 3: Average School Change, 2003 to 2004, with State Change Factored Out (all Michigan schools with 10 or more students taking the test); percent change by subject.]
[Graph, page 5: Distribution of Ratings Received by Students (GISD), Gr. 4 MEAP Writing, 2004; raters’ scores by scale score (440–620), with the “cut” score marked.]
[Graph, page 4: Grade 5 Writing, Percent “Proficient,” 1998–2002 (every 10th school in Genesee ISD); percent proficient by school “number” (1–13).]
[Graph, page 6: District Mean Percent Correct and Percent “Passing,” Gr. 4 MEAP Writing, 2004; percent “passing” the MEAP writing test vs. average percent correct (out of 12 possible points).]
MEAP Writing Rubric: Differentiating Between a ‘2’ and a ‘3’ Paper (Table, page 7)

What classifies a paper as a ‘2’:
The writing is only occasionally clear and focused. Ideas and content are underdeveloped. There may be little evidence of organizational structure. Vocabulary may be limited. Limited control over writing conventions may make the writing difficult to understand.

What classifies a paper as a ‘3’:
The writing is somewhat clear and focused. Ideas and content are developed with limited or partially successful use of examples and details. There may be evidence of an organizational structure, but it may be artificial or ineffective. Incomplete mastery over writing conventions and language use may interfere with meaning some of the time. Vocabulary may be basic.
Highlighting the concepts/skills that are evaluated, and the degree to which they need to be evident/accomplished:

- Clarity and focus: “only occasionally clear and focused” (2) vs. “somewhat clear and focused” (3)
- Ideas and content: “underdeveloped” (2) vs. “developed with limited or partially successful use of examples and details” (3)
- Organization: “little evidence of organizational structure” (2) vs. “evidence of an organizational structure, but it may be artificial or ineffective” (3)
- Writing conventions: “limited control … may make the writing difficult to understand” (2) vs. “incomplete mastery … may interfere with meaning some of the time” (3)
- Vocabulary: “limited” (2) vs. “basic” (3)