TAKE OUT YOUR NO. 2 PENCILS; STANDARDIZED TESTS LIKE THE STANFORD 9 ARE
THE RAGE IN PUBLIC EDUCATION. BUT DO THEY REALLY MEASURE ACADEMIC ACHIEVEMENT,
OR SIMPLY REFLECT COLLECTIVE ANXIETY ABOUT OUR SCHOOLS?
JAY MATHEWS
Sunday, November 8, 1998; Page W11
When the long envelope arrived from her son Darius's elementary school
last fall, Charlette Hedgman reacted slowly and carefully. She knew the
thickly numbered columns on the two enclosed sheets were bad news, but
she did not raise her voice with Darius or threaten punishment. That was
not the way she raised her son. Hiding her disappointment, she sat and
thought about what she wanted to say.
On the double-sided sheets, in beige-colored boxes with a light
blue border, were the results of the Stanford Achievement Test Series,
Ninth Edition, for Darius Q. Leggette, student No. 8276447, age 8 yrs.,
01 mos. The sweet-tempered boy with the incandescent grin was beginning
second grade at the Ketcham Elementary School, a ramshackle assemblage
of brick buildings at the foot of 15th Street SE in Anacostia. The report
listed 133 indicators, many as incomprehensible as a corporate tax return,
but Hedgman was a classroom aide at another District elementary school.
She knew what this meant.
In all six areas of performance -- total reading, vocabulary,
reading comprehension, total mathematics, problem solving and procedures
-- Darius had received the lowest rating: "below basic." The sheets of
paper told her this categorization "indicates little or no mastery of fundamental
knowledge and skills." Hedgman didn't know it at the time, but almost half
of Darius's classmates at Ketcham Elementary scored equally poorly on the
reading portion of the test.
Charlette Hedgman did not think Darius had too little intelligence.
She thought he had too much energy and impatience. He tried to do everything
hastily -- he was the kind of child who liked to stampede through the living
room and leap down the six steps in front of their brick town house. In
1996 he had dashed into the middle of Good Hope Road, two blocks from the
house, directly in front of a big burgundy sedan. He spent a month at Children's
Hospital. She never wanted to go through that again.
She adjusted the papers and called Darius over to sit with her
on the Queen Anne couch with the flowered upholstery. The muscles of his
small face tightened. The couch meant serious business. She showed him
the report.
"This is how you did," she said. She pulled him close. "You could
do better, but I'm not mad. I think we're just going to have to work a
little harder."
Darius peered at the two sheets of paper. There were 27 separate
content categories. He was below average in 21. Hedgman had to squint to
read the fine print on the back. It said that RS/NP/NA meant Raw Score/Number
Possible/Number Attempted. Darius had worked hard at the test, answering
nearly every question, even if many of his answers were wrong. His energy
and time had finally run out on the "Word Reading" section, where he had
answered only 11 of the 30 questions.
The report gave no encouragement, not even the usual bureaucratic
concession that one test could not define a child. Seven thick bars, crawling
like black worms across the page, said Darius was no higher than the 24th
percentile on any indicator. At least 75 percent of the national sample
of second graders were ahead of him. On Word Reading he was at the absolute
bottom, the 1st percentile, all the rest of the world looking down on him.
Hedgman could see her son was shaken. She tried to be positive.
She told him he could do better on the test; it would just take some time
and effort, and a few new rules. She began to formulate a plan.
All over the District -- and, indeed, the nation -- households were
receiving similar packets. In the last two decades standardized tests have
become one of the most powerful forces in American education. Psychometrics,
the science of measuring the mind, pervades the academic world. Few districts
dare to conclude a school year without requiring students to spend several
hours filling in circles and rectangles on computerized scoring sheets
with No. 2 pencils. The companies that market such tests are approaching
$200 million in annual sales, evidence of a boom that shows no sign of abating.
The tests have become a universal measure of success in the world
of public education. Principals and teachers are given bonuses or fired.
Students are promoted or forced to repeat grades, placed in programs for
the gifted or dispatched to special ed, and some are denied graduation.
All occur because of what the indicators and the bar graphs and the pie
charts reveal.
In this results-oriented, parent-sensitive environment, no experiment
in teaching children can thrive without a test that says it is working.
There does not have to be any proven connection between a rise in test
scores and the program in question. But the advocates of the new idea have
to be able to point to some upward trend in numbers, or they cannot get
very far.
Most states have at least one testing program in place, and many
have several. School districts in South Carolina can be declared educationally
bankrupt and subject to state takeover if their test scores fall below
a certain level. Districts in Michigan can lose their accreditation for
the same reason.
California administered the largest standardized testing program
in the country last spring to more than 4 million students at a cost of
$35 million and posted school-by-school results on the World Wide Web.
Several districts sued unsuccessfully to prevent the release of scores
for limited-English students. The state Department of Education has proposed
paying cash awards of 5 to 10 percent of teachers' annual salaries to schools
with high or much improved scores. Schools that fail to do well could have
their entire staffs transferred or face closure.
Standardized tests have long been a fact of life in public education.
For half a century, according to educational historian Sherman Dorn at
the University of South Florida, achievement tests were administered and
scored, their results confined largely to the internal consumption of teachers,
parents and children. What is different now is that the tests have become
a public measuring and punishing stick. Parents and voters want the results
disclosed and are quick to seek comparisons between various schools and
districts. The media have jumped in, treating test results as a major news
story.
"In less than 25 years, statistical accountability has become
so ubiquitous that it appears inevitable," says Dorn, who calls the change
"both breathtaking and alarming."
What bothers Dorn and other critics is that tests have gone from
being one possible measure of academic achievement to being the only measure
that really counts. The pressures on school officials, teachers and students
to do well on the test have increased dramatically. This has led some schools
to emphasize practicing for the tests over real learning, critics argue.
Some school officials seem to have gone even further, searching for ways
to beat the system by selectively choosing who takes the test and who doesn't,
in order to raise their school's average scores.
In D.C., former school chief executive Julius W. Becton Jr. and
his successor, superintendent Arlene Ackerman, have decreed that standardized
tests will be the ultimate criteria for Darius Leggette and 73,000 other
public schoolchildren. Principals who do not raise their schools' scores
significantly have been told they may lose their jobs. Students who do
not improve have been required to attend summer school. Children with scores
as low as Darius's face the prospect of having to repeat the same grade.
These officials are responding to what they see as a crisis of
confidence in the District's education system. Overcrowded classrooms,
poorly trained teachers, decaying buildings and unprepared and uninspired
students all have contributed to the city's educational meltdown. Its standardized
test scores are among the worst in the country, its dropout rate among
the highest. On the average, every year that a student spends in the system,
his or her scores go down. About 70 percent of high school juniors test
below grade level.
To measure how their students were really doing, District officials
decided they had neither the time nor money to develop a sophisticated,
performance- and curriculum-based test like those that have come into use
in Maryland and Virginia. Instead, they turned to one of the most popular,
widely used and economical standardized tests in the nation.
The Stanford Achievement Test Series, Ninth Edition -- known far and
wide as the Stanford 9 or SAT9 -- was designed 1,600 miles from the District
in a sprawling office complex in the San Antonio residential neighborhood
of Collins Gardens, using many buildings that had once been Levi Strauss
warehouses. This is the headquarters of Harcourt Brace Educational Measurement
Inc., the company that writes and markets the Stanford 9. HBEM's test --
considered the first standardized achievement test in America -- dates
back three-quarters of a century. The SAT9 -- which has no affiliation
with the Educational Testing Service's SAT given to college-bound high
school students -- has the respect of many educators, the backing of a
major textbook publisher, and a marketing philosophy that seeks to capitalize
on its long history.
Some of that history is less than distinguished. The co-founder
of the SAT9 series, Stanford University psychologist Lewis Terman, began
his celebrated half-century of work on the mysteries of intelligence with
some of the most racist and elitist speculations ever found in 20th-century
academia. Harvard University science historian and paleontologist Stephen
Jay Gould, in his 1981 book The Mismeasure of Man, unearthed malodorous
chunks of early Terman, such as his admiration for the intellectual superiority
of Anglo-Saxons.
In 1916 in his seminal book, The Measurement of Intelligence,
Terman bemoaned the ignorance of a conscientious mother of a child who
had just scored 75 on the professor's rudimentary IQ test. "Strange to
say, the mother is encouraged and hopeful because she sees that her boy
is learning to read," Terman wrote. His own conclusion: The boy "is feeble-minded;
he will never complete the grammar school; he will never be an efficient
worker or a responsible citizen." Similar tests assigned most immigrant
Jews of that era to the feeble-minded category.
Terman later disowned his early fascination with race-based intelligence,
and by 1937 had acknowledged that how a child was raised had a significant
impact on how well he or she did on the tests. Terman did not quite know
what it was he was measuring, but he continued to refine the yardstick.
The Stanford-Binet intelligence test that he co-authored has lived on,
a controversial device still used in assigning children to gifted classes,
even though critics contend it is culturally biased and underestimates
the intelligence of ethnic minorities.
The achievement tests developed by Terman and others at Stanford
have a somewhat different goal -- to measure how much children absorb of
what school teaches them, and to help educators see how each child measures
up against the American average.
Constructing tests such as the Stanford 9 is like erecting a high
wire in such a way that when children try to walk across it, half fall
on one side and half on the other. The test-makers cull textbooks and nationally
circulated lesson plans for questions that reflect what is most widely
taught in American schools, giving their product what specialists call
"content validity."
For the test to give parents and teachers an idea of where each
student fits on the national spectrum, the test-makers have to choose questions
that, in the jargon of psychometrics, "behave properly." The best questions
are those that half of the students miss. Some questions can be easier
and some harder, but psychometricians try to avoid those that 90 percent
of the students get wrong or right.
If a disproportionate number of people with low total scores get
a given question right, or a disproportionate number with high scores get
it wrong, that also is a problem. It might mean that one of the "distracters,"
the psychometric name for the wrong answers to a multiple-choice question,
is causing too many high scorers to make a mistake or giving too many low
scorers too easy a route to the correct answer. The distracters are there
to trick the uncertain student into getting the question wrong; otherwise
the scores will not break down into the bell-shaped curve that reflects
most students scoring in the middle, with a few at the high and low ends.
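What "behaving properly" means in practice can be sketched with two simple
statistics: how many students answer an item correctly, and whether high
scorers do better on it than low scorers. The short Python sketch below is
only an illustration of that screening idea, with invented thresholds,
function names and student data; it is not HBEM's actual procedure.

    from statistics import mean

    def item_difficulty(responses):
        # Share of students answering the item correctly (1 = right, 0 = wrong).
        return mean(responses)

    def item_discrimination(responses, total_scores):
        # Difference in pass rates between the top and bottom quarter of scorers.
        # A near-zero or negative value flags a "misbehaving" item -- for example,
        # a distracter that trips up too many strong students.
        ranked = sorted(zip(total_scores, responses))
        k = max(1, len(ranked) // 4)
        bottom = [answer for _, answer in ranked[:k]]
        top = [answer for _, answer in ranked[-k:]]
        return mean(top) - mean(bottom)

    def keep_item(responses, total_scores):
        # Keep items that roughly half the students miss and that still
        # separate high scorers from low scorers (thresholds are invented).
        p = item_difficulty(responses)
        d = item_discrimination(responses, total_scores)
        return 0.10 < p < 0.90 and d > 0.2

    # Ten hypothetical students' answers to one item (1 = correct) and their totals.
    answers = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    totals = [55, 20, 48, 60, 25, 18, 52, 30, 45, 22]
    print(item_difficulty(answers), item_discrimination(answers, totals),
          keep_item(answers, totals))

Actual item analysis is far more elaborate than this, but the screening logic
is the same: questions that nearly everyone gets right or wrong, or that
strong students stumble over, get thrown out.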
The troubling aspects of this practice hit Fairfax-based educational
consultant Gerald W. Bracey when he was helping teach a freshman psychology
course at Stanford. Bracey, who went on to become chief assessment officer
for Virginia's public schools and is author of Put to the Test, a consumer's
guide to testing, was ordered to grade on the curve at Stanford so that
15 percent received A's and 15 percent D's or F's. His students were among
the brightest in the country. Many of them had never seen anything less
than an A on their high school report cards. Instead of testing them on
the course's main points, Bracey had to trick them into committing errors
by formulating improbably subtle questions or making references to obscure
footnotes.
To achieve their own bell curve, the makers of the Stanford 9
have to build in the same kind of tricks. They justify these devices as
part of the educational process. A student who truly knows the subject
matter, they argue, will step lightly around the traps.
Bracey contends that the search for questions that address only
lessons taught in the majority of school districts has an important drawback:
It has inhibited educators from trying innovations that don't show an immediate
payoff in higher test scores or that might put their young test-takers
at a disadvantage. The Coalition of Essential Schools, an effort to revive
secondary schools through deeper courses and less test-conscious assessments,
has won enthusiastic reviews from many parents, students and educators.
But some districts have lost interest in the coalition's program, in part
because it has failed to improve test scores as quickly as they had hoped.
"It would be very hard to overestimate the importance of tests
in the United States today," says Bracey. "And yet they often inhibit educational
innovations that try to widen what children learn, because they measure
a very small range of skills, usually verbal and math, and there are many
other things in the world."
Despite the flaws and the critics, the tests and their results have
taken firm hold in the public consciousness. Newspapers began headlining
lowered standardized test scores in the mid-1970s, quoting experts and
politicians who claimed that the results confirmed an alarming erosion
in the nation's schools. A series of national reports in the early 1980s
demanded higher standards in public schools. The standardized test industry
began its two-decade boom.
The big three companies, each part of a major publishing firm,
are CTB/McGraw-Hill, with $95 million in annual sales, and the two runners-up,
HBEM and Riverside Publishing, owned by Houghton Mifflin & Co. Although
No. 2 in sales, HBEM is pushing its leading rival hard. The Stanford 9
replaced a CTB/McGraw-Hill product in the District and a Riverside product
in Virginia last year after HBEM executives convinced school officials
their test had a longer track record and was more in tune with the curriculums.
In California, HBEM won the largest testing contract in the state's history
despite a recommendation from the superintendent of public instruction
in favor of CTB/McGraw-Hill's Terra Nova series. Economics is also a factor.
HBEM charges the District just $12 per test for the Stanford 9, a total
of $636,000 for the 53,000 students who were tested last spring.
Today the Stanford 9 is used in at least 17 states and the District
of Columbia, although D.C. uses it in more grades and more frequently than
many other school districts. In the District, the full version of the test
is used, allowing teachers to identify particular strengths and weaknesses
in each child. Virginia is using an abbreviated version, which allows parents
to see how their children compare generally to a national norm. Maryland
does not use the Stanford 9. It periodically gives the Comprehensive Test
of Basic Skills, a CTB/McGraw-Hill exam, to a few students in each district
to check them against a national sample.
The Stanford 9 is a norm-referenced test, designed to show how
a student compares to a national sample of students. But HBEM is also producing
criterion-referenced tests. The latter are designed to demonstrate how
much a student has learned of a specific course curriculum, such as Virginia's
new, fact-rich Standards of Learning program. District officials are using
a version of the Stanford 9 that also serves as a criterion-referenced
test, with a panel of experts deciding how low a score must be to indicate
that a student like Darius has not reached a basic level of learning.
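The difference between the two kinds of scores can be sketched in a few
lines: a norm-referenced result places a raw score within a national sample,
while a criterion-referenced result compares the same raw score against fixed
cut points set by a panel. The Python sketch below uses an invented norm
sample and invented cut scores purely for illustration; it does not show the
Stanford 9's actual scales or the District's actual cut points.

    def percentile_rank(score, norm_sample):
        # Norm-referenced view: share of the national sample scoring below this student.
        below = sum(1 for s in norm_sample if s < score)
        return 100 * below / len(norm_sample)

    def performance_level(score, cuts=(20, 35, 50)):
        # Criterion-referenced view: fixed cut scores (set arbitrarily here, not
        # by an expert panel) dividing the four reported levels.
        labels = ["below basic", "basic", "proficient", "advanced"]
        return labels[sum(score >= c for c in cuts)]

    norm_sample = [12, 18, 22, 27, 31, 36, 40, 44, 49, 55]  # invented sample
    raw = 27
    print(f"{percentile_rank(raw, norm_sample):.0f}th percentile, {performance_level(raw)}")

In the District's case the same Stanford 9 answer sheet feeds both kinds of
reports; only the yardstick changes.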
HBEM has the contract to write and score the new Virginia tests
and is working on similar projects in other states. But despite its growing
success and influence, the company remains so unaccustomed to public notice
that it declined a Washington Post request to visit its San Antonio headquarters,
saying it had never before been asked to let a reporter inside. Instead,
company officials consented to telephone interviews for this story.
Not only have standardized tests become a required part of nearly
every school district's spring calendar, but the publishing houses that
market them are also developing and selling guides and textbooks designed
to help students pass. Some have even started to dabble in tutoring services
to assure worried parents that they are doing everything they can to make
sure their children succeed. It's a huge potential market; in effect the
companies are addressing the parental anxieties that they themselves have
helped create.
Within days of receiving Darius's test results, Charlette Hedgman had
formulated her plan. Henceforth, she decreed, there would be no TV on weekdays.
The hour from 4 to 5 p.m. would be study time. Nothing would happen in
that hour unrelated to homework. She and Darius would read together every
evening, on the same couch, slowing down the process of absorption so that
the words would remain in his brain overnight.
She could tell he viewed this as one long prison sentence. But
what was she to do? She understood the power of tests, just as she understood
how difficult they could be for those who were unprepared. When she was
Darius's age, she had also attended Ketcham Elementary. She could remember
staring at a standardized test and feeling her mind freeze shut. In class
she usually knew the answers, at least for that day's lesson. But the words
and concepts slipped away. There was little review. A test that demanded
she recall a year's worth of work brought terror.
She received a guide for parents distributed by the District,
but found it was sometimes contradictory. Page 4 of "Stanford 9: How Parents
Can Help Their Children," for instance, said, "Students who score below
a certain level will be held back and required to attend six weeks of summer
school." But page 6 said the scores would only be "one of the indicators
used" and "teachers and principals will have the final say on summer school
and promotion."
Still, many of the recommendations in the guide made sense to
Hedgman: Answer all of your child's questions, encourage your child to
read at least two books a month, and turn off the TV. Hedgman followed
those guidelines and felt Darius was making progress. She had pushed him
to read more books than that in the past, but she realized that too often
he had whipped through them -- loving the speed but failing to comprehend
what he was reading. She found he learned more if they went more slowly.
Darius spent the winter reading. A tutor, someone his mother knew
from school, worked with him occasionally. At home, Hedgman made certain
he read to her at least an hour each evening, even when she could barely
keep her eyes open after a day of rustling kindergartners. He seemed to
be getting it. He was growing accustomed to the routine of TV-less weekdays.
His vocabulary was increasing. His reading comprehension improved.
The work that Darius was doing at home reinforced the efforts
that were taking place in the classroom. In Room 212 at Ketcham Elementary,
every hour of class included an exercise devoted to reading and writing.
Darius's second-grade teacher, Florine English, posted new terms on the
Word Wall, and displayed illustrated student stories in the hallway just
outside her glass-windowed door. Phonics was pushed hard. The school's
principal, Romaine Thomas, had been at Ketcham for 25 years. She believed that
decoding the sounds of each word helped children absorb its meaning and
its use.
Some of Darius's work went up on the light-green cinder-block
walls of the classroom, where he would stand and admire his progress. Often
at lunch time English would give him an extra 15 minutes, taking him through
a recent lesson to make sure he understood.
English says she believes all her books and charts and teaching
devices were of help to Darius. But when asked about the child, the first
thing she says is: "He has a very supportive mother."
Nonetheless, Charlette Hedgman felt a mounting sense of frustration
as she investigated the District's use of the Stanford 9. The test-makers
refused to let the children and teachers see the graded tests -- that would
require them to create an entire set of new questions every year, they
argued, and make it impossible to match D.C. scores with the Stanford 9's
national sample. Instead, Hedgman had to settle for a brochure that outlined
the learning goals for each test. She saw immediately that these did not
always match what was being taught in the schools. For example, the test
demanded a form of multiplication that was not taught until the following
grade. The fall test on which Darius had done so poorly had been given
less than a month after school started. That was not long enough, Hedgman
felt, to build him back up after a summer of games and television. How
could they say all these numbers represented her son's level of achievement?
He was already multiplying in his head, by a process that made sense to
him. Yet the test said he could not adequately add or subtract. How could
that be?
Hedgman's doubts were new to her but not to the debate over standardized
testing. Critics around the country have been raising them for years.
John Katzman graduated from Princeton University in 1981. After six
weeks on Wall Street, he was bored and disheartened, eager to set his own
standards instead of the Dow's. He had worked with a small admissions-test-coaching
school in college. His parents lent him $3,000 to set up his own operation
in their New York apartment.
Today he is the president of the Princeton Review, a $50 million-a-year
test-preparation company with 500 locations. His success stems from his
knack for unlocking the secrets of standardized tests, particularly the
college applicant's worst nightmare, the Scholastic Assessment Test (SAT).
So presumably he should be fond of the high-stakes exams and national obsession
with testing that have made him wealthy.
In fact, Katzman has an almost thermonuclear distaste for standardized
testing of any kind. On the telephone from his office in Manhattan, Katzman
slides into his anti-test stump speech, an acidic riff on overanxious parents
and cowardly bureaucrats. These are people, he says, who can no longer
recall the time when they were in school and their parents judged the value
of their educations by little more than how much homework they were assigned
each night.
"The Stanford 9 conceptually is no better and no worse than any
other nationally standardized test," Katzman says. "What is problematic
is the value assigned to the test."
The 175 questions on Darius Leggette's three-hour reading and
arithmetic test can only dimly reflect what he learned in first grade,
Katzman argues, but that is all that is necessary to consign him to the
bottom of the scholastic heap. And if he manages to crawl his way up to
the basic level, where he will be allowed to move to the next grade, it
will only mean he will have chosen different answers to a handful of questions,
hardly deepening the understanding that his education is supposed to be
about.
With the Stanford 9 and other standardized tests used in elementary
schools, says Katzman, "the test-writers don't know what the kid is learning,
the students, parents and teachers don't know the process or have any access
to old tests, and the general public makes important decisions about policies
and people on the basis of deeply flawed numbers."
Just as failure on a standardized test may not be as significant
as it seems, success may also mean less. Fairfax consultant Gerald Bracey
notes that in the early 1980s Prince George's County schools received much
praise for their annual improvement in test scores. Critics raised questions
about classes that appeared to focus on little but strategies for passing
the test. But their views were overlooked until a few years later, when
the state of Maryland adopted a new test. Scores dropped statewide, as
often happens with a new test, but in Prince George's they plummeted.
The curious fact is that when it comes to the question of how
test results are used, the people who design and market the Stanford 9
and its sister exams do not necessarily disagree with their critics.
Joanne Lenke is president of HBEM and chief defender of the Stanford
9. She is a prominent psychometrician, part of a direct intellectual lineage
back to Terman. Her professor at Syracuse University, Eric Gardner, was
a student of T.L. Kelly, one of the other co-designers of the first Stanford
Achievement Test. Lenke is also a former junior high school math teacher
who says she understands exactly how much her carefully calibrated, lovingly
drafted tests can be stretched out of shape by school board members eager
for a cheap, quick fix.
"The original reason that norm-referenced tests were developed
was to identify relative strengths and weaknesses of students in the classroom,"
Lenke says. "They were never intended to make personnel decisions . . .
I know it is done, and certainly test scores could be one indicator of
performance, but I would argue that visiting the classroom and observing
the teacher, looking at the day-to-day instructional activities, would
be far more useful."
Rating schools or holding back children because of poor Stanford
9 scores does not please her. The test is there to tell a parent and a
teacher how each child compares to a representative sample of American
children, not whether the principal should be rehired or the child forced
to repeat a grade. Test-makers, she says, have no more power to prevent
customers from abusing the test than the automobile industry is able to
ban lovemaking in back seats.
Still, Lenke defends the concept of accountability that lies behind
standardized tests. Parents and teachers cannot serve a student well unless
they know what the child understands and what he doesn't. A child will
nod and say he knows a certain word because he wants to get outside to
play on the jungle gym. Only by some objective test, such as putting the
word in front of him and requiring him to choose from four alternative
meanings, can the real extent of his knowledge be judged. "Testing has
become very important to our society," Lenke says, "because it is a way
we can together gather information to make important educational decisions."
D.C. school officials were well aware of the arguments for and
against standardized testing, and many of them had little enthusiasm for
the Stanford 9. But faced with an educational crisis of massive proportion,
they felt they had little choice but to turn to HBEM's test. It became
for them a form of shock therapy designed to awaken principals, teachers,
parents and students to a dreary reality no one seemed prepared to face.
The first round of Stanford 9 tests in the District was held in May
1997, and the results were predictably catastrophic. Overall scores in
both reading and math were disappointingly low. Nearly 4,000 tests were
not even scored because students had answered too few questions. The political
and fiscal implications were hard to miss in a city whose budget is in
the hands of a Republican-controlled Congress. "The average D.C. student
can't read this letter!" declared a missive to the House of Representatives
in large bold type from Rep. Charles Taylor, the North Carolina Republican
who chairs the House Appropriations subcommittee on the District. The test
results, he wrote, proved that the "D.C. public school system is failing
virtually every young life it is responsible for."
Many parents were equally shocked by the results. But some felt
that the school system effectively had herded their children into the middle
of I-395 -- subjecting them without proper preparation to a test that they
could not possibly pass.
"We had to make a decision about what was in the best interests
of children," says Patricia A. Anderson, the District's interim director
of educational accountability, looking back on what happened. A veteran
of two decades of working on D.C. tests, she says she understands the bruised
feelings of parents but believes the District had no choice. "Maybe it
wasn't the most comfortable position to take, but we felt that in order
to get our kids where they needed to be we had to shock the public and
parents into understanding their kids are not getting the skills that they
are going to need and this is what we have to do."
City officials informed principals that 50 percent of their evaluation
would depend upon their school's performance in the next round of Stanford
9 testing. Parents were told to become more concerned and involved in their
children's education.
Having gotten everyone's attention, school officials set out to
prepare teachers and students for the next round of tests. The mobilization
took months to organize. By last winter, many schools were deeply into
crash programs of preparation. At Langdon Elementary in Northeast Washington,
teachers spent 2 1/2 hours daily drilling students for multiple-choice tests,
part of a "We Push" campaign. At Wilson Senior High School, students spent
two hours a week practicing multiple-choice math and reading questions.
While Charlette Hedgman and Darius were hard at work on his reading
and math skills at home, principal Romaine Thomas was lighting a flame
under the faculty and students at Ketcham. Thomas held after-school training
sessions for teachers and placed posters on the hallway walls setting "A
Performance Standard for Reading." Each student was assigned to read 25
books or the equivalent every school year, with at least five different
authors and three different literary styles. A series of test-taking tips
were also posted: Don't spend too much time on any one question, pay attention
to directions, eliminate those answers you know are incorrect before guessing,
mark items to return to if time permits. Ketcham also adopted the Drop
Everything and Read program. Each child was required to have a 90-minute
reading session at school every day.
Many students in the District were dismayed by the new emphasis
on testing. On a chilly February morning this past winter about 100 students
at Cardozo High School's Transportation and Technology Academy gathered
in the bright yellow basement cafeteria for a special assembly. After the
speeches and awards were handed out, academy coordinator Shirley C. McCall
broached the delicate subject of the tests.
"I want to explain to you how important the Stanford 9 is," she
said. Ninth, 10th and 11th graders would be taking the test in April. "The
Stanford Achievement Test will be a barometer," she said. "It will mean
whether or not you will be able to get to the next grade level. Please
take the test seriously. In the near future, students who do not pass the
Stanford 9 test will not be able to graduate with a diploma."
There was an unhappy buzz among the students, who were some of
the most motivated and hardest working in the school. A teacher, Emma Stephens,
asked for questions, unleashing a torrent of resentment. "How come the
seniors are not penalized on the Stanford 9 test?" asked Shawnice Palmer,
winner of an award for best grade point average in the junior class.
Candie Parrish, another high-achieving junior, answered in a stage
whisper: "They're more stupid than we are."
Stephens begged for calm. "We have a lot of tests. It is not designed
to penalize you. Let me say this to you, Shawnice. We are talking about
standards across the country, and the Stanford 9 reflects the standards
we are talking about."
The mobilization campaign had an effect. When students throughout
the system took the test again last April, the results seemed to show across-the-board
improvements. At every grade level fewer students scored "below basic"
in both math and reading. The biggest gains in reading came from the lowest-performing
students in second and fourth grades, whose scores jumped 11 percentage
points. In math, lowest-performing sixth- and eighth-graders improved by
12 points.
But a closer look revealed some odd discrepancies, the most glaring
of which concerned the number of students who took the test. Of the 104
elementary schools in the District, 28 reported a drop of at least 30 students
-- in most cases about 10 percent or more of the total -- being tested
in reading compared with the year before. District school officials say
they need to complete a review of scores from special-education and limited-English
students, but they think the number of 1997 test-takers was mistakenly
inflated. There have been unconfirmed reports in the past that some schools
have intentionally culled poor students from the flock of test-takers,
but there is no evidence that any of the schools in D.C. did anything unethical
to improve their scores. Still, such declines ring alarms among psychometricians.
Because the students added to or dropped from the testing pool tend to be
the weaker ones, increasing the number of test-takers tends to lower a
school's scores, while decreasing the number raises them.
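A toy calculation shows why: if the students who drop out of the testing
pool tend to be the lowest scorers, a school's "below basic" percentage can
fall sharply even though no individual student has learned anything new. The
figures in the Python sketch below are invented for illustration and are not
Thomson's, or any school's, actual data.

    def share_below_basic(scores, cut=30):
        # Percent of test-takers falling below an illustrative "basic" cut score.
        return 100 * sum(s < cut for s in scores) / len(scores)

    # Invented scores for 20 students; the weakest five are then left untested.
    scores = [12, 15, 18, 22, 25, 28, 31, 33, 35, 38,
              40, 42, 44, 47, 50, 52, 55, 58, 61, 65]
    print(share_below_basic(scores))        # all 20 tested: 30.0 percent below basic
    print(share_below_basic(scores[5:]))    # weakest 5 excluded: about 6.7 percent

In this invented example the "improvement" comes entirely from who was
tested, which is exactly the pattern the shrinking test-taker counts raise
questions about.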
Of the 28 schools that reported a decline in the number of reading
test-takers, 19 showed an improvement in scores. One of the most impressive
improvements occurred at Thomson Elementary in Northwest Washington. Thomson
scored highest of the District's 41 lowest-income elementary schools. Only
12.5 percent of its students scored "below basic" in reading, whereas the
four low-income schools with the worst results had from 40 to 50 percent
of their students scoring "below basic."
But Thomson also had one of the sharpest declines in test-takers.
The number of students in grades one through six changed very little, from
300 in 1997 to 291 in 1998. But the number listed as taking the Stanford
9 reading test dropped from 238 last year to 144 this year, a decline of
nearly 40 percent. At the same time, the percentage of students scoring
"below basic" dropped from 18 percent to 12 percent, a 33 percent improvement.
The percentage of students scoring at the proficient level in reading increased
17 percent and the percentage scoring at the advanced level increased more
than 140 percent.
Thomson principal Robert Bracy III, a well-regarded educator with
16 years at the school, says the improvement was the result of a hard-working,
committed staff and an emphasis on phonics in a school where the majority
of children are from immigrant Hispanic and Asian families. He says he
does not know why the number of students taking the test has dropped, but
he thought there might have been an increase in the number of children
designated as non-English-speaking and exempted from the test.
Figures supplied by Sheryl Hamilton, project coordinator for data
evaluation in the D.C. schools office of bilingual education, indicate
the opposite is true. Last year, she says, her office told the school that
105 of its students should not take the test. This year that number dropped
to 77. Patricia Anderson says many Thomson students who took the test in
1997 should not have been counted because they had limited English or other
disabilities. The District's test experts took this into account, she says,
and concluded that Thomson's improvement was real and not a statistical
fluke.
The decline in test-takers at other schools remains a mystery,
but teachers and administrators do not seem interested in delving into
it too deeply. That attitude reflects a general sense of resignation regarding
the tests. Parental concerns and political demands force them to release
scores each year. Superintendent Ackerman's staff not only did that, but
it provided enough detail to raise doubts about the validity of the schools'
average scores. Ackerman says she plans to raise the issue of testing all
children with the principals, but many administrators appear too busy to
worry much about it.
In the same fashion, principals in the District show little enthusiasm
for Ackerman's announcement that scores will count as 50 percent of their
evaluations. But they recognize that is the way educators are assessed
these days. Sheila Ford, principal at Mann Elementary for the last nine
years, was in Memphis recently, where, she was told, test scores count
for 60 percent of principal evaluations. "It is a backlash," she says.
"People have not been accountable. Their feet have not been held to the
fire. There have been some drastic reactions to that failure."
Michael Feuer, a D.C. parent who also heads the testing and assessment
board at the National Academy of Sciences, echoes the feelings of many
parents and teachers about the Stanford 9. "One needs to apply more than
a single test score to decisions such as grade retention or firing a principal,"
he says. "I'm not saying the test has no role to play, but you need to
be cautious." Ackerman says she plans to have a new test tailored to the
D.C. curriculum -- including a writing assessment -- in place for grades
three, five, eight and 10 by the 1999-2000 school year.
The point, says Sheila Ford, is to help children learn better
and more. If the scores can help, no matter how erratic or annoying or
misleading they can be, they will be used.
Charlette Hedgman received the spring Stanford 9 results for Darius
in May. The two sheets of paper looked as dense and chilly as the fall
report had. But many of the black bars and check marks had shifted in a
promising direction.
She sat on the couch and studied them. Last fall Darius had scored
"below average" on 21 indicators. This time all but six of those check
marks had moved to the average column. In the fall he had been "below basic"
on all six main subtests. This time he moved up to "basic" in the three
reading categories, although he was still "below basic" in mathematics.
The nightly reading hour had proved its worth.
"You did much better," she told Darius. "But we still have some
work to do." She told him he would have to go to summer school, as a new
school policy required.
Despite all of Hedgman's doubts about the test, it appeared to
have moved Darius along, just as its makers in San Antonio and the administrators
on North Capitol Street had hoped.
Whatever its flaws, it had been a useful guide in motivating one
child to spend more time with his books. His mother and teacher had taken
the Stanford 9 and the fears and uncertainties it generated, and had used
them as best they could. There was no question Darius had learned a great
deal in second grade. They could see it, with or without the test.
Working in the carpeted, air-conditioned learning areas of Terrell
Elementary's summer school, Darius spent weeks doing math games and quizzes
and more reading. The well-funded special session, a crucial part of Ackerman's
new focus on achievement, included extra classroom aides. Darius had 14-year-old
Christopher Henry, a high school sophomore, to take him through his exercises.
Like Darius's mother, Christopher marveled at the boy's ability to multiply
but not add. They focused on addition.
The summer report card reached Charlette Hedgman at the end of
July. It said Darius had improved in all areas. He had escaped retention
and would be on his way to third grade.
Hedgman watched him dash gleefully away. She would get him to
sit down and read soon. She still didn't like the clumsy way the tests
were applied, but she had the rhythm now. The summer mathematics teacher
had put it exactly right at the end of the report card: "Don't work too
fast, but take your time, concentrate and apply yourself."
Jay Mathews covers public schools for The Post's Metro section.
Cutline: CHARLETTE HEDGMAN and son Darius, a student at D.C.'s
Ketcham Elementary School.
PATRICIA ANDERSON, a D.C. educator: The tests were "in the
best interests of children."