TAKE OUT YOUR NO. 2 PENCILS; STANDARDIZED TESTS LIKE THE STANFORD 9 ARE
THE RAGE IN PUBLIC EDUCATION. BUT DO THEY REALLY MEASURE ACADEMIC ACHIEVEMENT,
OR SIMPLY REFLECT COLLECTIVE ANXIETY ABOUT OUR SCHOOLS?
JAY MATHEWS
Sunday, November 8, 1998; Page W11
When the long envelope arrived from her son Darius's elementary school
last fall, Charlette Hedgman reacted slowly and carefully. She knew the
thickly numbered columns on the two enclosed sheets were bad news, but
she did not raise her voice with Darius or threaten punishment. That was
not the way she raised her son. Hiding her disappointment, she sat and
thought about what she wanted to say.
On the double-sided sheets, in beige-colored boxes with a light
blue border, were the results of the Stanford Achievement Test Series,
Ninth Edition, for Darius Q. Leggette, student No. 8276447, age 8 yrs.,
01 mos. The sweet-tempered boy with the incandescent grin was beginning
second grade at the Ketcham Elementary School, a ramshackle assemblage
of brick buildings at the foot of 15th Street SE in Anacostia. The report
listed 133 indicators, many as incomprehensible as a corporate tax return,
but Hedgman was a classroom aide at another District elementary school.
She knew what this meant.
In all six areas of performance -- total reading, vocabulary,
reading comprehension, total mathematics, problem solving and procedures
-- Darius had received the lowest rating: "below basic." The sheets of
paper told her this categorization "indicates little or no mastery of fundamental
knowledge and skills." Hedgman didn't know it at the time, but almost half
of Darius's classmates at Ketcham Elementary scored equally poorly on the
reading portion of the test.
Charlette Hedgman did not think Darius had too little intelligence.
She thought he had too much energy and impatience. He tried to do everything
hastily -- he was the kind of child who liked to stampede through the living
room and leap down the six steps in front of their brick town house. In
1996 he had dashed into the middle of Good Hope Road, two blocks from the
house, directly in front of a big burgundy sedan. He spent a month at Children's
Hospital. She never wanted to go through that again.
She adjusted the papers and called Darius over to sit with her
on the Queen Anne couch with the flowered upholstery. The muscles of his
small face tightened. The couch meant serious business. She showed him
the report.
"This is how you did," she said. She pulled him close. "You could
do better, but I'm not mad. I think we're just going to have to work a
little harder."
Darius peered at the two sheets of paper. There were 27 separate
content categories. He was below average in 21. Hedgman had to squint to
read the fine print on the back. It said that RS/NP/NA meant Raw Score/Number
Possible/Number Attempted. Darius had worked hard at the test, answering
nearly every question, even if many of his answers were wrong. His energy
and time had finally run out on the "Word Reading" section, where he had
answered only 11 of the 30 questions.
The report gave no encouragement, not even the usual bureaucratic
concession that one test could not define a child. Seven thick bars, crawling
like black worms across the page, said Darius was no higher than the 24th
percentile on any indicator. At least 75 percent of the national sample
of second graders were ahead of him. On Word Reading he was at the absolute
bottom, the 1st percentile, all the rest of the world looking down on him.
Hedgman could see her son was shaken. She tried to be positive.
She told him he could do better on the test; it would just take some time
and effort, and a few new rules. She began to formulate a plan.
All over the District -- and, indeed, the nation -- households were
receiving similar packets. In the last two decades standardized tests have
become one of the most powerful forces in American education. Psychometrics,
the science of measuring the mind, pervades the academic world. Few districts
dare to conclude a school year without requiring students to spend several
hours filling in circles and rectangles on computerized scoring sheets
with No. 2 pencils. The companies that market such tests are approaching
$200 million in annual sales, evidence of a boom that shows no sign of abating.
The tests have become a universal measure of success in the world
of public education. Principals and teachers are given bonuses or fired.
Students are promoted or forced to repeat grades, placed in programs for
the gifted or dispatched to special ed, and some are denied graduation.
All occur because of what the indicators and the bar graphs and the pie
charts reveal.
In this results-oriented, parent-sensitive environment, no experiment
in teaching children can thrive without a test that says it is working.
There does not have to be any proven connection between a rise in test
scores and the program in question. But the advocates of the new idea have
to be able to point to some upward trend in numbers, or they cannot get
very far.
Most states have at least one testing program in place, and many
have several. School districts in South Carolina can be declared educationally
bankrupt and subject to state takeover if their test scores fall below
a certain level. Districts in Michigan can lose their accreditation for
the same reason.
California administered the largest standardized testing program
in the country last spring to more than 4 million students at a cost of
$35 million and posted school-by-school results on the World Wide Web.
Several districts sued unsuccessfully to prevent the release of scores
for limited-English students. The state Department of Education has proposed
paying cash awards of 5 to 10 percent of teachers' annual salaries to schools
with high or much improved scores. Schools that fail to do well could have
their entire staffs transferred or face closure.
Standardized tests have long been a fact of life in public education.
For half a century, according to educational historian Sherman Dorn at
the University of South Florida, achievement tests were administered and
scored, their results confined largely to the internal consumption of teachers,
parents and children. What is different now is that the tests have become
a public measuring and punishing stick. Parents and voters want the results
disclosed and are quick to seek comparisons between various schools and
districts. The media have jumped in, treating test results as a major news
story.
"In less than 25 years, statistical accountability has become
so ubiquitous that it appears inevitable," says Dorn, who calls the change
"both breathtaking and alarming."
What bothers Dorn and other critics is that tests have gone from
being one possible measure of academic achievement to being the only measure
that really counts. The pressures on school officials, teachers and students
to do well on the test have increased dramatically. This has led some schools
to emphasize practicing for the tests over real learning, critics argue.
Some school officials seem to have gone even further, searching for ways
to beat the system by selectively choosing who takes the test and who doesn't,
in order to raise their school's average scores.
In D.C., former school chief executive Julius W. Becton Jr. and
his successor, superintendent Arlene Ackerman, have decreed that standardized
tests will be the ultimate criteria for Darius Leggette and 73,000 other
public schoolchildren. Principals who do not raise their schools' scores
significantly have been told they may lose their jobs. Students who do
not improve have been required to attend summer school. Children with scores
as low as Darius's face the prospect of having to repeat the same grade.
These officials are responding to what they see as a crisis of
confidence in the District's education system. Overcrowded classrooms,
poorly trained teachers, decaying buildings and unprepared and uninspired
students all have contributed to the city's educational meltdown. Its standardized
test scores are among the worst in the country, its dropout rate among
the highest. On the average, every year that a student spends in the system,
his or her scores go down. About 70 percent of high school juniors test
below grade level.
To measure how their students were really doing, District officials
decided they had neither the time nor money to develop a sophisticated,
performance- and curriculum-based test like those that have come into use
in Maryland and Virginia. Instead, they turned to one of the most popular,
widely used and economical standardized tests in the nation.
The Stanford Achievement Test Series, Ninth Edition -- known far and
wide as the Stanford 9 or SAT9 -- was designed 1,600 miles from the District
in a sprawling office complex in the San Antonio residential neighborhood
of Collins Gardens, using many buildings that had once been Levi Strauss
warehouses. This is the headquarters of Harcourt Brace Educational Measurement
Inc., the company that writes and markets the Stanford 9. HBEM's test --
considered the first standardized achievement test in America -- dates
back three-quarters of a century. The SAT9 -- which has no affiliation
with the Educational Testing Service's SAT given to college-bound high
school students -- has the respect of many educators, the backing of a
major textbook publisher, and a marketing philosophy that seeks to capitalize
on its long history.
Some of that history is less than distinguished. The co-founder
of the SAT9 series, Stanford University psychologist Lewis Terman, began
his celebrated half-century of work on the mysteries of intelligence with
some of the most racist and elitist speculations ever found in 20th-century
academia. Harvard University science historian and paleontologist Stephen
Jay Gould, in his 1981 book The Mismeasure of Man, unearthed malodorous
chunks of early Terman, such as his admiration for the intellectual superiority
of Anglo-Saxons.
In 1916 in his seminal book, The Measurement of Intelligence,
Terman bemoaned the ignorance of a conscientious mother of a child who
had just scored 75 on the professor's rudimentary IQ test. "Strange to
say, the mother is encouraged and hopeful because she sees that her boy
is learning to read," Terman wrote. His own conclusion: The boy "is feeble-minded;
he will never complete the grammar school; he will never be an efficient
worker or a responsible citizen." Similar tests assigned most immigrant
Jews of that era to the feeble-minded category.
Terman later disowned his early fascination with race-based intelligence,
and by 1937 had acknowledged that how a child was raised had a significant
impact on how well he or she did on the tests. Terman did not quite know
what it was he was measuring, but he continued to refine the yardstick.
The Stanford-Binet intelligence test that he co-authored has lived on,
a controversial device still used in assigning children to gifted classes,
even though critics contend it is culturally biased and underestimates
the intelligence of ethnic minorities.
The achievement tests developed by Terman and others at Stanford
have a somewhat different goal -- to measure how much children absorb of
what school teaches them, and to help educators see how each child measures
up against the American average.
Constructing tests such as the Stanford 9 is like erecting a high
wire in such a way that when children try to walk across it, half fall
on one side and half on the other. The test-makers cull textbooks and nationally
circulated lesson plans for questions that reflect what is most widely
taught in American schools, giving their product what specialists call
"content validity."
For the test to give parents and teachers an idea of where each
student fits on the national spectrum, the test-makers have to choose questions
that, in the jargon of psychometrics, "behave properly." The best questions
are those that half of the students miss. Some questions can be easier
and some harder, but psychometricians try to avoid those that 90 percent
of the students get wrong or right.
If a disproportionate number of people with low total scores get
a given question right, or a disproportionate number with high scores get
it wrong, that also is a problem. It might mean that one of the "distracters,"
the psychometric name for the wrong answers to a multiple-choice question,
is causing too many high scorers to make a mistake or giving too many low
scorers too easy a route to the correct answer. The distracters are there
to trick the uncertain student into getting the question wrong; otherwise
the scores will not break down into the bell-shaped curve that reflects
most students scoring in the middle, with a few at the high and low ends.
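What "behaving properly" means in practice can be sketched with two simple
statistics: how many students answer an item correctly, and whether high
scorers do better on it than low scorers. The short Python sketch below is
only an illustration of that screening idea, with invented thresholds,
function names and student data; it is not HBEM's actual procedure.

    from statistics import mean

    def item_difficulty(responses):
        # Share of students answering the item correctly (1 = right, 0 = wrong).
        return mean(responses)

    def item_discrimination(responses, total_scores):
        # Difference in pass rates between the top and bottom quarter of scorers.
        # A near-zero or negative value flags a "misbehaving" item -- for example,
        # a distracter that trips up too many strong students.
        ranked = sorted(zip(total_scores, responses))
        k = max(1, len(ranked) // 4)
        bottom = [answer for _, answer in ranked[:k]]
        top = [answer for _, answer in ranked[-k:]]
        return mean(top) - mean(bottom)

    def keep_item(responses, total_scores):
        # Keep items that roughly half the students miss and that still
        # separate high scorers from low scorers (thresholds are invented).
        p = item_difficulty(responses)
        d = item_discrimination(responses, total_scores)
        return 0.10 < p < 0.90 and d > 0.2

    # Ten hypothetical students' answers to one item (1 = correct) and their totals.
    answers = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    totals = [55, 20, 48, 60, 25, 18, 52, 30, 45, 22]
    print(item_difficulty(answers), item_discrimination(answers, totals),
          keep_item(answers, totals))

Actual item analysis is far more elaborate than this, but the screening logic
is the same: questions that nearly everyone gets right or wrong, or that
strong students stumble over, get thrown out.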
The troubling aspects of this practice hit Fairfax-based educational
consultant Gerald W. Bracey when he was helping teach a freshman psychology
course at Stanford. Bracey, who went on to become chief assessment officer
for Virginia's public schools and is author of Put to the Test, a consumer's
guide to testing, was ordered to grade on the curve at Stanford so that
15 percent received A's and 15 percent D's or F's. His students were among
the brightest in the country. Many of them had never seen anything less
than an A on their high school report cards. Instead of testing them on
the course's main points, Bracey had to trick them into committing errors
by formulating improbably subtle questions or making references to obscure
footnotes.
To achieve their own bell curve, the makers of the Stanford 9
have to build in the same kind of tricks. They justify these devices as
part of the educational process. A student who truly knows the subject
matter, they argue, will step lightly around the traps.
Bracey contends that the search for questions that address only
lessons taught in the majority of school districts has an important drawback:
It has inhibited educators from trying innovations that don't show an immediate
payoff in higher test scores or that might put their young test-takers
at a disadvantage. The Coalition of Essential Schools, an effort to revive
secondary schools through deeper courses and less test-conscious assessments,
has won enthusiastic reviews from many parents, students and educators.
But some districts have lost interest in the coalition's program, in part
because it has failed to improve test scores as quickly as they had hoped.
"It would be very hard to overestimate the importance of tests
in the United States today," says Bracey. "And yet they often inhibit educational
innovations that try to widen what children learn, because they measure
a very small range of skills, usually verbal and math, and there are many
other things in the world."
Despite the flaws and the critics, the tests and their results have
taken firm hold in the public consciousness. Newspapers began headlining
lowered standardized test scores in the mid-1970s, quoting experts and
politicians who claimed that the results confirmed an alarming erosion
in the nation's schools. A series of national reports in the early 1980s
demanded higher standards in public schools. The standardized test industry
began its two-decade boom.
The big three companies, each part of a major publishing firm,
are CTB/McGraw-Hill, with $95 million in annual sales, and the two runners-up,
HBEM and Riverside Publishing, owned by Houghton Mifflin & Co. Although
No. 2 in sales, HBEM is pushing its leading rival hard. The Stanford 9
replaced a CTB/McGraw-Hill product in the District and a Riverside product
in Virginia last year after HBEM executives convinced school officials
their test had a longer track record and was more in tune with the curriculums.
In California, HBEM won the largest testing contract in the state's history
despite a recommendation from the superintendent of public instruction
in favor of CTB/McGraw-Hill's Terra Nova series. Economics is also a factor.
HBEM charges the District just $12 per test for the Stanford 9, a total
of $636,000 for the 53,000 students who were tested last spring.
Today the Stanford 9 is used in at least 17 states and the District
of Columbia, although D.C. uses it in more grades and more frequently than
many other school districts. In the District, the full version of the test
is used, allowing teachers to identify particular strengths and weaknesses
in each child. Virginia is using an abbreviated version, which allows parents
to see how their children compare generally to a national norm. Maryland
does not use the Stanford 9. It periodically gives the Comprehensive Test
of Basic Skills, a CTB/McGraw-Hill exam, to a few students in each district
to check them against a national sample.
The Stanford 9 is a norm-referenced test, designed to show how
a student compares to a national sample of students. But HBEM is also producing
criterion-referenced tests. The latter are designed to demonstrate how
much a student has learned of a specific course curriculum, such as Virginia's
new, fact-rich Standards of Learning program. District officials are using
a version of the Stanford 9 that also serves as a criterion-referenced
test, with a panel of experts deciding how low a score must be to indicate
that a student like Darius has not reached a basic level of learning.
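The difference between the two kinds of scores can be sketched in a few
lines: a norm-referenced result places a raw score within a national sample,
while a criterion-referenced result compares the same raw score against fixed
cut points set by a panel. The Python sketch below uses an invented norm
sample and invented cut scores purely for illustration; it does not show the
Stanford 9's actual scales or the District's actual cut points.

    def percentile_rank(score, norm_sample):
        # Norm-referenced view: share of the national sample scoring below this student.
        below = sum(1 for s in norm_sample if s < score)
        return 100 * below / len(norm_sample)

    def performance_level(score, cuts=(20, 35, 50)):
        # Criterion-referenced view: fixed cut scores (set arbitrarily here, not
        # by an expert panel) dividing the four reported levels.
        labels = ["below basic", "basic", "proficient", "advanced"]
        return labels[sum(score >= c for c in cuts)]

    norm_sample = [12, 18, 22, 27, 31, 36, 40, 44, 49, 55]  # invented sample
    raw = 27
    print(f"{percentile_rank(raw, norm_sample):.0f}th percentile, {performance_level(raw)}")

In the District's case the same Stanford 9 answer sheet feeds both kinds of
reports; only the yardstick changes.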
HBEM has the contract to write and score the new Virginia tests
and is working on similar projects in other states. But despite its growing
success and influence, the company remains so unaccustomed to public notice
that it declined a Washington Post request to visit its San Antonio headquarters,
saying it had never before been asked to let a reporter inside. Instead,
company officials consented to telephone interviews for this story.
Not only have standardized tests become a required part of nearly
every school district's spring calendar, but the publishing houses that
market them are also developing and selling guides and textbooks designed
to help students pass. Some have even started to dabble in tutoring services
to assure worried parents that they are doing everything they can to make
sure their children succeed. It's a huge potential market; in effect the
companies are addressing the parental anxieties that they themselves have
helped create.
Within days of receiving Darius's test results, Charlette Hedgman had
formulated her plan. Henceforth, she decreed, there would be no TV on weekdays.
The hour from 4 to 5 p.m. would be study time. Nothing would happen in
that hour unrelated to homework. She and Darius would read together every
evening, on the same couch, slowing down the process of absorption so that
the words would remain in his brain overnight.
She could tell he viewed this as one long prison sentence. But
what was she to do? She understood the power of tests, just as she understood
how difficult they could be for those who were unprepared. When she was
Darius's age, she had also attended Ketcham Elementary. She could remember
staring at a standardized test and feeling her mind freeze shut. In class
she usually knew the answers, at least for that day's lesson. But the words
and concepts slipped away. There was little review. A test that demanded
she recall a year's worth of work brought terror.
She received a guide for parents distributed by the District,
but found it was sometimes contradictory. Page 4 of "Stanford 9: How Parents
Can Help Their Children," for instance, said, "Students who score below
a certain level will be held back and required to attend six weeks of summer
school." But page 6 said the scores would only be "one of the indicators
used" and "teachers and principals will have the final say on summer school
and promotion."
Still, many of the recommendations in the guide made sense to
Hedgman: Answer all of your child's questions, encourage your child to
read at least two books a month, and turn off the TV. Hedgman followed
those guidelines and felt Darius was making progress. She had pushed him
to read more books than that in the past, but she realized that too often
he had whipped through them -- loving the speed but failing to comprehend
what he was reading. She found he learned more if they went more slowly.
Darius spent the winter reading. A tutor, someone his mother knew
from school, worked with him occasionally. At home, Hedgman made certain
he read to her at least an hour each evening, even when she could barely
keep her eyes open after a day of rustling kindergartners. He seemed to
be getting it. He was growing accustomed to the routine of TV-less weekdays.
His vocabulary was increasing. His reading comprehension improved.
The work that Darius was doing at home reinforced the efforts
that were taking place in the classroom. In Room 212 at Ketcham Elementary,
every hour of class included an exercise devoted to reading and writing.
Darius's second-grade teacher, Florine English, posted new terms on the
Word Wall, and displayed illustrated student stories in the hallway just
outside her glass-windowed door. Phonics was pushed hard. The school's
principal, Romaine Thomas, had been at Ketcham for 25 years. She believed that
decoding the sounds of each word helped children absorb its meaning and
its use.
Some of Darius's work went up on the light-green cinder-block
walls of the classroom, where he would stand and admire his progress. Often
at lunch time English would give him an extra 15 minutes, taking him through
a recent lesson to make sure he understood.
English says she believes all her books and charts and teaching
devices were of help to Darius. But when asked about the child, the first
thing she says is: "He has a very supportive mother."
Nonetheless, Charlette Hedgman felt a mounting sense of frustration
as she investigated the District's use of the Stanford 9. The test-makers
refused to let the children and teachers see the graded tests -- that would
require them to create an entire set of new questions every year, they
argued, and make it impossible to match D.C. scores with the Stanford 9's
national sample. Instead, Hedgman had to settle for a brochure that outlined
the learning goals for each test. She saw immediately that these did not
always match what was being taught in the schools. For example, the test
demanded a form of multiplication that was not taught until the following
grade. The fall test on which Darius had done so poorly had been given
less than a month after school started. That was not long enough, Hedgman
felt, to build him back up after a summer of games and television. How
could they say all these numbers represented her son's level of achievement?
He was already multiplying in his head, by a process that made sense to
him. Yet the test said he could not adequately add or subtract. How could
that be?
Hedgman's doubts were new to her but not to the debate over standardized
testing. Critics around the country have been raising them for years.
John Katzman graduated from Princeton University in 1981. After six
weeks on Wall Street, he was bored and disheartened, eager to set his own
standards instead of the Dow's. He had worked with a small admissions-test-coaching
school in college. His parents lent him $3,000 to set up his own operation
in their New York apartment.
Today he is the president of the Princeton Review, a $50 million-a-year
test-preparation company with 500 locations. His success stems from his
knack for unlocking the secrets of standardized tests, particularly the
college applicant's worst nightmare, the Scholastic Assessment Test (SAT).
So presumably he should be fond of the high-stakes exams and national obsession
with testing that have made him wealthy.
In fact, Katzman has an almost thermonuclear distaste for standardized
testing of any kind. On the telephone from his office in Manhattan, Katzman
slides into his anti-test stump speech, an acidic riff on overanxious parents
and cowardly bureaucrats. These are people, he says, who can no longer
recall the time when they were in school and their parents judged the value
of their educations by little more than how much homework they were assigned
each night.
"The Stanford 9 conceptually is no better and no worse than any
other nationally standardized test," Katzman says. "What is problematic
is the value assigned to the test."
The 175 questions on Darius Leggette's three-hour reading and
arithmetic test can only dimly reflect what he learned in first grade,
Katzman argues, but that is all that is necessary to consign him to the
bottom of the scholastic heap. And if he manages to crawl his way up to
the basic level, where he will be allowed to move to the next grade, it
will only mean he will have chosen different answers to a handful of questions,
hardly deepening the understanding that his education is supposed to be
about.
With the Stanford 9 and other standardized tests used in elementary
schools, says Katzman, "the test-writers don't know what the kid is learning,
the students, parents and teachers don't know the process or have any access
to old tests, and the general public makes important decisions about policies
and people on the basis of deeply flawed numbers."
Just as failure on a standardized test may not be as significant
as it seems, success may also mean less. Fairfax consultant Gerald Bracey
notes that in the early 1980s Prince George's County schools received much
praise for their annual improvement in test scores. Critics raised questions
about classes that appeared to focus on little but strategies for passing
the test. But their views were overlooked until a few years later, when
the state of Maryland adopted a new test. Scores dropped statewide, as
often happens with a new test, but in Prince George's they plummeted.
The curious fact is that when it comes to the question of how
test results are used, the people who design and market the Stanford 9
and its sister exams do not necessarily disagree with their critics.
Joanne Lenke is president of HBEM and chief defender of the Stanford
9. She is a prominent psychometrician, part of a direct intellectual lineage
back to Terman. Her professor at Syracuse University, Eric Gardner, was
a student of T.L. Kelly, one of the other co-designers of the first Stanford
Achievement Test. Lenke is also a former junior high school math teacher
who says she understands exactly how much her carefully calibrated, lovingly
drafted tests can be stretched out of shape by school board members eager
for a cheap, quick fix.
"The original reason that norm-referenced tests were developed
was to identify relative strengths and weaknesses of students in the classroom,"
Lenke says. "They were never intended to make personnel decisions . . .
I know it is done, and certainly test scores could be one indicator of
performance, but I would argue that visiting the classroom and observing
the teacher, looking at the day-to-day instructional activities, would
be far more useful."
Rating schools or holding back children because of poor Stanford
9 scores does not please her. The test is there to tell a parent and a
teacher how each child compares to a representative sample of American
children, not whether the principal should be rehired or the child forced
to repeat a grade. Test-makers, she says, have no more power to prevent
customers from abusing the test than the automobile industry is able to
ban lovemaking in back seats.
Still, Lenke defends the concept of accountability that lies behind
standardized tests. Parents and teachers cannot serve a student well unless
they know what the child understands and what he doesn't. A child will
nod and say he knows a certain word because he wants to get outside to
play on the jungle gym. Only by some objective test, such as putting the
word in front of him and requiring him to choose from four alternative
meanings, can the real extent of his knowledge be judged. "Testing has
become very important to our society," Lenke says, "because it is a way
we can together gather information to make important educational decisions."
D.C. school officials were well aware of the arguments for and
against standardized testing, and many of them had little enthusiasm for
the Stanford 9. But faced with an educational crisis of massive proportion,
they felt they had little choice but to turn to HBEM's test. It became
for them a form of shock therapy designed to awaken principals, teachers,
parents and students to a dreary reality no one seemed prepared to face.
The first round of Stanford 9 tests in the District was held in May
1997, and the results were predictably catastrophic. Overall scores in
both reading and math were disappointingly low. Nearly 4,000 tests were
not even scored because students had answered too few questions. The political
and fiscal implications were hard to miss in a city whose budget is in
the hands of a Republican-controlled Congress. "The average D.C. student
can't read this letter!" declared a missive to the House of Representatives
in large bold type from Rep. Charles Taylor, the North Carolina Republican
who chairs the House Appropriations subcommittee on the District. The test
results, he wrote, proved that the "D.C. public school system is failing
virtually every young life it is responsible for."
Many parents were equally shocked by the results. But some felt
that the school system effectively had herded their children into the middle
of I-395 -- subjecting them without proper preparation to a test that they
could not possibly pass.
"We had to make a decision about what was in the best interests
of children," says Patricia A. Anderson, the District's interim director
of educational accountability, looking back on what happened. A veteran
of two decades of working on D.C. tests, she says she understands the bruised
feelings of parents but believes the District had no choice. "Maybe it
wasn't the most comfortable position to take, but we felt that in order
to get our kids where they needed to be we had to shock the public and
parents into understanding their kids are not getting the skills that they
are going to need and this is what we have to do."
City officials informed principals that 50 percent of their evaluation
would depend upon their school's performance in the next round of Stanford
9 testing. Parents were told to become more concerned and involved in their
children's education.
Having gotten everyone's attention, school officials set out to
prepare teachers and students for the next round of tests. The mobilization
took months to organize. By last winter, many schools were deeply into
crash programs of preparation. At Langdon Elementary in Northeast Washington,
teachers spent 2 1/2 hours daily drilling students for multiple-choice tests,
part of a "We Push" campaign. At Wilson Senior High School, students spent
two hours a week practicing multiple-choice math and reading questions.
While Charlette Hedgman and Darius were hard at work on his reading
and math skills at home, principal Romaine Thomas was lighting a flame
under the faculty and students at Ketcham. Thomas held after-school training
sessions for teachers and placed posters on the hallway walls setting "A
Performance Standard for Reading." Each student was assigned to read 25
books or the equivalent every school year, with at least five different
authors and three different literary styles. A series of test-taking tips
were also posted: Don't spend too much time on any one question, pay attention
to directions, eliminate those answers you know are incorrect before guessing,
mark items to return to if time permits. Ketcham also adopted the Drop
Everything and Read program. Each child was required to have a 90-minute
reading session at school every day.
Many students in the District were dismayed by the new emphasis
on testing. On a chilly February morning this past winter about 100 students
at Cardozo High School's Transportation and Technology Academy gathered
in the bright yellow basement cafeteria for a special assembly. After the
speeches and awards were handed out, academy coordinator Shirley C. McCall
broached the delicate subject of the tests.
"I want to explain to you how important the Stanford 9 is," she
said. Ninth, 10th and 11th graders would be taking the test in April. "The
Stanford Achievement Test will be a barometer," she said. "It will mean
whether or not you will be able to get to the next grade level. Please
take the test seriously. In the near future, students who do not pass the
Stanford 9 test will not be able to graduate with a diploma."
There was an unhappy buzz among the students, who were some of
the most motivated and hardest working in the school. A teacher, Emma Stephens,
asked for questions, unleashing a torrent of resentment. "How come the
seniors are not penalized on the Stanford 9 test?" asked Shawnice Palmer,
winner of an award for best grade point average in the junior class.
Candie Parrish, another high-achieving junior, answered in a stage
whisper: "They're more stupid than we are."
Stephens begged for calm. "We have a lot of tests. It is not designed
to penalize you. Let me say this to you, Shawnice. We are talking about
standards across the country, and the Stanford 9 reflects the standards
we are talking about."
The mobilization campaign had an effect. When students throughout
the system took the test again last April, the results seemed to show across-the-board
improvements. At every grade level fewer students scored "below basic"
in both math and reading. The biggest gains in reading came from the lowest-performing
students in second and fourth grades, whose scores jumped 11 percentage
points. In math, lowest-performing sixth- and eighth-graders improved by
12 points.
But a closer look revealed some odd discrepancies, the most glaring
of which concerned the number of students who took the test. Of the 104
elementary schools in the District, 28 reported a drop of at least 30 students
-- in most cases about 10 percent or more of the total -- being tested
in reading compared with the year before. District school officials say
they need to complete a review of scores from special-education and limited-English
students, but they think the number of 1997 test-takers was mistakenly
inflated. There have been unconfirmed reports in the past that some schools
have intentionally culled poor students from the flock of test-takers,
but there is no evidence that any of the schools in D.C. did anything unethical
to improve their scores. Still, such declines ring alarms among psychometricians.
Because the students added to or dropped from the testing pool tend to be
the weaker ones, increasing the number of test-takers tends to lower a
school's scores, while decreasing the number raises them.
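A toy calculation shows why: if the students who drop out of the testing
pool tend to be the lowest scorers, a school's "below basic" percentage can
fall sharply even though no individual student has learned anything new. The
figures in the Python sketch below are invented for illustration and are not
Thomson's, or any school's, actual data.

    def share_below_basic(scores, cut=30):
        # Percent of test-takers falling below an illustrative "basic" cut score.
        return 100 * sum(s < cut for s in scores) / len(scores)

    # Invented scores for 20 students; the weakest five are then left untested.
    scores = [12, 15, 18, 22, 25, 28, 31, 33, 35, 38,
              40, 42, 44, 47, 50, 52, 55, 58, 61, 65]
    print(share_below_basic(scores))        # all 20 tested: 30.0 percent below basic
    print(share_below_basic(scores[5:]))    # weakest 5 excluded: about 6.7 percent

In this invented example the "improvement" comes entirely from who was
tested, which is exactly the pattern the shrinking test-taker counts raise
questions about.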
Of the 28 schools that reported a decline in the number of reading
test-takers, 19 showed an improvement in scores. One of the most impressive
improvements occurred at Thomson Elementary in Northwest Washington. Thomson
scored highest of the District's 41 lowest-income elementary schools. Only
12.5 percent of its students scored "below basic" in reading, whereas the
four low-income schools with the worst results had from 40 to 50 percent
of their students scoring "below basic."
But Thomson also had one of the sharpest declines in test-takers.
The number of students in grades one through six changed very little, from
300 in 1997 to 291 in 1998. But the number listed as taking the Stanford
9 reading test dropped from 238 last year to 144 this year, a decline of
nearly 40 percent. At the same time, the percentage of students scoring
"below basic" dropped from 18 percent to 12 percent, a 33 percent improvement.
The percentage of students scoring at the proficient level in reading increased
17 percent and the percentage scoring at the advanced level increased more
than 140 percent.
Thomson principal Robert Bracy III, a well-regarded educator with
16 years at the school, says the improvement was the result of a hard-working,
committed staff and an emphasis on phonics in a school where the majority
of children are from immigrant Hispanic and Asian families. He says he
does not know why the number of students taking the test has dropped, but
he thought there might have been an increase in the number of children
designated as non-English-speaking and exempted from the test.
Figures supplied by Sheryl Hamilton, project coordinator for data
evaluation in the D.C. schools office of bilingual education, indicate
the opposite is true. Last year, she says, her office told the school that
105 of its students should not take the test. This year that number dropped
to 77. Patricia Anderson says many Thomson students who took the test in
1997 should not have been counted because they had limited English or other
disabilities. The District's test experts took this into account, she says,
and concluded that Thomson's improvement was real and not a statistical
fluke.
The decline in test-takers at other schools remains a mystery,
but teachers and administrators do not seem interested in delving into
it too deeply. That attitude reflects a general sense of resignation regarding
the tests. Parental concerns and political demands force them to release
scores each year. Superintendent Ackerman's staff not only did that, but
it provided enough detail to raise doubts about the validity of the schools'
average scores. Ackerman says she plans to raise the issue of testing all
children with the principals, but many administrators appear too busy to
worry much about it.
In the same fashion, principals in the District show little enthusiasm
for Ackerman's announcement that scores will count as 50 percent of their
evaluations. But they recognize that is the way educators are assessed
these days. Sheila Ford, principal at Mann Elementary for the last nine
years, was in Memphis recently, where, she was told, test scores count
for 60 percent of principal evaluations. "It is a backlash," she says.
"People have not been accountable. Their feet have not been held to the
fire. There have been some drastic reactions to that failure."
Michael Feuer, a D.C. parent who also heads the testing and assessment
board at the National Academy of Sciences, echoes the feelings of many
parents and teachers about the Stanford 9. "One needs to apply more than
a single test score to decisions such as grade retention or firing a principal,"
he says. "I'm not saying the test has no role to play, but you need to
be cautious." Ackerman says she plans to have a new test tailored to the
D.C. curriculum -- including a writing assessment -- in place for grades
three, five, eight and 10 by the 1999-2000 school year.
The point, says Sheila Ford, is to help children learn better
and more. If the scores can help, no matter how erratic or annoying or
misleading they can be, they will be used.
Charlette Hedgman received the spring Stanford 9 results for Darius
in May. The two sheets of paper looked as dense and chilly as the fall
report had. But many of the black bars and check marks had shifted in a
promising direction.
She sat on the couch and studied them. Last fall Darius had scored
"below average" on 21 indicators. This time all but six of those check
marks had moved to the average column. In the fall he had been "below basic"
on all six main subtests. This time he moved up to "basic" in the three
reading categories, although he was still "below basic" in mathematics.
The nightly reading hour had proved its worth.
"You did much better," she told Darius. "But we still have some
work to do." She told him he would have to go to summer school, as a new
school policy required.
Despite all of Hedgman's doubts about the test, it appeared to
have moved Darius along, just as its makers in San Antonio and the administrators
on North Capitol Street had hoped.
Whatever its flaws, it had been a useful guide in motivating one
child to spend more time with his books. His mother and teacher had taken
the Stanford 9 and the fears and uncertainties it generated, and had used
them as best they could. There was no question Darius had learned a great
deal in second grade. They could see it, with or without the test.
Working in the carpeted, air-conditioned learning areas of Terrell
Elementary's summer school, Darius spent weeks doing math games and quizzes
and more reading. The well-funded special session, a crucial part of Ackerman's
new focus on achievement, included extra classroom aides. Darius had 14-year-old
Christopher Henry, a high school sophomore, to take him through his exercises.
Like Darius's mother, Christopher marveled at the boy's ability to multiply
but not add. They focused on addition.
The summer report card reached Charlette Hedgman at the end of
July. It said Darius had improved in all areas. He had escaped retention
and would be on his way to third grade.
Hedgman watched him dash gleefully away. She would get him to
sit down and read soon. She still didn't like the clumsy way the tests
were applied, but she had the rhythm now. The summer mathematics teacher
had put it exactly right at the end of the report card: "Don't work too
fast, but take your time, concentrate and apply yourself."
Jay Mathews covers public schools for The Post's Metro section.
Cutline: CHARLETTE HEDGMAN and son Darius, a student at D.C.'s
Ketcham Elementary School.
PATRICIA ANDERSON, a D.C. educator: The tests were "in the
best interests of children."