Race, Academic Achievement, and School Reform
An analysis of the racial consequences of state-mandated testing, and an agenda for change
An assessment of the API, California's Academic Performance Index
Harold Berlak

1. Reforming schools by mandating tests: the API, how it works and where it came from
2. Institutional racism and the achievement gap
3. Multiculturalism and standardized tests
4. Test validity: what standardized tests measure and don't measure
5. The effects of centralized control of schools
6. Reforming assessment; reforming schools; an agenda for change
1.1
10/18/00
1. Reforming schools by mandating tests: the API, how it works and where it came from
In January 2000, California installed a statewide public school ranking system called the Academic Performance Index, or API. The State Department of Education released to the public, and posted on its website, comparative rankings of every public school in the state. The rankings made headlines in all the state's major newspapers and were a lead story on the nightly local TV news. Newspapers were filled with pages of tables and graphs comparing the scores of local schools and districts across the state.
What the API and all similar indexing systems provide is a uniform scale for measuring educational productivity statewide: what one editorial writer approvingly called the equivalent of a Dow Jones Average for schools. Every public school in California is given a number from 200 to 1000 based on the school's average score on the Stanford 9 Achievement Test, or Stat-9, a standardized, multiple-choice test published, distributed, and scored by Harcourt Educational Measurement. Every school is also categorized according to the relative affluence of the area it serves and, within its category, ranked from 1 to 10, worst to best, based on the school's average on the Stat-9. According to officials, at some time in the future other factors may enter into the calculation of the API (attendance and dropout rates, scores on the State's new high school exit exams, etc.). But for now and for the foreseeable future, in California, students' performance on a single test, the Stat-9, is by state regulation the final and controlling measure of students' academic performance, school quality, and the professional competence of teachers, principals, and school staff.
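The within-category ranking described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming that schools have already been sorted into affluence categories and that a school's 1-10 rank is the decile of its position among its category's API scores; the state's actual grouping and tie-breaking rules are not described here, and the school names and scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: (school name, affluence category, API score on the 200-1000 scale).
# Ten lower-affluence schools score 400-490; ten higher-affluence schools score 700-790.
SCHOOLS = (
    [(f"Low-{i}", "low", 400 + 10 * i) for i in range(10)]
    + [(f"High-{i}", "high", 700 + 10 * i) for i in range(10)]
)


def decile_ranks(schools):
    """Rank each school 1 (worst) to 10 (best) against schools in its own category.

    Assumption: the rank is the decile of the school's position in its
    category's API scores sorted from lowest to highest; ties are broken
    by school name, which the real rules may not do.
    """
    by_category = defaultdict(list)
    for name, category, api in schools:
        by_category[category].append((api, name))

    ranks = {}
    for members in by_category.values():
        members.sort()  # lowest API first
        n = len(members)
        for position, (_api, name) in enumerate(members):
            # Ceiling of (position + 1) * 10 / n, so the top school always gets 10.
            ranks[name] = ((position + 1) * 10 + n - 1) // n
    return ranks


ranks = decile_ranks(SCHOOLS)
# A 490 earns the top rank among lower-affluence schools,
# while a 700 earns the bottom rank among higher-affluence schools.
print(ranks["Low-9"], ranks["High-0"])  # prints: 10 1
```

The example shows what "compared to their own kind" means in practice: the rank says nothing about a school's absolute score, only about its position among schools in the same category.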
The great majority of states have introduced some form of statewide standardized testing. All, to one degree or another, have increased or are in the process of increasing centralized state government control of schools by tying test performance to a system of state-administered rewards and sanctions. A statewide indexing system consolidates state control not only by linking specific sanctions and rewards to test performance, but by creating a single, articulated system of state control accompanied by an annual, high-profile public display of results. A uniform index of school quality also sets the stage for market solutions to educational problems, with private ('independent') schools and profit-making educational management organizations competing with the public sector to improve students' standardized test scores. As of this writing, Colorado, Florida, and Texas have adopted some version of a statewide school indexing system, and if the past is any indication, other states will soon follow California's lead.
There are three essential features common to all existing and proposed state indexing policies:
(1) School districts are required to administer the same test to all students, with no (or almost no) exceptions.
(2) There is exclusive or very heavy reliance on standardized tests, either 'norm-referenced' or 'proficiency' tests in one or more academic school subjects, as the measure of 'academic achievement'.
(3) Test performance is linked to a centralized state system of individual and institutional rewards and sanctions.
Supporters claim that index systems not only raise standards but also promote educational opportunity, because the process is color-blind and objective, and thus free of racial and cultural bias. California's API, for example, sets a score of 800 as the standard of excellence for all schools regardless of the socio-economic class, race, or languages of the communities they serve. Despite the great disparities in physical and human resources between rich and poor schools, proponents say the system is fair because, in measuring a school's progress toward the 800 standard, richer schools and poorer schools are compared with their own kind.
Schools whose average scores fall below the 800 mark are considered below standard and failing. The State Department of Education informs the principal and teachers that annual improvement targets must be met. Should a school repeatedly fall short of its targets, it will be taken over by the state. What this means is not yet clear, but it is generally taken to include 'restructuring' and 'reconstituting' failed schools. The State of California could assume direct control, or perhaps contract with a non-profit or profit-making management company to manage the schools. The entire staff of a failed school, principals, teachers, and counselors alike, could be reassigned, demoted, or fired. Individual students who fall below Stat-9 grade-level norms also face dire consequences, though these are not formally integrated into California's API policy. They may be required to repeat the school year, placed in an 'opportunity' or remedial track, or denied entry to special academic programs and tracks and to high-status academic public schools.
In addition to the sticks, there are carrots. Schools, students, and teachers meeting the API targets are rewarded with access to educational programs, opportunities for professional advancement, and cash bonuses. A school that, in the course of a year, reduces its shortfall from the 800 mark by 5% could reap an additional $150 per student (the amount depends on the annual state educational appropriation), a considerable sum for resource-starved schools. The current Democratic governor, Gray Davis, at one point proposed that the top-scoring students on the Stat-9 be awarded university scholarships.
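The target-and-reward arithmetic in the preceding paragraph can be made concrete. The sketch below assumes, as the paragraph describes, that a school's annual target is to close 5% of the distance between its current API and 800, and that meeting the target earns a flat per-student award; the $150 figure varies with the annual appropriation, and the function names and example school are hypothetical.

```python
STANDARD = 800           # statewide API "standard of excellence"
GROWTH_SHARE = 0.05      # share of the shortfall a school must close each year (assumed rule)
AWARD_PER_STUDENT = 150  # dollars; the actual amount varies with the state appropriation


def growth_target(api: float) -> float:
    """API a school must reach next year: close 5% of its gap to 800.

    Schools already at or above 800 have no shortfall, so in this
    sketch their target is simply their current score.
    """
    shortfall = max(STANDARD - api, 0)
    return api + GROWTH_SHARE * shortfall


def award(api_now: float, api_next: float, enrollment: int) -> int:
    """Hypothetical per-student cash award, paid only if the target is met."""
    if api_next >= growth_target(api_now):
        return AWARD_PER_STUDENT * enrollment
    return 0


# A school at 600 is 200 points short of 800, so it must gain 10 points.
print(growth_target(600))     # prints: 610.0
print(award(600, 612, 800))   # prints: 120000
print(award(600, 605, 800))   # prints: 0
```

Under this rule the required gain shrinks as a school approaches 800, while the reward is all or nothing: a school of 800 students that gains 12 points collects the full sum, and one that gains 5 points collects nothing.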
'High-stakes testing' is the term most often used for tests that are used to make fateful and irreversible educational decisions. What marks the Stat-9 as a high-stakes test, however, is not the test itself but the fact that it is instrumental to the state's centrally controlled accountability system, the API, which links Stat-9 results to state sanctions and rewards.
Goals 2000 and Indexing Policy. Both state-mandated standardized testing and indexing policies are products of a larger educational standards and accountability agenda that marches under the flag of 'Goals 2000.' It is a policy formulated during the Bush Administration (1), actively pursued by Clinton, and still supported by the great majority of centrist Republicans and Democrats at the national, state, and local levels. Put simply, this policy equates higher academic standards with rising test scores. The Goals 2000 legislation passed by Congress in 1994 adopted six lofty promises to be met by the year 2000 (2), including achieving 'world class' academic standards coupled with greater equality of educational opportunity. Clinton's proposal contained provisions for national testing in reading and math, which were eliminated because of opposition from an odd mix of groups: progressive educators, child advocates, fairness-in-testing advocates led by FairTest, civil rights groups, the Congressional Black Caucus, and right-wing Christian conservative Republicans. What remained in the bill that passed was a variety of federal incentives to induce state governments to assume greater control over the schooling process by imposing statewide testing.
Though Goals 2000 and its political and educational effects are widely reported in the education press, they have gone almost unnoticed in the national and local media and in journals of opinion, right, left, and center. In March 1994, shortly after Congress passed Goals 2000, an education writer for the New York Times called the shift in control in the bill historic and unprecedented, yet the story itself was buried in the inside pages. The public's and the press's indifference to Goals 2000 was, however, understandable. To most, it appeared harmless: it did not raise taxes and would have no noticeable immediate effects. Life in schools would go on more or less as before. The impact on schools and on the everyday lives of teachers, school administrators, parents, and students would be enormous, but it was still several years away.
In January 2000, the full weight of Goals 2000 first came to California in the guise of the API. The API is Goals 2000 in action, with one very notable departure from the original design espoused in government reports and by officials testifying before Congress. The original Goals 2000 design called for 'world class standards' tied to a new breed of world class tests capable of assessing the in-depth understanding, complex ideas, and 'higher order' thinking required by the new, demanding standards. These were to be smart tests, or 'tests worth teaching to,' a phrase coined by Goals 2000 advocates. Although there are some continuing efforts to create smart tests tied to smart standards, they remain exceptions to the rule.
California spent nearly $100 million to produce the first-ever state-mandated smart test, the CLAS language test, linked to the State's 1987 standards. These standards (called 'curriculum frameworks') sought to reform the way reading is taught. Contrary to the critics' claims, this framework did not abandon phonics. It did, however, move away from reliance on commercially published 'basal' readers tied to workbooks, and it encouraged literature-based basic reading instruction and a writing curriculum that emphasized making connections to the child's interests and experience.
It was a short-lived victory for progressive, multicultural educators. The framework and CLAS very quickly became enmeshed in California's toxic, race-charged electoral politics. The first state-mandated smart test had the misfortune of arriving at the end of the second year of the Clinton presidency, a low point, on the eve of the 1994 midterm elections. The Republicans swept to power and took control of both houses of Congress for the first time in forty years. Newt Gingrich was elevated to Speaker of the House and declared victory for his brand of backlash, right-wing Republicanism. This was also the year Pete Wilson was reelected to a second term as governor of California. The early years of the nineties had not been good ones for California's economy, and Wilson, a pro-choice conservative Republican, entered the election season with his approval rating scraping bottom. His problems were compounded because the politically powerful Southern California Republican right-wing lobby disliked him for his pro-choice stance.
Wilson won a second term by riding the coattails of the voters' overwhelming support for two infamous propositions, 184 and 187. The former, called the 'three strikes you're out' initiative, mandated 25-years-to-life sentences for a third felony regardless of its seriousness. The latter sought to deny schooling and virtually all social and health services to undocumented immigrants, the vast majority of whom are Mexican. Wilson courted the right wing, which had bitterly opposed the new literature-based language standards. He launched a virulent attack on the CLAS language test, charging that the standards and CLAS served the ideological aims of multicultural extremists and the far left. Among the more offensive test items Wilson cited as evidence of political correctness and multiculturalism run amok was a passage from Maxine Hong Kingston's The Woman Warrior, followed by the directions, 'Write an essay in which you interpret the moments of silence or inability to speak.'
The progressive and moderate forces that had supported the policy of 'smart' tests tied to new 'smart' standards were outmaneuvered politically, overwhelmed, and soundly defeated. In 1998 CLAS was replaced with the Stat-9, a run-of-the-mill, commercially available, standardized multiple-choice achievement test published by Harcourt Educational Measurement. In 2000, the Stat-9 became the state's instrument for calculating the API.
2. Institutionalized racism and the achievement gap
That there is a race gap in educational achievement is not news. Large numbers of the Nation's children leave school, with and without high school diplomas, ill educated, barely able to read, write, and do simple math. But the failures of the schools are not evenly distributed; they fall disproportionately on students of color. (3) Even when parents' income and wealth are comparable, African-Americans, Native Americans, Latinos, and immigrants for whom English is not their first language lag behind English-speaking, native-born, white students. The gap has been documented repeatedly by the usual measures. These include dropout rates and the relative numbers of students who take the Advanced Placement examinations, who are enrolled in the top academic and 'gifted' classes, and who are admitted to higher-status secondary schools, colleges, and graduate and professional programs. Last but not least are the discrepancies in scores on standardized tests of academic achievement, on which teachers' and students' fates so heavily depend.
How is this achievement gap to be explained? I focus first on the general question and then separately on the statistical gap in standardized test scores. I draw readers' attention to the distinction between academic performance and academic achievement as measured by standardized tests. Though often spoken of as if they were one, they are clearly not the same. The failure to separate out the standardized test question is serious: it clouds and confounds the educational and policy issues and misleads us in efforts to explain and eradicate the race gap in academic performance.
Over the years, the major reasons given for the claimed superior attainments of whites in cultural, artistic, and academic endeavors were overtly racist. It was said that the explanation lay in the superior genes of white, northern European Anglo-Americans. As the social sciences developed in the late 19th and the 20th centuries, 'scientific' tracts defending white supremacy appeared with regularity. By the 1930s, the eugenics movement had gained a foothold in North American universities. And, it is relevant to add, all the leaders of this overtly racist movement were also leaders of the newly emerging field of scientific mental measurement. Many were the same men who testified before Congress in the early 1920s, lending scientific credence to the racist immigration exclusion acts, which barred or greatly restricted immigration from China, Japan, Latin America, and southern and eastern Europe. The eugenics movement was considered a respectable academic discipline until it was discredited in the wake of the defeat of the Third Reich and the immensity of the crimes committed in the name of Nordic racial purity. (4)
In 1969, the scientific case for racism was revived by an article published in the Harvard Educational Review by U.C. Berkeley education professor Arthur Jensen. On the basis of his statistical analysis of I.Q. test scores, he concluded that African-Americans were genetically inferior to whites in general intelligence. His racist thesis was widely disseminated and discussed in the popular press and in respectable academic and policy circles. In time, Jensen's conclusions were thoroughly discredited by a spate of books and articles. (5) In 1994, once again using standardized test data, Charles Murray and Richard Herrnstein in The Bell Curve claimed to have proven that the inferior place of black and brown people in the social, political, and economic order was rooted in biology. The arguments for the genetic superiority of the white race were again dismembered and discredited by many geneticists and biologists. (6) Recently, a more subtle form of 'scientific' racism has gained some respectability. The inferiority of the black and brown races is now said to lie not necessarily in genetics but in culture and history. This more quietly spoken academic version of the master-race ideology has also been thoroughly dismantled, yet racist explanations for the race gap persist. (7)
Once all 'scientific' arguments supporting racism are dismissed, how is the ever-present gap in academic school performance to be explained? Numerous social and behavioral scientists have addressed this question.
A statistical study conducted by Professor Samuel Myers Jr. at the Roy Wilkins Center for Human Relations and Social Justice at the University of Minnesota sought to determine whether poverty was a primary cause of the poor performance of black students on the Minnesota Basic Standards Test. (8) Passing this test is scheduled to become a prerequisite for a high school diploma beginning in 2000. In a 1996 trial run in Minneapolis, 75 percent of African-American students failed the math test and 79 percent failed the reading test, compared with 26 percent and 42 percent, respectively, for whites.

Using standard statistical indices, the researchers found, contrary to expectations, that test scores were not statistically related to school poverty, neighborhood poverty, racial concentration, or even the ranking of schools (except in the case of whites). They did find that African Americans, American Indians, and Hispanics were underrepresented in the top-ranked schools. African-Americans were 4.5 times as likely to be found in schools ranked low in math, and twice as likely to be found in schools ranked lowest in reading. (9) For both white students and students of color, success on the tests was positively correlated with whether an individual had been tracked. However, only 6.9 percent of students of color, compared with 23 percent of white students, had access to 'gifted and talented' programs. This study suggests that tracking and the quality of academic opportunities affect both the test score gap and the gap in academic performance generally. While such correlational findings are suggestive, they neither examine basic causes nor explain the pervasiveness and stability of the gap over prolonged periods of time.
A set of experimental studies conducted by Stanford University professor Claude Steele, an African-American psychologist, sought to explain the circumstances and situations that give rise to the race gap in test scores. (10) He and his colleagues gave equal numbers of African-American and white Stanford sophomores a thirty-minute standardized test composed of some of the more challenging items from the advanced Graduate Record Examination in literature. Steele notes that all the participants were highly successful students and test-takers, since students admitted to Stanford must have earned SAT scores well above the national average. The researchers told half the students that the test did not assess ability and that the research was aimed at 'understanding the psychological factors involved in solving verbal problems.' The others were told that the test was a valid measure of academic ability and capacity. African-American students who were told that the test was a true measure of ability scored significantly lower than the white students. The other black students' scores were equal to the white students'. Whites performed the same in both situations.
The explanation Steele offers is that black students know they are especially likely to be seen as having limited ability. Groups not stereotyped in this way do not experience this extra intimidation. He suggests that 'it is serious intimidation, implying as it does that if they should perform badly, they may not belong in walks of life where their tested abilities are important --walks of life in which they are heavily invested.' He labels this phenomenon 'stereotype vulnerability.' In another study, Steele and colleagues found, to their surprise, that the students most likely to do worst on the tests were not the least able or least prepared academically. To the contrary, they tended to be among the more highly motivated and academically focused. While Steele's research provides a psychological explanation for the gap, it does not probe the historical, social, and cultural factors that have created and continue to sustain these stereotypes. We are left with no explanation of how 'stereotype vulnerability' is created by, and in turn shapes, everyday life in society and at school.
The studies cited so far focus on the gap in standardized test scores. The final study cited here is one of a large number of recent 'qualitative' (observational, historical, and ethnographic) studies that address both the achievement gap and the test score gap and illuminate the relationships of culture, gender, and race to social relations within the classroom and school. (11) Signithia Fordham, an African-American anthropologist, studied a Washington, D.C. public high school, focusing on how the 'hidden' and explicit curricula shape student aspirations and achievements, and on how students of differing cultural, racial, and social backgrounds respond to the schooling experience. (12) Hers is a multifaceted and complex study, drawing on interviews, participant observation, questionnaires, and field notes gathered over a four-year period. She concludes that for black students, patterns of academic success and underachievement reflect processes of resistance that enable African-Americans to maintain their humanness in the face of a stigmatized racial identity. She shows that African-American adolescents' profound ambivalence about the value and possibility of school success is manifest as both conformity and avoidance. This ambivalence shows up in students' motivation and interest in schoolwork, which of course includes learning standardized test-taking skills.
The following two quotations are taken from interviews with two African-American men. The first is from a young lawyer employed by a Washington, D.C. firm who had been a National Merit finalist and whose test scores were among the top 2% in his state.
[Commenting on why he was disappointed with his career] I realized that no matter how smart I was [in school] or how hard I was willing to work [in the law firm] that it wasn't going to happen for me. . . . Don't get me wrong, integration has been great for my life. Without it, I would be playing on a much more restrictive field, [but] there's no doubt in my mind that I would be much more successful today if I were white.
A high-performing African-American high school student offers the following view of why African Americans often underperform in school, and also expresses his doubts that his own school success will be rewarded.
Well, we supposed to be stupid . . . we perform poorly in school 'cause we all have it thought it up in our heads we're supposed to be dumb so we might as well go ahead and be dumb. And we think that most of the things we learn [at school] won't help us in life anyway . . . What good is a quadratic equation gonna do me if I'm picking up garbage cans?
Fordham found that even the most academically talented African-American high school students expressed profound ambivalence toward schooling and uncertainty that they would reap the rewards of school success. Virtually all the African-Americans she interviewed indicated that a central problem facing them, at school and in the larger white society, is the widely held perception among whites that African-Americans are less intelligent, and the continuing need to confront and deal with this perception in everyday experience.
These three studies taken together suggest three related explanations for the race gap in academic achievement and in test scores. The first is students' perceptions of the opportunity structure in the wider society: of the options open to them and of their objective chances of 'making it'. The second is the opportunities available in the educational system itself, within school districts, schools, and each classroom. The third is the cumulative psychic and emotional effect of living in a social world saturated with racist ideology, where racist practices and structures are pervasive and often go unnamed.
What does the gap in scores mean?
What is almost always overlooked in all these explanations is the size of the test score gap itself. Most assume that the statistical gap in test scores between persons of color and whites is enormous. It is not. There is an eight to ten percent difference in scores on standardized academic achievement tests, which has persisted over time regardless of the type of test (whether an 'IQ' test, a norm-referenced test, or a proficiency test), of the test's publisher, or of the educational level of the test-taker, be it kindergarten or graduate school. (13)
Figure 1 illustrates an eight percent difference, using as an example California's CBEST (California Basic Educational Skills Test), a standardized test of basic literacy required for entry to a teaching credential program.
Figure 1
It is important to note that the distributions of scores overlap heavily. An eight to ten percent gap amounts to a mere handful of test items. In the illustration above, the gap averages 3.2 items on a fifty-item multiple-choice test. (14) From an educational point of view, this is a minor difference. Because of the way the tests are 'normed' and cut scores are set, however, minor differences in the number of correct answers create greatly inflated failure rates for persons of color. On CBEST, for example, African-American test-takers are 3.5 times more likely than whites to fail the test, Latino/Hispanic test-takers more than twice as likely, and Asian Americans more than 1.5 times as likely. (15)
Figure 2. Number and Percentages of First-time Failures on CBEST, 1985-95
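Why a small average gap can turn into a large gap in failure rates can be sketched with two overlapping normal distributions and a single cut score. The numbers below are hypothetical, not CBEST data: one group is assumed to average 40 of 50 items correct, the other 3.2 items fewer (8% of the higher mean), both with a standard deviation of 6 items, and the passing score is assumed to sit in the lower tail at 30. Real score distributions are discrete and not exactly normal, so this shows the mechanism only.

```python
from statistics import NormalDist

# Hypothetical figures for illustration only (not CBEST data).
ITEMS = 50          # test length
HIGHER_MEAN = 40.0  # average items correct, higher-scoring group
GAP = 3.2           # average difference in items correct
SD = 6.0            # same spread assumed for both groups
CUT = 30.0          # hypothetical passing score

higher = NormalDist(mu=HIGHER_MEAN, sigma=SD)
lower = NormalDist(mu=HIGHER_MEAN - GAP, sigma=SD)

# Failure rate = share of each group's distribution falling below the cut score.
fail_higher = higher.cdf(CUT)
fail_lower = lower.cdf(CUT)

print(f"gap: {GAP:.1f} of {ITEMS} items ({GAP / HIGHER_MEAN:.0%} of the higher mean)")
print(f"failure rates: {fail_higher:.1%} vs {fail_lower:.1%}")
print(f"ratio of failure rates: {fail_lower / fail_higher:.1f}")

# Moving the cut further into the lower tail widens the ratio.
for cut in (34.0, 30.0, 26.0):
    ratio = lower.cdf(cut) / higher.cdf(cut)
    print(f"cut {cut:.0f}: lower group fails {ratio:.1f}x as often")
```

With these assumed numbers the lower-scoring group fails between about two and nearly four times as often, depending on where the cut is set, even though its average is only 3.2 items lower. The lower the cut score sits in the tails, the more a small shift in the average is magnified.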
Numerous researchers have carefully documented the highly disproportionate adverse impact of standardized achievement testing on students of color. (16) An argument might be made that these differences in test scores, while small, nevertheless represent real differences in performance, and that tests, though imperfect, eliminate the incompetent, those most likely to perform poorly at school or on the job. Steele's study suggests the opposite: that the more talented students are at greater risk of failure. As documented in section 4, there is no evidence to support the claim that standardized tests are valid and credible measures of academic achievement or intellectual capacity. There is no demonstrable connection between observed academic performance and standardized test scores. Test scores do not predict future success in school, at the university, or in the workplace. In the case of CBEST, several studies were conducted to explore the link: CBEST showed no correlation with current or future performance on the job, or with observed levels of literacy. While some tests do correlate statistically with future grades, this correlation is short-lived. (17)

What standardized achievement tests appear to predict best are scores on other, similarly constructed tests, and parents' wealth. As reported by Peter Sacks, socio-economic class accounts for approximately 50% of the variance in SAT scores. He estimates that for every additional $10,000 in family income, a person on average gains 30 points on the SAT. (18)
Among the more commonly heard explanations for the gap in standardized test scores is that the tests themselves are culturally and racially biased. This has usually been taken to mean that the bias is lodged in the content or language of individual test items. In the early years of mental measurement, the racism of the test items was blatant. In more recent years, major test publishers have made efforts to review and eliminate items with overt cultural and racial bias. Though item bias remains, it is implausible to conclude that all the publishers, in all their tests, knowingly or unknowingly managed to create tests with an almost identical ratio of biased to unbiased items. The fact that scores on all commercially produced tests show the same eight to ten percent gap suggests that the gap cannot be fully explained by racial or cultural bias lodged in individual test items. Rather, the bias is systemic and structural; that is, it is built into the basic assumptions and technology of standardized testing, into the way the tests are constructed and administered and the results reported, and into the organizational structure and administrative rules of the accountability system itself.
There is perhaps no clearer illustration of how differences among the races are exaggerated and distorted than the numerical scales used to report results. There is, as I have noted, an eight to ten percent difference in scores between white and nonwhite students. On a 100-point scale, a ten percent difference amounts to a gap of ten points. California's Academic Performance Index, or API (which is based entirely on students' scores on the Stanford 9 Achievement Test), uses a 200 to 1000 point scale, and a ten percent difference in scores morphs into a formidable 100 points. The SAT, the most commonly used test for college admissions, which is also frequently (and inappropriately) used to rank states' academic performance, uses a 400-1600 scale. In this instance, a ten percent difference becomes a 120-point chasm.
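The magnification produced by a reporting scale is simple linear arithmetic, sketched below under two stated assumptions: that raw scores map linearly onto each scale (the actual API and SAT conversions need not), and that a gap is expressed as a share of each scale's full range, maximum minus minimum. On that basis a 10% gap is 10 points on a 0-100 scale, 80 points on the API's 200-1000 scale, and 120 points on the SAT's 400-1600 scale; the 100-point API figure in the paragraph above corresponds instead to 10% of the API's maximum of 1000. Either way, the same underlying difference is reported as a much larger number. The helper function is hypothetical.

```python
# Reporting scales as (minimum score, maximum score).
SCALES = {
    "0-100 scale": (0, 100),
    "API (200-1000)": (200, 1000),
    "SAT (400-1600)": (400, 1600),
}


def gap_in_points(gap_fraction: float, low: int, high: int) -> float:
    """Convert a gap stated as a fraction of a scale's range into scale points.

    Assumes a linear mapping from raw scores onto the reporting scale.
    """
    return gap_fraction * (high - low)


for name, (low, high) in SCALES.items():
    print(f"{name}: a 10% gap = {gap_in_points(0.10, low, high):.0f} points")
```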
A major goal of social reformers of the 20th century was the elimination of legalized segregation. Yet we still live in a society that is separate and unequal. To achieve social and economic justice, the goal for the 21st century must become the elimination of institutionalized racism in all sectors of social, economic, cultural, and political life: in business, housing, employment, law enforcement, the courts, health-care institutions, and, of course, schools. What makes institutionalized racism so pernicious and difficult to eradicate is that racist practices are often invisible, because they are accepted as standard operating procedures within our institutions.
Standardized tests are a particularly invidious form of institutionalized racism because they lend the cloak of science to policies that have denied, and continue to deny, the next generation of persons of color equal access to educational and job opportunities. An educational accountability system based on standardized testing, though predicated on 'standardized' measurements that are purportedly 'neutral,' objective, and color-blind, serves to perpetuate and strengthen institutionalized racism.
3. Multiculturalism, curriculum, and learning
Tests are the single most important influence on the content of the school curriculum and on how it is taught. All tests, including those composed by teachers at the school level, confer upon students attitudes, ideas, and images of what matters and, just as important, of what does not. But the shift in power over the assessment process from the school and local community to state government represents a momentous and qualitative change. The state's power over rewards and sanctions imposes singular answers to the questions of what schools are for, what constitutes genuine knowledge and learning, and what the next generation should and should not learn at school.
Whatever does not contribute directly to short-term gains in test scores is marginalized: critical thinking, interdisciplinary studies, music, the arts, physical education, and forms of multicultural curriculum and bilingual education that are not add-ons but are integral to the entire curriculum. Even if tests incorporate some multicultural and multiracial content, state control undermines local efforts by parents, teachers, principals, and elected officials to revitalize their local schools, rethink curriculum and pedagogy, and respond in ways that cultivate the major assets of a multicultural society: its racial and cultural diversity and its heterogeneity of perspectives on knowledge, culture, and learning.
Finally, among the most serious negative educational consequences of high-stakes state-mandated tests is that teachers and administrators in low-scoring schools are under such extraordinary pressure to raise test scores that the children of the poor, immigrants, and people of color are the most likely to be first in line for a narrow and culturally truncated curriculum and to be the recipients of shrinking educational opportunities.
While President Clinton and other defenders of excellence-via-testing policies are never heard proclaiming that one of the chief purposes of government-mandated testing and indexing policies is to employ government power to unify the culture, it is clear that from the beginning this has been a chief corollary goal of the architects of these policies. The seminal 1992 report Raising Standards for American Education, which launched Goals 2000, argued that testing tied to national standards would 'bind together a wide variety of groups into one nation,' providing 'shared values and knowledge' that would serve 'as a powerful force for national unity.'
Lauren Resnick, an academic advocate for 'smart tests', a former president of the American Educational Research Association, and one of the originators of the New Standards Project, argued, 'Without performance standards, the meaning of content standards is subject to interpretation, which if allowed to vary would undermine efforts to set high standards for the majority of American students' (italics added). Nicholas Tate, chief executive of the British government's 'Qualifications and Curriculum Authority', is more forthright. He said in an interview, 'Today, we face the widespread belief that there are no underlying shared values in our society, that people are no longer willing to go along with what the school says. That is why we are beginning to make explicit, what has hitherto been implicit.'
It is no coincidence that this concerted effort by governments to gain near-monopoly control over the curriculum arrives at a time when social movements have appeared and are challenging the cultural dominance of western Anglo-European traditions in the curriculum. The multicultural and bilingual movements are expressions of the will of men and women of differing races and ethnicities, an assertion of their rights, including the right to reclaim cultural power and to forge their own cultural and social identities. But some see the diversity and heterogeneity of multicultural and bilingual movements as a threat to national unity, fostering balkanization of the nation and the erosion of culture and academic standards. Ironically, the Goals 2000 plan, a government-imposed common curriculum tied to mandated testing as a way to foster social stability and promote national unity, achieves the opposite. In practice, it exacerbates inequalities and provokes racial, cultural, and social strife.
As control of the assessment system (with its sanctions and rewards) shifts upward, new arenas for political culture warfare are, not surprisingly, opened at the national and state levels, within the bureaucratic apparatus of the executive, legislative, and judicial branches of government. Decisions are increasingly being made by politically appointed 'blue ribbon' commissions and by panels of experts and consultants many degrees removed and insulated from local community concerns, and remote from children, teachers, classrooms, and schools. What best serves the interests of this child, this classroom, and this community is lost as control of curricular and pedagogical decisions shifts to the upper reaches of government, where the 'stakeholders' are those who are well organized and possess the financial resources and power required to compete in the political arena at the state and national levels.
The stakes become higher as decisions are made farther and farther away from concrete situations, classrooms, and schools. Differences over difficult moral, cultural, and educational questions are magnified and intensified and, as in California, are likely to become entangled in acrimonious, racially charged electoral politics.
The kinds of learning required of citizens in the modern world cannot be achieved by standardized and centrally imposed systems of learning. To be effective and long-lasting, human learning requires the engagement of learners on their own behalf, and it rests on the relationships that develop between schools and their local communities, and between teachers and their students. Powerless school communities and teachers cannot produce powerful, engaged citizens committed to social and racial justice and the public good. (19)
4. Test validity: what standardized tests measure and don't measure
How credible and dependable are mass-administered standardized tests as measures of academic achievement? All of us brought up and schooled in America are familiar with standardized tests. We sit in a classroom or auditorium, well spaced from fellow test-takers, and are given a time limit and a booklet of test 'items': passages of text, math problems, tables, diagrams, or charts. From four or five possible responses, we choose the one correct or best answer and darken the appropriate bubble on the answer sheet. Some standardized tests include 'open-response' items, which are scored according to standardized procedures.
Broadly speaking, there are two types of mass-administered, standardized educational tests used in current accountability systems. The first are 'norm-referenced' tests (sometimes referred to simply as 'standardized' tests). These tests do what their name suggests: they create norms, percentiles, or grade-equivalency scores that indicate an individual's or group's standing on the bell curve relative to all others taking the same test.
The second type consists of 'proficiency' or 'basic skills' tests (sometimes called criterion-referenced tests), which employ a 'cut' score that sets either a pass/fail line or proficiency levels.
From the point of view of the test-taker, the two types are indistinguishable. They look alike, contain very similar test items, and are generally given under the same time constraints and conditions. (20) Both use standardized scoring and reporting procedures and rely on 'normal' or bell-curve statistical models. The major difference is in the way each creates failure. A norm-referenced test is deliberately constructed and pre-tested to yield scores distributed so as to approximate the so-called 'normal' curve. The curve defines the percentages of high, medium, and low test performers. (See Figure 3.)
Figure 3. A normal curve showing the mean and the percentages of students expected to score at ±1, 2, and 3 standard deviations.
The technology of the norm-referenced test accepts as given the dubious assumption that (if the sample is random and large enough) virtually all human qualities, traits, capacities, achievements, etc., if properly measured, will approximate the bell-shaped 'normal' curve. Failure is created by the bell curve. With a proficiency or 'basic skills' test, failure is created by a particular cut or proficiency score selected by a group of human beings: elected or appointed government officials and/or a panel of experts chosen by these officials or by the testing company under contract to a state agency.
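The contrast between the two ways of creating failure can be made concrete with a toy example. This is a hypothetical Python sketch, not how any particular test publisher or state scores its tests: the scores, the 25 percent "bottom share", and the cut scores of 60 and 70 are all invented for illustration.

```python
def norm_referenced_failures(scores, bottom_share=0.25):
    """Flag the lowest-ranked `bottom_share` of test-takers, whatever their raw scores."""
    ranked = sorted(scores)
    n_fail = int(len(ranked) * bottom_share)
    return ranked[:n_fail]

def cut_score_failures(scores, cut):
    """Flag every test-taker whose score falls below a cut chosen by officials or a panel."""
    return [s for s in scores if s < cut]

# Twelve hypothetical raw scores.
scores = [42, 48, 55, 58, 61, 64, 67, 70, 74, 81, 85, 90]
improved = [s + 10 for s in scores]  # every student gains 10 raw points

print(len(norm_referenced_failures(scores)), len(norm_referenced_failures(improved)))  # 3 3
print(len(cut_score_failures(scores, cut=60)), len(cut_score_failures(improved, cut=60)))  # 4 2
print(len(cut_score_failures(scores, cut=70)))  # 7
```

When every student gains ten points, the rank-based rule still flags the same number of students, because failure is defined by position on the curve; under the cut-score rule, the count depends entirely on where the cut happens to be placed.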
The most fundamental problem with both types of standardized academic achievement tests is that there is little evidence to support the contention that they measure what they purport to measure: academic achievement or proficiency. This does not mean that academic achievement and high standards are not vital. Rather, it means that the tests have very little relationship to actual academic performance of any kind. For some standardized tests there is a correlation with grades, at least in the short term. But for virtually all standardized reading and writing tests, there is no demonstrable connection between a person's performance on a standardized reading test and that person's reading abilities in the real world: in everyday life situations, at school, at work, and elsewhere one might be called on to read. What this means is that, contrary to common sense, Student A's score at the 45th percentile and Student B's score at the 95th percentile on the Stat-9 reading test (or any other norm-referenced test) say nothing whatsoever about the actual or relative reading performance of Students A and B. The standardized test informs us only how each test-taker's score compares with the scores of everyone else taking the same test.
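That a percentile only locates a test-taker relative to others can be seen in how a percentile rank is computed. The sketch below uses one common convention (the percentage of scores strictly below a given score); published tests may use others, such as counting half of any tied scores, and the groups and scores are hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percent of test-takers whose scores fall strictly below `score`.

    This is one common convention; others count half of any tied scores.
    """
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Hypothetical groups of ten test-takers each.
group_1 = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
group_2 = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]

# The same raw score of 70 earns a different percentile in each group.
print(percentile_rank(70, group_1))  # 40.0
print(percentile_rank(70, group_2))  # 80.0

# If everyone in group_1 gains 10 points, the student who scored 70 (now 80)
# keeps exactly the same percentile rank.
shifted = [s + 10 for s in group_1]
print(percentile_rank(80, shifted))  # 40.0
```

Nothing in the calculation refers to what a student can actually do; only the other scores in the comparison group enter into it.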
A score on a 'basic skills' or 'proficiency' test tells us only how far above or below the established cut-off a student's score falls. Cut scores on academic proficiency tests are not based on actual or observed levels of competence or proficiency. Numerous studies have explored the relationship between test performance and actual performance, and researchers have repeatedly come to the same conclusion: no (or almost no) connection. Neither do the tests meet the criterion of 'predictive validity'. Norm-referenced and proficiency tests (except for grades in the short term) do not predict future success in school, the university, or the workplace. (21) What the tests predict best is a person's score on a similarly constructed test, and parental wealth.
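Predictive validity is commonly summarized as a correlation between test scores and a later criterion measure, such as grades or job performance. A minimal sketch of that calculation (Pearson's r) follows; the function name and the paired data are hypothetical, and real validity studies involve much more than a single coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired observations xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of co-deviations and the two spreads around the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)

# Invented numbers for illustration only; they say nothing about any real test.
test_scores = [52, 61, 70, 78, 85]
later_grades = [2.9, 2.4, 3.1, 2.8, 3.3]
print(round(pearson_r(test_scores, later_grades), 2))
```

A coefficient near zero would mean the test score tells us little about the later outcome; a coefficient near 1 or -1 would mean a strong linear relationship.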
The failure to ground cut scores in performance applies to California's Academic Performance Index (API). The score of 800, established as the mark of excellence that all schools should aspire to reach and exceed, is a wholly statistical construct. It is not based on any direct observational evidence or documentation. It is extraordinary, and also sadly ironic, that the cut score now driving state policy for achieving educational excellence is not grounded in any way in educational excellence and high academic achievement as they are manifest in the real world of teaching and schools.
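How rankings built on a statistical scale behave can be shown with a small sketch. It assumes, for illustration only, that schools already have index values on a 200 to 1000 scale and are split into ten equal-sized rank groups from 1 (lowest) to 10 (highest); it does not reproduce the State's actual API formula or its similar-schools categories, and `decile_ranks` is a hypothetical helper.

```python
def decile_ranks(index_values):
    """Rank each school 1 (lowest) to 10 (highest) by its position among all schools."""
    n = len(index_values)
    # Indices of schools ordered from lowest to highest index value.
    order = sorted(range(n), key=lambda i: index_values[i])
    ranks = [0] * n
    for position, school in enumerate(order):
        ranks[school] = position * 10 // n + 1
    return ranks

TARGET = 800  # the statewide target score discussed in the text

# Twenty hypothetical schools, every one of them above the target.
indexes = list(range(805, 1005, 10))
ranks = decile_ranks(indexes)

print(all(x >= TARGET for x in indexes))  # True
print(ranks.count(1))                     # 2
```

Even when every hypothetical school exceeds 800, a tenth of them still receive the lowest rank, because the ranks are defined relative to other schools rather than to any observed level of performance.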
The seriousness of the failure to ground standardized tests in actual performance is now widely acknowledged. Clinton's Secretary of Education, Richard Riley, a longtime Goals 2000 testing enthusiast, in his final 'State of American Education' address urged the states to stay the course, but he also cautioned state officials about the dangers of relying on a single test for making high-stakes decisions. The Secretary's caution is a response to internal and external pressures, including from the US Department of Education's own Office for Civil Rights (OCR), which very recently (2000) issued guidelines asserting that the use of test scores as the single factor to determine retention, graduation, and college admission is improper and possibly a violation of civil rights law.
The OCR guidelines are grounded in two recent studies conducted by the National Research Council of the National Academy of Sciences. (22) These two studies are but the tip of the iceberg. There is a vast literature in mainstream psychological and educational measurement research that raises fundamental questions about the meaning and usefulness of norm-referenced and conventional standardized proficiency tests.
For the first time in its 50-year history, the 1999 revision of the Standards for Educational and Psychological Testing, produced jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), indicates that the validity of educational tests cannot be established without reference to how they contribute to the improvement of student learning and without consideration of the negative consequences of test use. Further, the standards assert that no 'decision or characterization' of students that has a major impact on their future should be made on the basis of a single test score, and they caution against the use and interpretation of tests for students with learning disabilities and with limited English language proficiency.
State-mandated testing and indexing policies that distribute rewards and sanctions based on test results are common and are increasing. This is happening in the face of near-universal agreement among independent experts that standardized achievement tests have serious technical limitations and that their use as a high-stakes measure of educational achievement or capacity is destructive, misleading, and inappropriate. Government regulations and mandates linking tests to high-stakes decisions proliferate at the same time that the standing of standardized tests as trustworthy instruments of modern social science has never been lower. Why are indexing policies that strengthen state control so readily endorsed and supported by politicians, corporate leaders, and national teachers unions? This question is addressed in Part 6, which explores possibilities for fundamental changes in the direction of assessment policy.
Because tests have
long been
used to determine merit and access to top academic tracks, special
programs
and high status schools and universities, they have been challenged
over
the years politically and in numerous suits alleging violations of US
civil
rights law. Much of the heated controversy over affirmative action also
rests on the continuing use of standardized testing to define merit.
5. The effects of centralized control of schools
An independently funded set of studies conducted by the Centre for Assessment Studies at the University of Bristol, UK, sought to map empirically the consequences of the Education Reform Act of 1988, initiated by the Conservative government led by Prime Minister Margaret Thatcher. (23) This law created a nationwide school indexing system for England and Wales and shifted control of curriculum from individual schools, school councils, and Local Education Authorities (school districts) to a central government authority called OFSTED (Office for Standards in Education). Over a period of eight years (1989-1997), a team of researchers studied a national sample of primary schools, employing a wide range of systematic qualitative and quantitative social scientific methods. The study produced dozens of articles and four major books, the most recent to be published in late 2000. (24)
The researchers document the grand mismatch between policy intentions and outcomes. Rather than erasing educational inequalities and raising the level of academic accomplishment as promised, the state-mandated assessment process served to obstruct learning and to perpetuate and increase disparities. Tests, even 'good' tests, served to distort and disrupt learning, in particular for bilingual students whose first language was not English. Also documented were a dramatic narrowing of the curriculum, a restriction of the range of learning opportunities, increased devaluation of teacher knowledge, and a decline in teacher and headteacher (principal) morale. There were also increases in pupil anxiety and dysfunctional changes in school structure and governance. These included various forms of resistance by teachers and administrators seeking to hijack the rules and circumvent the system. They engaged in behaviors some might call 'cheating,' and others would call principled defiance of government regulations that denied opportunity and stunted student learning. Other studies have documented that headteachers, once highly independent and insulated from the twists and turns of national electoral politics, became highly vulnerable. As a consequence, it has become increasingly difficult to recruit and retain creative and talented teachers and administrators in schools located at the bottom of the school rankings, which, in England as in the US, serve children of the poor, immigrants, and persons of color, a majority of whom live in the nation's most distressed urban areas.
Though there are structural differences between the English and US systems, this set of studies is significant and relevant because California's API is modeled on Britain's Education Reform Act of 1988. In 1998, in the waning months of his administration, outgoing Governor Wilson had tea with former Prime Minister Margaret Thatcher, now Lady Thatcher, according to press reports. He said that his proposal to establish the API was based on his great admiration for the centralized system of school accountability she had installed. The study is also important because it leaves little doubt that the central issue is control. The negative effects of centralized curriculum control and indexing are evident regardless of the quality of the assessment instruments used. By US standards, many of the British assessment tools would be considered 'smart' tests. Thus, even if the flaws built into standardized tests such as the Stat-9 could be remedied, or if the tests were replaced tomorrow with a new generation of 'authentic' tests, there is little doubt that the negative effects of centralized state control of the assessment system, carefully documented by the British researchers, would remain in place. The British findings are also particularly instructive to those reformers who focus almost entirely on eliminating or correcting the deficiencies of standardized testing and ignore both the race question and the fundamental issue of power: who is in control of the curriculum and the assessment system?
There is no comparable comprehensive, longitudinal study of the impact of a policy of government curriculum mandates in the US. There is, however, a large body of research on the use of standardized tests in making high-stakes decisions. Numerous studies have been conducted by researchers from some of the nation's leading educational research universities and independent R&D centers devoted to evaluation, testing, and assessment. In a recent issue of Educational Researcher, the lead journal of the American Educational Research Association, Robert Linn, who is among the nation's most respected experts on educational testing, reviewed fifty years of research on the use of tests and assessment in accountability systems. He concludes that 'common standards and testing encourages a narrowing of educational experiences for most students, dooms many to failure, and limit the development of many worthy talents.' This, he adds, 'should not to be misinterpreted to mean that one should not have high standards for all students...[H]aving high standards is not the same as having common standards for all.' Professor Linn concludes with an extraordinary and damning commentary on the current state of the science of educational measurement.
As someone who has
spent
his entire career doing research, writing and thinking about
educational
testing and assessment issues, I would like to conclude by citing a
compelling
case showing that the major uses of tests for student and school
accountability
during the past 50 years have improved education and student learning
in
dramatic ways. Unfortunately, that is not my conclusion. Instead I am
led
to conclude that in most cases the instruments and technology have not
been up to the demands placed on them by high stakes
accountability…Assessment
systems that are useful monitors lose much of their dependability and
credibility
. . . when high stakes are attached to them. The unintended
negative
effects of high stakes accountability…often outweigh the intended
positive
effects. [Italics added] (25)
Among the most egregious examples is the use of standardized tests to drive school retention and promotion policy. It is also a striking illustration of how the science and technology of educational testing is used to strengthen institutional racism. In recent years, 'social promotion' has been cited by Chester Finn of the right-wing Fordham Foundation and by President Clinton (26) as a major culprit in depressing the nation's educational standards. Several states and many school districts have responded with tough 'no social promotion' policies. Whether an individual's grade advancement or graduation is considered 'social' (meaning undeserved) is being determined solely by a single standardized test score. In the public mind and in the popular press generally, all reasons for promotion other than a standardized test score are considered 'social.' From the point of view of public policy, this is an absurdity. It is known that students who are retained have higher dropout rates. It is also known that disproportionate numbers of students of color drop out. (27) It is impossible to sustain the argument that policies that have been shown to degrade curriculum and pedagogy, increase dropouts, and exacerbate inequalities, and that have no known educational benefits, will improve the level of education of the nation's youth or enhance their chances of competing in the new global economy. It is also a debasement of the social and behavioral sciences when the observations and judgments of all the adults in a child's school life (parents, teachers, principals, counselors, teaching and learning specialists, those with direct first-hand experience) are overridden and dismissed as 'social', scientifically unfit, and subjective, while standardized tests are valorized as the one and only scientifically valid measure of academic performance.
6. Reforming assessment; reforming schools
The cornerstone of the Goals 2000 standards movement, raising educational standards by central government mandates that tie test scores to rewards and sanctions, is self-defeating. The centralization of authority and the proliferation of standardized testing, which have become pervasive in the past decade, have shown no evidence of positive results. Indeed, to the extent that these policies have been implemented, there is substantial evidence that they have more often than not served as an obstruction to the pursuit of educational excellence and equity. To repeat the words of Robert Linn, 'the evidence indicates that the unintended negatives of high stakes accountability systems probably outweigh the intended positive effects.'
This conclusion, however, should not be mistaken for a rejection of the importance of raising standards and public accountability, nor for a rejection of the need for national, state, and local governments to use executive, legislative, and judicial power to protect student, parent, and community rights, and to take a strong affirmative role in the pursuit and maintenance of high educational standards for all. Also, the fact that current national and state school reform policies are almost totally reliant on an arcane and deeply flawed test technology does not in any way diminish the need for accountability or for effective and appropriate forms of testing and assessment.
Disputes over educational reform and accountability are often sharply polarized, cast in terms of top-down versus bottom-up: two apparently contradictory perspectives and sets of remedies for reforming and assessing the nation's schools. Government-mandated testing linked to a uniform system of rewards and sanctions is the defining example of the former. The bottom-up view stresses local school and community-based initiatives, rooted in face-to-face encounters among teachers, principals, and parents in collaboration with community and local officials.
An assessment system must in fact serve both 'top-down' and 'bottom-up' functions. On the one hand, it must provide dependable information to school authorities, advisory and governing boards, state legislators, local officials, etc., so they may be better informed to make policy decisions about the distribution of public resources. A systematic assessment process is key for holding districts, school officials, and teachers accountable for the quality of their performance. On the other hand, the system of assessment must also provide information that serves the educational needs and interests of each individual child, strengthens local school- and community-level reform initiatives aimed at improving teaching and learning, and cultivates the integration of diverse cultural and historical perspectives and language traditions into the school's curriculum and pedagogy. To serve the nation, and to serve children of diverse cultural, racial, ethnic, language, and religious traditions, there must be an appropriate balance of power between central government authority and local school and community control. How the power over rewards and sanctions is distributed is the key.
A more balanced, inclusive, effective, and democratic set of national and state educational assessment policies is workable, possible, and not beyond reach. A shift in power would temper and moderate the already highly disproportionate power held by federal and state government authorities. Such a shift would restore balance by giving significantly more voice and greater responsibility not to state governments but to the 'grassroots': to individual schools, teachers, parents, and local communities.
What follows from a shift away from the center is that many of the differences rooted in fundamental moral, religious, and cultural beliefs, including philosophical and ideological differences related to curriculum, pedagogy, and learning, would be resolved face-to-face by locally constituted groups in an open consultative process, rather than by panels of experts or executive or legislative commissions appointed by state and federal officeholders and far removed from local communities, schools, children, and teachers. The resolution of the basic dilemmas of teaching, learning, and curriculum would be distanced from divisive, xenophobic electoral politics.
Obstructions to change
Redressing the power imbalance is possible but by no means assured. The political and institutional support for current centralized policies is strong, and the resistance to rethinking and reformulating national and state assessment policy is considerable. Though the Goals 2000 standards movement shows no promise of producing the excellence and equity its proponents promised, it has wide support among the public and among politicians, from the presidential candidates on down. Why, despite the intense criticism and in the face of increasing resistance by students, teachers, and community activists, (28) does the pro-testing/standards perspective continue to have such a strong hold on popular opinion and remain dominant?
The cultural and psychological barriers to rethinking assessment policy and practices are formidable. As a culture, Americans believe in tests, standardized tests in particular. Tests used for making high-stakes decisions have a deep psychological hold on us because we are part of, and surrounded by, a culture where the need to assign numbers to performance and to compare and rank-order individuals and institutions is seen as self-evident. In a world where rank and test scores matter, we also assume that test scores will tell us whether our children are prepared to compete in the hard, cold world. We also want our children's local school and school district to be among the best. In addition, to a lesser or greater degree, most of us schooled in this society have come to accept standardized tests as a measure of our self-worth, particularly with respect to our estimates of our own and our children's intellectual and academic capacities and abilities. For many Americans, an educational system without standardized tests, or with a greatly diminished place for them, is inconceivable.
Any remedy that
would disperse
power downward to schools and communities will be greeted with
skepticism.
Though Americans celebrate democracy and democratic values, when
confronted
by difficult problems or a crisis, as a nation and culture, we are
inclined
not to more democracy but less. The political and popular culture more
readily endorses solutions that promise immediate measurable results
and
that rely on hierarchical power relations backed up by a universally
applied
system of rewards and punishments.
There are also
political
and economic obstructions to rethinking and reforming assessment. A
whole
generation of mainstream politicians, governors, and ex-governors (e.g., Bill Clinton, Lamar Alexander, Richard Riley, George H. W. Bush, George W. Bush, Al Gore),
many state education officers, legislators, corporate and national
union
leaders, remain fixed on a standards movement predicated on raising
test
scores. Despite its failure, the policy persists, in part because they
have no other solutions to offer. Also, those now in office who were
responsible
for conceiving of and instituting these policies are not likely to
concede
that the Goals 2000 plan is a total failure.
Furthermore, testing is big business. Several of the largest test publishers and service providers are divisions of the major textbook publishing firms, which in turn are part of larger publishing and media conglomerates. According to the Bowker Annual, direct expenditures on tests doubled annually between 1980 and 1997 to 200 million dollars. These figures do not reflect increases in state-mandated testing programs over the past five years, nor mandates that are on the books but not yet implemented. Neither do these figures account for indirect costs, which include a large army of experts and consulting firms, and state- and district-level bureaucrats whose livelihoods and careers depend upon administering, scoring, analyzing, classifying, reporting, and storing test data, and ensuring compliance with government regulations. In 1993, researchers at the Center for Evaluation and Policy Research at Boston College estimated overhead costs at 20 billion dollars annually. (29)
Finally, one of the more formidable obstructions to significant change in assessment policy is the widely voiced belief that, whatever the deficiencies of high-stakes standardized testing policies, there are no alternatives, or at least no economically feasible alternatives, to standardized testing. A very commonly voiced concern is that without centralized testing, the system of education would be undermined and would flounder because there are no other practical ways to raise standards, assess educational progress, sort students, and evaluate teachers.
Policy alternatives
The educational policy issues surrounding testing, assessment, and public accountability are immensely complex. This report does not offer a sweeping blueprint for reforming the accountability system that will, overnight, undo and repair the damage created and fostered by Goals 2000 policies. It does, however, propose five guiding principles for reform and three fundamental issues that must be addressed in formulating and pursuing alternatives. This essay closes with recommendations for shifting the balance of power in order to create a fair and effective accountability system.
Principles:
A fair and effective accountability system will:
1. help to achieve and maintain high educational standards, but will not seek to standardize the curriculum or the learning process, nor attempt to impose a singular view of knowledge, language, and culture;
2. contribute to educating the nation's children to the full range of their talents and capacities;
3. serve to assure equitable distribution of resources and equality of access to educational and job opportunities;
4. serve to encourage and reward initiative and meritorious performance by schools, teachers, and students;
5. contribute to erasing institutional racism.
Three key issues
Virtually absent from discussions of educational excellence by the mainstream press and political leaders is the pervasiveness of institutional racism, and of the enormous inequities in human and material resources between the richest and poorest schools.
It is of vital
importance
that the accountability system specifically address the legacy of white
supremacy and institutionalized racism legitimated by standardized
testing,
a legacy that lives on in the present. Institutional racism is manifest
not only in disproportionate outcomes, but is built into the
instruments
and the assessment technology itself. Racism, of course, is also about
who has power and who doesn't when basic decisions are made about
allocation
of resources, curriculum content and teaching methods, eligibility for
programs, grade advancement, and the awarding of educational
credentials.
And most important, who sets the rules, names the 'stakeholders' and
makes
the final decisions.
To be fair and effective, the accountability system must make affirmative efforts to counter the institutional racism currently built into the technology of the instruments of assessment. Procedural and structural protections against institutionalized racism depend on a proportionate distribution of decision-making power, with a significant degree of cultural control vested at the school and community levels.
Technology.
Contrary to widely held belief, there is no shortage of systematic evaluation instruments for assessing teaching and school learning and for gauging the quality of 'academic' and other forms of school learning. (30) Some of the 'alternatives' are highly developed and have been shown to provide teachers, parents, and local officials with useful information for enhancing student learning and/or making local and internal school policy decisions. Some of these approaches are more cost-efficient than conventional standardized tests because the time spent on assessment is not lost; it is integral to, and adds to, the teaching and learning process. It is also important to note that the use and interpretation of these instruments depend on the social context and the particular situation. Thus, none is suitable for producing a single numerical scale that serves as a universal measure of educational productivity for all schools, teachers, and students.
The technology of multiple-choice standardized testing was developed in the first two decades of the 20th century, at a time when mechanical hole punching and manual sorting with pins were the state of the art in information processing. The high-speed digital microprocessor and desktop computer technology developed over the last decade have transformed our capacity to collect, process, organize, and use very complex information. Yet other than the introduction in the 1930s of machines capable of reading graphite pencil marks on answer sheets, and their replacement by digital scanners in recent years, the basic technology of multiple-choice testing has remained virtually unchanged since it was invented nearly a century ago. By contemporary standards, multiple-choice test technology as represented in the Stat-9 is primitive, highly limited, and static.
It is unlikely that innovations in testing and assessment technology will originate in the testing industry, which is heavily invested in multiple-choice technology and ill-equipped to deal with new, cutting-edge information technologies. Test publishers have nothing to gain and much to lose from an accountability system that does not rely on centrally administered and scored standardized multiple-choice tests. Though no technology can replace human judgement, the newer digital information technologies have unexplored potential for fostering responsive, systematic, and locally based assessments that also teach. To avoid commercialization of the educational process and undue influence by large corporate interests, these paths will need to be pursued through public investments that stimulate school- and community-based collaborative research and assessment development.
In the near future a variety of educational tests will continue to be used to diagnose student needs and assess educational achievement. There are a number of steps governments ought to take immediately to protect children, communities, and the public at large from discriminatory tests, and to ensure that the tests used meet the dual standard of enhancing learning and advancing equality of educational opportunity. The forms these protections might take are briefly discussed in the concluding section.
Power and Control
How is power distributed within the accountability system? That is, who writes the rules, distributes rewards and sanctions, and determines who the 'stakeholders' are who will make the fateful educational decisions? The instruments employed by the system of accountability are of course critical, because they define what is valued. But an accountability system also includes a distinct organizational structure and a set of procedures for controlling rewards and sanctions, which together represent a particular configuration and distribution of power. That configuration can be changed and rebalanced so as to give more weight and responsibility to schools and local communities and less to experts, government officials, and appointed national and state boards and agencies.
A Massachusetts group called the Coalition for Authentic Reform in Education (CARE) (31) has outlined a proposed statewide accountability system that aims to raise educational standards and the quality of learning and teaching in classrooms and schools. It consists of four integrated components.
Local Assessments. Each school would submit its accountability plan for review and approval to a regional board established by the Massachusetts Department of Education. The plan would outline how the school will assess progress toward a broadly stated set of competencies.
External Quality Reviews. On a three- to five-year cycle, schools would undertake a self-study, and external auditors would review the self-study, visit the school, and report on progress toward the dual goal of academic excellence and the provision of equitable, high-quality resources and learning opportunities to all students.
Standardized tests. These would be limited to literacy and numeracy and would not have high-stakes decisions attached.
Annual Reports. Each school and school district would report annually to 'stakeholders' on a set of 'indicators' developed by the state. These would include, but not be limited to, academic performance, and would be reported by race, gender, low-income status, special needs, and limited English proficiency.
In 1998 the State of Nebraska adopted policies emphasizing that the assessment of student academic performance is a local responsibility that should primarily serve to improve instruction and increase learning in the classroom. Further, the policy asserts that no single measure can achieve all purposes, and that multiple measures are needed to provide complete information to teachers, parents, and policymakers. The assessment system, called the School-based, Teacher-led Assessment and Reporting System (STARS), is set to begin in the 2000-01 school year. Under this plan (32), the Nebraska Department of Education invited proposals from teachers and local districts to develop their own operational plans. One of the proposals submitted and approved came from a coalition of representatives of the Nebraska Writing Project (a site of the National Writing Project) and The School at the Center (both are networks of teachers and university faculty members), together with nine Nebraska school districts. The underlying premise of their plan is that teachers develop the assessments and become their own assessment experts. The coalition promises to produce nine "locally appropriate, context-sensitive assessments" for mathematics and reading/writing. While the Nebraska plan has its limits (all districts are required to administer annually one of several state-approved, commercially available achievement test batteries), it provides funds for local school and school district assessment initiatives and places significant restrictions on the use of standardized tests in making high-stakes decisions.
There are also other living examples of accountability systems in which power is balanced between national and local interests and concerns. Scotland, with a population of just over five million (approximately that of Maryland, Missouri, Wisconsin, or Minnesota), governs its own primary and secondary schools independently of the British government in London. Traditionally, education in Scotland has been organized as a partnership among the central government, local government, and schools. For many years it has had a system of school inspection that resembles the school review process proposed by the Massachusetts CARE coalition. American-style standardized tests play no role in the assessment process. Two recently issued government papers (33) reassert and strengthen the policy that it is the responsibility of the national executive authority 'to exert strategic leadership of the national system… by articulating after consultation the national priorities for education, yet leave to each school supported by its local authority [school district] the central responsibility for its own improvement and for raising standards.' Further, the papers affirm a national policy that specifies a basic level of provision, that is, a minimum of educational resources for all schools and students.
There have also been several notable efforts in the US and the UK to develop comprehensive accountability models that could serve to articulate local school-based systems into a national (or regional) assessment framework. (34)
An Agenda for Change
1. Eliminate regulations that directly or indirectly link federal incentives to state adoption of centralized statewide testing of teachers and students.
2. Require an educational impact statement before any government educational agency implements a test or assessment procedure. Such a statement would report on effects on children, schools, and communities, including the level of academic achievement, the distribution of resources and learning opportunities, dropout rates, etc., by race, gender, and socioeconomic status.
3. Provide federal and state incentives and technical services to schools that (with the support of locally elected school officials) take central responsibility for school improvement and raising standards, and stimulate the development of partnerships among teachers, communities, and parents to develop 'locally appropriate, context-sensitive assessments.'
4. Strengthen and support efforts to set and enforce standards for tests and assessments that protect the public from inappropriate uses of tests and assessments and from violations of the civil liberties and rights of students and teachers. Currently there are two sets of relevant standards: the Standards for Educational and Psychological Testing, produced by three professional associations (35), and the Principles and Indicators for Student Assessment Systems, developed by the National Forum on Assessment, a coalition of children's and national civil rights groups (36). Both are useful but largely symbolic, since test developers and government agencies are under no legal obligation or political pressure to meet any standards or principles. The major test publishers and service providers operate under a shroud of secrecy that has been sanctioned by the courts. Remedying the situation requires either legislation or extragovernmental agreements insisting that tests used for high-stakes decisions meet published professional standards, and that educational tests and assessment procedures be open to public scrutiny and independent review. (37)
5. State legislatures should declare a moratorium on high-stakes testing in order to undertake an orderly review of existing tests, determining whether they comply with professional standards and meet the National Forum's assessment principles.
6. Federal and/or state governments could pass legislation intended to rein in abusive uses of tests. In April 2000, Senator Paul Wellstone (D-MN) introduced in the Senate, and Rep. Robert Scott (D-VA) in the House, the Fairness and Accuracy in Student Testing Act (38), which would prohibit the use of standardized tests as the single determinant in decisions about the graduation, promotion, tracking, or ability grouping of students, and would require that tests: be valid and reliable for the purposes for which they are used; measure what the student was taught; provide students with multiple opportunities to demonstrate proficiency; and provide appropriate accommodations for students with limited English proficiency and disabilities. Political action at the state level is more likely.
Changing assessment; changing schools
Goals 2000 policies have led the nation, by increments, down a dangerous path toward a radical transfer of power, concentrating it increasingly in the hands of government authorities, bureaucrats, experts, and 'stakeholders' based in Washington, D.C. and the state capitals, all distant from children, classrooms, and schools. This power imbalance is educationally unwise and, as the recent electoral politics of California illustrate, politically unwise and potentially explosive. Because they have a disproportionately adverse impact on communities of color, mandated standardized tests sustain and strengthen institutional racism. As testing programs authorized by state legislatures, which tie tests to school promotion, admission to 'gifted' programs, entitlement to high school, scholarships, diplomas, degrees, certification, etc., are implemented over the next five to eight years, these adverse effects on communities of color will intensify and will provoke racial and cultural conflict and organized resistance.
Significant school reform is not possible without significant reform of the current system of national and state educational assessments. Changes will not occur of their own accord. They will come about only in response to persistent pressure by coalitions and tactical alliances that cut across political, social class, racial, and ideological lines. These include coalitions of citizen, student, teacher, and parent activists, children's advocates, civil liberties and civil rights leaders, educational traditionalists, and grassroots political progressives and conservatives. In the 1999-2000 school year there were, for the first time, numerous organized protests, boycotts, and other forms of active resistance to high-stakes standardized testing by teachers, parents, youth, and community activists across the nation. That resistance is growing and becoming more militant as mandated tests tied to sanctions are put into place. We as a nation will continue to differ profoundly on how schools ought to educate, what an educated person ought to know, and how students learn best. In a democracy we cannot allow governments and panels of experts remote from communities, classrooms, and students to impose a singular view of curriculum and learning and to decide our own and our children's futures.
© 2000 Harold Berlak. Comments welcome: hberlak@sbcglobal.net
Harold Berlak holds a doctorate in educational research from Harvard. He is a former professor of education at Washington University in St. Louis and a longtime educational activist.
1. Raising Standards for American Education: A Report to Congress, the Secretary of Education, the National Goals Panel, and the American People. Washington, DC: U.S. Government Printing Office, January 24, 1992. ISBN 0-16-036097-8. This report explicitly acknowledges and adopts California as the model. The policy of 'smart' standards tied to 'smart' tests was introduced to California by Bill Honig, nominally a liberal Democrat, who served as State Superintendent of Public Instruction from 1983 to 1993.
2. These goals are: by the year 2000, (1) all children will start school ready to learn; (2) the high school graduation rate will increase to at least 90%; (3) all children will leave grades 4, 8, and 12 having demonstrated competency in challenging subjects including English, mathematics, science, foreign languages, civics and government, arts, history, and geography; (4) American students will be first in the world in science and mathematics achievement; (5) every adult American will be literate; (6) every school will be free of drugs, violence, and the unauthorized presence of firearms and alcohol and will offer a disciplined and drug-free environment.
3. Still Separate, Still Unequal: A Research Brief. Oakland, CA: Applied Research Center, May 2000. www.arc.org.
5. Stephen Jay Gould, 'Jensen's Last Stand,' New York Review of Books, 1980; Leon J. Kamin, The Science and Politics of IQ, New York: John Wiley, 1974; Daniel M. Kohl, 'The IQ Game: Bait and Switch,' School Review 84:44, 1976.
6. Russell Jacoby and Naomi Glauberman (eds.), The Bell Curve Debate, New York: Times Books/Random House, 1995.
7. Cultural supremacy arguments are dismantled in Jared Diamond, Guns, Germs, and Steel, New York: Norton, 1997.
8. Samuel L. Myers Jr. and Cheryl Mandala, Is Poverty the Cause of Poor Performance of Black Students on Basic Standards Examination? Roy Wilkins Center for Human Relations and Social Justice, University of Minnesota. Paper presented at the 1998 American Educational Research Association Annual Meeting, June 1997.
9. Schools were ranked in terms of resources, the education and experience of staff, and the number, depth, and range of academic course offerings.
10. Claude M. Steele, 'A threat in the air: How stereotypes shape intellectual identity and performance,' American Psychologist, 52, 1997. Also see 'Stereotyping and its threat are real,' American Psychologist, 53, 1998.
11. These include Lisa Delpit, Other People's Children, New York: The New Press, 1995; Joyce E. King, 'The Purpose of Schooling for African American Students,' in J. King, E. Hollins, and W. C. Hayman (eds.), Preparing Teachers for Cultural Diversity, New York: Teachers College Press, 1996; Gloria Ladson-Billings, The Dreamkeepers: Successful Teachers of African-American Children, San Francisco: Jossey-Bass, 1999.
12. Signithia Fordham, Blacked Out: Dilemmas of Race, Identity, and Success at Capital High, Chicago: University of Chicago Press, 1996.
13. See Robert Linn, 'Assessments and Accountability,' Educational Researcher 29:2, 2000. He cites data from the Florida high school competency test, given annually since 1977, to illustrate a common pattern. When a test is first introduced, scores rise markedly for several years for whites and persons of color alike, level off, and over time decline slightly. However, the gap in test performance between the races remains virtually constant over time.
14. In CBEST, 10 of the 50 items on the math and language sections are not scored; they are used in creating items for future versions of CBEST. Eight percent of 40 items equals 3.2 items. The size of the gap in terms of test items will of course vary with the number of test items.
15. On some tests, particularly in mathematics and engineering, some Asian populations outperform whites.
16. These include Linda McNeil, Contradictions of School Reform: The Educational Costs of Standardized Testing, New York: Routledge, 2000; George F. Madaus, 'A Technological and Historical Consideration of Equity Issues Associated with Proposals to Change the Nation's Testing Policy,' Harvard Educational Review 64:1, 1994; Diana C. Pullin, 'Learning to Work: The Impact of Curriculum and Assessment Standards on Educational Opportunity,' Harvard Educational Review 64:1, 1994.
17. Some achievement tests, the college entrance SAT for example, predict academic grades at the next level, but only in the very short run.
18. See Peter Sacks, Standardized Minds, Cambridge, MA: Perseus Books, 2000.
19. The concluding paragraph, slightly revised, is taken from Deborah Meier, 'Educating a Democracy: Standards and the Future of Public Education,' Boston Review, December 1999/January 2000.
20. Some states afford flexibility in the time allotted for completing the mandated test. In some cases allowances are made for students with documented learning disabilities or whose native or first language is not English.
21. Some achievement tests, the college entrance SAT for example, predict academic grades at the next level, but only in the very short run.
22. High Stakes: Testing for Tracking, Promotion, and Graduation, 1998, and Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions, 1999, National Research Council.
23. The 1988 Education Reform Act was proposed, shaped, and enacted while Margaret Thatcher was Prime Minister; much of its implementation took place under her successor, John Major.
24. Pollard, Broadfoot, Croll, Osborn, and Abbott, Changing English Primary Schools, Cassell, 1994; Croll (ed.) with Abbott, Black, Broadfoot, Osborn, and Pollard, Teachers, Pupils and Primary Schooling, Cassell, 1996; Osborn, McNess, and Broadfoot, with Pollard and Triggs, Policy, Practice and Teacher Experience, Continuum, 2000; Pollard and Triggs, with Broadfoot, McNess, and Osborn, Policy, Practice and Pupil Experience, Continuum, 2000.
25. Robert Linn, op. cit.
26. See Call to Action for American Education in the 21st Century, www.ed.gov/updates/PresEDPlan/part2.html
27. Bureau of the Census, Current Population Survey, October 1996.
28. Movements of teachers, community activists, and youth activists in opposition to tests are gathering force across the nation, including California, Massachusetts, New York, and Illinois. There have also been boycotts by students in Chicago and Boston.
29. Walter Haney, George Madaus, and Robert Lyons, The Fractured Marketplace for Standardized Testing, Boston: Kluwer, 1993.
30. Deborah Meier, Will Standards Save Public Education? Boston: Beacon Press, 2000; Linda Darling-Hammond, J. Ancess, and B. Falk, Authentic Assessment in Action, New York: Teachers College Press, 1995; Grant Wiggins, Educative Assessment: Designing Assessment to Inform and Improve Student Performance, San Francisco: Jossey-Bass, 1998; Patrick Griffin, P. Smith, and L. Burrill, The American Literacy Profile Scales: A Framework for Authentic Assessment, Portsmouth, NH: Heinemann, 1995; Monty Neill et al., Implementing Performance Assessments, Cambridge, MA: FairTest, 1996.
31. Full proposal and list of members of the coalition: www.fairtest.org/care/accountability.html
32. See the STARS Planning Guide at http://www.edneb.org/IPS/starsmnt.pdf. Also see Chris Gallagher, 'A Seat at the Table: Teachers Reclaiming Assessment Through Rethinking Accountability,' Phi Delta Kappan, http://www.kiva.net/~pdkintl/kappan/kga10003.htm
33. Scottish Executive Education Department, School Code paper and Priorities for Schools, 2000.
34. Ann Filer (ed.), Assessment: Social Practice and Social Product, London and New York: Falmer Press, 2000; Harold Berlak (ed.), Toward a New Science of Educational Testing and Assessment, Albany: SUNY Press, 1992; John Raven, 'A model of competence, motivation, and behavior, and a paradigm for assessment,' in Berlak (ed.), Toward a New Science; Tyrrell Burgess and Elizabeth Adams, Outcomes of Education, London: Macmillan Education, 1980.
35. The American Psychological Association, American Educational Research Association, and National Council on Measurement in Education.
36. Summary available at www.fairtest.org
37. In 1998 the Ford Foundation funded an organization called the National Board on Testing and Public Policy, located at the Center for Testing, Evaluation and Educational Policy at Boston College. One of the Board's chief purposes would be to monitor quality by conducting independent expert audits of tests. One of the more serious difficulties with the proposed Board is that the testing industry is elevated to the status of a major 'stakeholder.' This would almost certainly retard innovation by giving precedence to companies that are heavily invested in an out-of-date technology. Also a problem is the absence of a significant presence on the governing board of practicing teachers, parents, and the local community. See George Madaus and Cathy Horn, 'Testing Technology: The Need for Oversight,' in Ann Filer, op. cit.