A Brief History of Grading — and What That Means for Schools Today

Frederick M. Hess
Jul 24, 2024

Over the past several months, Joe Feldman, veteran educator and author of Grading for Equity, and I have been discussing equitable grading. We’ve touched on everything from grade inflation to whether this approach can ever really yield higher standards to what it takes for schools to responsibly pursue equitable grading. Today, we talk about what the research says about equitable grading, and Joe delivers a brief history of grading practices.

— Rick

Rick: You’ve mentioned to me that there’s a mismatch between what the research on equitable grading says and the way the practice is regarded by its critics. Can you explain what you mean?

Joe: I get that “equity” has become a polarizing term of late, and I also recognize that there are probably bad or ineffective things happening under the banner of “equitable grading” that have little resemblance to the way that I’ve defined — and many others implement — those practices. But the work of more accurate and fair grading, whether you want to call it “equitable grading,” “standards-based grading,” or even “common-sense grading,” is about creating the conditions for deeper, more rigorous teaching and learning through clearer and more truthful reporting of student progress that doesn’t reward or punish students based on teacher biases or circumstances outside a student’s control.

Taking a critical view of how we traditionally grade can lead to profound and positive changes. The most convincing evidence is from teachers who share their experiences. Here’s one quote from Nick, a high school physics teacher, who told me, “I’ve told students that the homework, rather than being included in your grade, is your opportunity to practice and to see how well you understand things. Homework completion at first took a dip when I stopped counting it for points.”

But that’s not the end of the story. Before too long, Nick related, “They realized, ‘Oh, I want to get a good grade in this class. I need to understand the material,’ and then homework completion has shot up. It’s the opposite of what I feared would happen. Now they see that the purpose of homework is actually to learn the material.”

Rick: You’ve suggested that common grading practices should be regarded as the product of inertia more than evidence. Can you say more about what you mean?

Joe: Entrenched practices can persist despite compelling evidence for change. When it comes to grading, our long-held beliefs often diverge from the most recent evidence and the real experiences of practitioners. This isn’t unique to education: Physicians have famously resisted when long-standing practices were upended by emerging research or new data, even when that research came from fellow physicians. One example is the adoption of handwashing in health-care settings. Ignaz Semmelweis, a Hungarian physician in the mid-19th century, discovered the importance of hand hygiene in preventing the spread of infectious diseases. Yet his findings were initially met with skepticism and resistance from the medical community, and it took several decades for handwashing to become widely accepted as a standard practice in health-care settings. I’m not going to claim that equitable grading is the same as handwashing, but I do think that equitable grading practices make grades less likely to be “infected” by teachers’ biases.

I have always approached this work as a dialogue, one in which both sides come with mutual curiosity and openness. I’ve had many disagreements with skeptics that ended with us realizing that we want the same things for students, particularly those who have been historically underserved, and that we agree more than we disagree about the benefits of equitable grading once we are clear about what it is and what it isn’t. I recall a Fox News interview where I was paired with a teacher who spoke about her adamant disagreement with “equitable grading.” When she shared her concerns, and I responded with clarifications, rationale, and evidence, she shifted to arguing that the biggest problem is that districts aren’t training all teachers to implement improved grading!

Rick: OK, let’s switch gears. We’ve had a number of conversations about grading as practiced today. Given all that, I’m always curious how we got here. You’ve noted in passing that the contemporary grading system grew out of the Industrial Revolution. Can you spell out what you mean by that?

Joe: Our current grading practices were developed over a century ago and shaped by that era’s beliefs about teaching, learning, and human potential — many of which have since been debunked. In the early 20th century, academics and educators believed intelligence was fixed and distributed across the population along a bell curve, with a few people at the high and low ends and most in the middle. Following the lead of universities, K–12 schools used norm-referenced grading, in which a student’s grade signified their achievement relative to others’ in the course.

Our traditional approach to grading largely stems from the century-old beliefs that too many A’s signal a weak, easy course and that fewer successful students signal a rigorous one. That thinking flies in the face of what we now know about academic potential. An A doesn’t lose value when more students earn it. Equitable grading reinforces that the bullseye doesn’t get smaller when more people hit the target. Instead, these practices support the goal of great teachers, which is to get as many students as possible to hit the mark.

A second example: In the early-to-mid 20th century, animal experiments by behaviorists such as John Watson and B.F. Skinner supported the belief that humans were most effectively motivated by extrinsic rewards and punishments. This belief underlies the traditional grading practice of using points to incentivize (or, some might say, control) student behaviors, such as coming to class on time or completing homework.

But, over the past few decades, research from Edward L. Deci and colleagues and from Tony Docan-Morgan has demonstrated that this belief has severe limitations, one of which is that extrinsic rewards and punishments often undermine creative thinking and effective problem-solving. And while some might argue that using points to change behavior prepares students for the professional world, there’s no evidence I’m aware of that supports this. For example, there’s no evidence that employees who come on time to meetings do so because their teachers subtracted points for lateness or that employees who are habitually tardy had teachers with more lenient grading policies.

I believe a primary reason Industrial Revolution-era grading persists is that a critical understanding of grading research and practice hasn’t been included in teacher education or certification. For generations, teachers have had little choice but to replicate how they were graded, and many teachers were successful in school and ostensibly weren’t harmed by traditional grading — the reasoning goes something like, “I did fine, so why change anything?” We find that when teachers think critically about this underdeveloped aspect of their practice, they see the urgency to shift their grading to match modern, research-based understandings of student motivation.

Rick: You’ve previously raised the issue of grade deflation, arguing we focus too much on grade inflation and not enough on deflation. I’m not sure what to make of the argument but would love to hear you explain a little more. Can you expand on what you have in mind?

Joe: Let’s start by clarifying what we mean by grade inflation. Grade inflation occurs when a student’s grade is higher than their actual understanding warrants. When grades are inflated, that student, their parents, college-admissions officers, and others are told that the student is prepared for a certain level of academic challenge when they actually aren’t. This inaccurate grade can have significant consequences, such as unanticipated remediation, which, in college, can make students less likely to graduate on time, if at all.

Grade inflation has received particular attention since the pandemic. Interestingly, research by Seth Gershenson, published by the Fordham Institute in 2018 (before the pandemic), found that grade inflation was worse in schools attended by higher-income students. Research since the pandemic, by contrast, suggests a disproportionate increase in grade inflation among students of color and students from low-income families.

Grade deflation (I have come to believe a more useful term might be “grade depression”) occurs when a teacher-assigned grade is lower than a student’s understanding of course content warrants. Grade depression can be even more harmful than grade inflation. Whereas grade inflation opens doors to opportunities a student is not prepared for, grade depression keeps students from pursuing opportunities, like advanced coursework or postsecondary programs, that they are fully prepared for.

We know that traditional grading practices can cause both grade inflation and grade depression because they commonly combine a student’s academic and nonacademic performance into a single final grade. This practice renders grades inaccurate and unreliable. The student who doesn’t know the content particularly well but compensates for that weakness by following all class rules earns an inflated grade. On the other hand, the student who has an excellent understanding of the content but doesn’t adhere to all class rules receives a depressed grade. The student with an inflated grade can conceal their weak academic understanding by pleasing the teacher, while the student with a depressed grade has their excellence hidden.

In a forthcoming paper by the Equitable Grading Project, my co-authors and I compare the teacher-assigned grades of secondary students from multiple states and districts with their corresponding standardized-test scores. The findings reveal a striking mismatch between grades and test scores. Of course, this mismatch could stem from many factors, including the weaknesses of standardized testing. However, we found that when teachers moved away from traditional grading methods and used improved, more equitable grading practices, grade–test-score consistency (i.e., the similarity between the grades teachers assign and students’ test scores) increased, meaning that those practices reduced both grade inflation and grade depression. These results match what we found in 2018.

This post originally appeared on Rick Hess Straight Up.

Written by Frederick M. Hess

Direct Ed Policy Studies at AEI. Teach a bit at Rice, UPenn, Harvard. Author of books like Cage-Busting Leadership and Spinning Wheels. Pen Ed Week's RHSU blog.