Relative grade inflation is when one professor grades easier than another professor, or when one department grades easier than another department.
Operation Birdhunt was my attempt (OK, my colleague Marcel-Cristian Voia did the actual work) to do an econometric study of relative grade inflation across departments. Birdhunt was a failure, but a noble failure.
The absolute level of grades doesn't matter. It makes no difference whether we grade on a 100-point scale from 0 to 100, or a 10-point scale from 90 to 100. OK, it might matter a bit, if we can't use decimals, so can't give a 95% student a 99.5% grade on the new scale. We lose the power of fine discrimination if every student gets an A-, A, or A+. And that might be a problem at some universities, but isn't a problem at Carleton. Our A's still mean something. So I never worried about absolute grade inflation.
What I did worry about, when it was my job to worry about such things, was relative grade inflation. If one prof grades easier than other profs, or one department grades easier than other departments, then grades become a noisy signal, and decisions based on noisy signals will be bad decisions.
If everybody knew who the bird professors and bird departments were ("bird", in Canadian student slang, means easy), and everybody knew that everybody knew this...if it were common knowledge, in other words, it wouldn't be a problem. Students, the university, and prospective graduate programs and employers would all take that information into account, apply the appropriate deflator to the grades, and base all their decisions on inflation-adjusted real grades. Just like monetary neutrality. But we don't all know this. So relative grade inflation is non-neutral.
If nobody knows who the birds are, it's just like random noise. Students get misleading signals on their ability to advance to the next level, on their comparative advantage between subjects. Grad schools, scholarship awarders, and employers get misleading signals on who the best students are.
If the students know who the birds are, but nobody else does, the students flock towards the bird courses, rather than taking the courses they are good at and enjoy.
Even if the university knows who the birds are, or at least has strong suspicions, it's not likely to make promotion and graduation rules contingent on which particular professor gave the student the grade. It's like sticky prices causing non-neutrality.
What surprises me is not that grade inflation exists; what surprises me is that it isn't so pervasive that grades mean nothing at all. With monetary inflation, there does exist some sort of nominal anchor which can make the overall price level determinate. The central bank fixes the price of gold, or fixes the supply of money, or fixes some sort of price level or inflation target, and that means there is an equilibrium price level, at least in principle.
There is no nominal anchor for grades. Grades mean whatever we choose them to mean, and what they mean depends only on what we all think they mean. Grades are like language. The word "cat" only means cat because everybody uses the word that way. But at least with words we can point to a real cat and say "that's what 'cat' means". Now I could point to a student's economics essay and say "that's what 'A' means", but it might not mean much to a sociology professor.
At best, a common set of grading standards is a convention -- a Nash equilibrium of a pure coordination game. Just like language, or driving, all the professors want to use the words "cat" and "A" to mean the same thing that other professors mean, and to drive on the right/left if everyone else drives on the right/left. That's at best, if we assume that professors have no incentive to grade easier or harder than other professors.
Coordination games need a focal point to get everyone to the same Nash equilibrium. For the driving game, the focal point is obvious: just watch which side of the road everyone else is driving on. But for grades, especially for new professors, or visitors, the focal point is not so obvious. It's like me steering a boat up the Grand Union Canal for the first time (fortunately, the barge coming the other way came to a dead stop).
I think it was Dean Allan Maslove (an economist, naturally) who introduced such a focal point into our Faculty. He sent all professors a description of the typical grade distribution for the Faculty. Any professor who submitted grades too far above or below that typical distribution would need to add a short note of explanation before the grades would be approved (by me, in my old job).
In practice, almost anything would count as an "explanation". All it really did was to force all the professors to look at the same focal point, and let me know they knew what the grades meant. And it worked, more or less.
But I wasn't really satisfied. You can't tell, just from looking at the grade distributions, whether one department is grading easier or harder than another. Maybe department X attracts better students than department Y, and so ought to be giving higher grades? How can you distinguish between a department that has good students and a bird department? (And you can't use their students' high-school grades, because those may have the same problem.)
Operation Birdhunt was an attempt to distinguish the two, econometrically. The basic idea was very simple. If some students take courses in both departments, but those students on average get higher grades in X than in Y, then X is birdier than Y.
I am not good at econometrics, so don't trust any of the following.
The basic model is this:
Student i's grade in course j = student i's smarts + course j's birdiness + random error.
I think it's called a "two-way fixed effects model with panel data". We don't observe student i's smarts, so each individual student gets a dummy variable. We don't observe course j's birdiness, so each course gets a dummy variable. That's a very large number of dummy variables for a medium-sized university, even though we did it by department and year-level, rather than down to the level of specific courses.
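For the curious, here is roughly what that regression might look like in Python with the statsmodels package. This is a sketch of the technique only, not the actual Birdhunt code; the file name and column names are invented for illustration:

```python
# A sketch of the two-way fixed effects regression, assuming one row of
# data per (student, department-year, grade). File and column names are
# hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

grades = pd.read_csv("grades.csv")  # hypothetical: student_id, dept_year, grade

# C() expands each variable into a full set of dummies: one per student
# (their "smarts") and one per department-year cell (its "birdiness").
# One reference category is dropped from each set, so every birdiness
# estimate is relative to the omitted department.
model = smf.ols("grade ~ C(student_id) + C(dept_year)", data=grades).fit()

# Pull out the birdiness estimates and their standard errors.
birdiness = model.params.filter(like="C(dept_year)")
std_errs = model.bse.filter(like="C(dept_year)")
print(pd.DataFrame({"birdiness": birdiness, "se": std_errs}))
```

(Literally inverting a dummy matrix with one column per student is what makes the computer crunch; in practice the student dummies would usually be absorbed first, by demeaning each student's grades, which is what panel-data packages do behind the scenes.)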
We got the data (stripped of anything that could put a name on an individual student) and Marcel fed it into a supercomputer, which made a loud crunching sound for a long time, simultaneously estimated every student's smarts and every course's birdiness, then spat out the answers. That was the success: the model gave me a numerical estimate of each department's birdiness.
But closer inspection revealed that Operation Birdhunt had failed miserably. The standard errors were very large -- larger than the difference between any two departments' estimated birdiness. So I was unable to say, with any confidence whatsoever, that department X was more birdy than department Y.
For a long time I couldn't figure out why Operation Birdhunt had failed. Now I think I know why.
There's a big divide in the University, between the Sciencey half and the Artsy half. And not many students cross that divide. And some specialised departments are a bit like closed shops to outsiders. And the computer was trying to figure out the birdiness of each department relative to every other department in the University. And the data, despite the massive sample size, just couldn't give the computer what it needed to do that.
Some pairs of departments just weren't comparable in practice. If there are students taking courses in both X and Y, and students taking courses in both Y and Z, but no students taking courses in both X and Z, the computer can only compare the birdiness of X and Z indirectly, by comparing each to Y. The uncertainty of the pairwise comparisons adds up as the chain gets longer, and the computer was trying to tell us this by reporting very large standard errors. A failure of transitivity: the computer could compare the birdiness of X and Y with confidence, and of Y and Z with confidence, but could not compare X and Z with confidence.
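Here's a toy simulation of that connectivity problem, with all the numbers invented: three departments, equally birdy by construction, where X and Z share no students and can only be compared through Y:

```python
# Toy simulation: departments X and Z share no students, so the model
# can only compare them indirectly through Y. All numbers are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for s in range(200):              # these students take courses in X and Y
    smarts = rng.normal()
    for dept in ["X", "Y"]:
        rows.append((f"s{s}", dept, smarts + rng.normal(scale=5)))
for s in range(200, 400):         # these students take courses in Y and Z
    smarts = rng.normal()
    for dept in ["Y", "Z"]:
        rows.append((f"s{s}", dept, smarts + rng.normal(scale=5)))

df = pd.DataFrame(rows, columns=["student", "dept", "grade"])
fit = smf.ols("grade ~ C(student) + C(dept)", data=df).fit()

# With X as the omitted category, the coefficient on Z is the X-vs-Z
# comparison -- the one that has to go through Y -- and its standard
# error comes out larger than the direct X-vs-Y comparison's.
print(fit.bse.filter(like="C(dept)"))
```

Stretch that chain across the whole Sciencey/Artsy divide, with only a handful of students bridging it, and the standard errors on the cross-divide comparisons explode.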
Or maybe it was a failure of additivity. Students majoring in X have a comparative advantage in X, and do relatively worse in their Y option, just as students majoring in Y do relatively worse in their X option. So the simple additive model can't capture these interactions between students and courses.
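The same toy setup can illustrate that second story, again with invented numbers: give every student a bonus in their own major, keep both departments equally birdy, and the match effect the additive model omits lands in the error term and inflates the standard errors:

```python
# Toy simulation: a comparative-advantage ("match") effect the additive
# model leaves out. The two departments are equally birdy by construction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulate(match_size):
    rng = np.random.default_rng(0)
    rows = []
    for s in range(400):
        major = "X" if s < 200 else "Y"
        smarts = rng.normal()
        for dept in ["X", "Y"]:
            match = match_size if dept == major else 0.0  # bonus in own major
            rows.append((f"s{s}", dept, smarts + match + rng.normal()))
    df = pd.DataFrame(rows, columns=["student", "dept", "grade"])
    return smf.ols("grade ~ C(student) + C(dept)", data=df).fit()

# Both runs have equally birdy departments; only the match effect differs.
# The point estimate stays near its true value of zero, but the omitted
# match effect sits in the residuals and blows up the standard error.
print(simulate(0.0).bse["C(dept)[T.Y]"])  # SE without the match effect
print(simulate(3.0).bse["C(dept)[T.Y]"])  # SE with it: much larger
```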
Those are my guesses anyway. But again, I'm not an econometrician.