


It seems to me that your definition of a bird course is too simple. This is anecdotal, of course, but in my undergrad experience bird courses weren't necessarily known for higher average grades. They were commonly known for low failure rates or lighter workloads. I also found that I could get a much higher grade in courses that were known to be difficult and had heavy workloads (e.g. 4th-year econometrics or 4th-year microeconomics) than in courses commonly known as bird courses. It also seemed to me that courses known to be easy were electives rather than mandatory courses, so your analysis might change if you restricted the sample to elective courses only.

My response as an econometrician is that if the standard errors are too big, you might be able to tighten them up by imposing more structure and reducing the number of parameters to estimate.

JDUB: Step one: look at the raw distributions of grades. Step two: do what I tried to do. Step three: look at some of the things you are talking about. Yours are good points, but I never got past step two.

Stephen: That was my view when I decided to do it by department rather than by course. It's like imposing the restriction that all courses in the same department are equally birdy, so there are a lot fewer parameters to estimate.
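A minimal sketch of the department-level version, assuming the model in the post is grade = smartness_i + birdiness_j + error, with one dummy per student and one per department (that form is inferred from this thread, not quoted from the post). It computes the least-squares fit by backfitting, alternating between the two sets of dummies; `fit_birdiness`, the data layout and the grades are hypothetical.

```python
from collections import defaultdict

def fit_birdiness(obs, iters=1000):
    """Fit grade = smartness[student] + birdiness[dept] by least squares.

    obs: list of (student, dept, grade) tuples.
    Backfitting: set each student's smartness to their mean grade net of
    birdiness, then each department's birdiness to its mean grade net of
    smartness, and repeat.
    """
    by_student = defaultdict(list)
    by_dept = defaultdict(list)
    for student, dept, grade in obs:
        by_student[student].append((dept, grade))
        by_dept[dept].append((student, grade))

    smart = {s: 0.0 for s in by_student}
    bird = {d: 0.0 for d in by_dept}
    for _ in range(iters):
        for s, rows in by_student.items():
            smart[s] = sum(g - bird[d] for d, g in rows) / len(rows)
        for d, rows in by_dept.items():
            bird[d] = sum(g - smart[s] for s, g in rows) / len(rows)

    # Only differences are identified: centre birdiness on zero and
    # move the constant into smartness.
    shift = sum(bird.values()) / len(bird)
    bird = {d: b - shift for d, b in bird.items()}
    smart = {s: v + shift for s, v in smart.items()}
    return smart, bird

# Made-up, noise-free grades: smartness a=70, b=80, c=60; ECON -5, PSYC +5.
obs = [("a", "ECON", 65), ("a", "PSYC", 75),
       ("b", "ECON", 75), ("b", "PSYC", 85),
       ("c", "PSYC", 65)]
smart, bird = fit_birdiness(obs)  # bird is approximately {"ECON": -5, "PSYC": 5}
```

Student c only took PSYC, yet still gets an estimate, because students a and b link the two departments.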

But suppose that *no* Science student ever took an Arts course, and vice versa. Then the model would be unable to estimate if Science departments or Arts departments were birdier, but it would still be able to compare the Science departments to each other, and the Arts departments to each other, if you split the model in two. That, in a more extreme version, is why I think the estimates had such large standard errors.
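Whether two departments can be compared at all comes down to a graph question: treat students and departments as nodes and each enrolment as an edge, and only departments in the same connected component can be compared with each other. A sketch using union-find, with a hypothetical `comparable_groups` function and made-up enrolments that reproduce the extreme Science/Arts split:

```python
from collections import defaultdict

def comparable_groups(enrolments):
    """Group departments linked directly or through chains of shared students.

    enrolments: iterable of (student, dept) pairs.
    Returns a list of sets of department names, one set per component.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for student, dept in enrolments:
        union(("student", student), ("dept", dept))

    groups = defaultdict(set)
    for node in list(parent):
        kind, name = node
        if kind == "dept":
            groups[find(node)].add(name)
    return sorted(groups.values(), key=sorted)

# No Science student takes an Arts course, and vice versa.
enrolments = [("s1", "PHYS"), ("s1", "CHEM"), ("s2", "CHEM"),
              ("a1", "ENGL"), ("a1", "HIST"), ("a2", "HIST")]
groups = comparable_groups(enrolments)  # [{"CHEM", "PHYS"}, {"ENGL", "HIST"}]
```

In real data the components are presumably linked, but only thinly, which is the less extreme version of the same problem.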

Hmm. I see. You want to be able to 'triangulate'.

Another approach: introduce 'birdness' as a latent binary variable? Looks like a good MCMC project for a graduate student. Do you still have the data?
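A simplified two-step stand-in for the latent binary idea, assuming you already have per-department birdiness estimates: fit a two-component Gaussian mixture by EM (not MCMC, and not the joint model suggested here) and read off each department's posterior probability of belonging to the higher-mean, "birdy" group. `em_two_groups` and the input numbers are hypothetical.

```python
import math

def em_two_groups(values, iters=200):
    """Two-component Gaussian mixture with a shared variance, fitted by EM.

    values: per-department birdiness estimates.
    Returns (posterior probability of the higher-mean component for each
    value, (mu_lo, mu_hi, var, share_hi)).
    """
    n = len(values)
    mu_lo, mu_hi = min(values), max(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-6)
    share_hi = 0.5
    post = [0.5] * n
    for _ in range(iters):
        # E step: log-odds of the 'birdy' component. With a shared variance
        # the Gaussian normalising constants cancel.
        post = []
        for v in values:
            log_odds = (math.log(share_hi) - (v - mu_hi) ** 2 / (2 * var)
                        - math.log(1 - share_hi) + (v - mu_lo) ** 2 / (2 * var))
            if log_odds >= 0:  # numerically stable logistic function
                post.append(1 / (1 + math.exp(-log_odds)))
            else:
                e = math.exp(log_odds)
                post.append(e / (1 + e))
        # M step: weighted means, pooled variance, mixing share.
        w_hi = max(sum(post), 1e-9)
        w_lo = max(n - sum(post), 1e-9)
        mu_hi = sum(p * v for p, v in zip(post, values)) / w_hi
        mu_lo = sum((1 - p) * v for p, v in zip(post, values)) / w_lo
        var = max(sum(p * (v - mu_hi) ** 2 + (1 - p) * (v - mu_lo) ** 2
                      for p, v in zip(post, values)) / n, 1e-6)
        share_hi = min(max(w_hi / n, 1e-6), 1 - 1e-6)
    return post, (mu_lo, mu_hi, var, share_hi)

birdiness = [-2.1, -1.9, -2.0, -1.8, 3.0, 3.2, 2.9]  # made-up estimates
post, params = em_two_groups(birdiness)
```

Treating the first-stage estimates as data ignores their standard errors, which is exactly what a joint MCMC model would handle properly.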

Not any more. I could possibly get new data, but I don't have the "need to know" that I used to have. And it took the university some effort to scrub the identifiers off the data, while giving each student a new "number". This is really just memories of my past life, now I'm just a regular prof.

Oh well. File it along with my other empirical projects that were abandoned after 30 minutes...

Stephen: Roughly how many empirical projects a year do you abandon after 30 minutes? Just curious, as I am a freshly graduated applied economist and have already abandoned what seems like a huge number of ideas for empirical projects after realizing the data or the methodology just won't work.

Too many to count! Especially with the blog - everything looks interesting, and everything looks too hard.

This is a huge problem for students, and it's too bad you couldn't squeeze down your errors. I know this has been a big problem for my wife, who hasn't been able to get any scholarships in law school because her undergrad GPA in business is competing against undergrad GPAs in education, which most people wouldn't expect to be directly comparable. The same problem presumably causes similar inequities at the admissions stage (though it didn't affect us personally) and in countless other places.

It would be really useful if someone could establish a widely-accepted GPA standardization system, which would allow all graduate and professional schools, scholarship bodies and employers to compare candidates who attended different schools or different departments.

Neil: I have been on scholarship committees before, and what helps most is not a common GPA system but indicators on transcripts that show where the student's grade in each course stood relative to their classmates.

Neil: that's why I wanted to try to do something about it, in my old job. But even though my method, in principle, could work to compare different subjects within the same university, I'm not sure how well it could work at comparing different universities. You would need a lot of data from transfer students, who took courses at two universities.

I vaguely remember hearing about one Canadian university that tried to measure relative grade inflation at high schools, by comparing students' university GPAs with their high school GPAs.

In my day, in England, we all sat common exams at the national level, and universities based their admissions on those grades. But I hear suspicions that those (A-level) exams are easier in some subjects than in others, which ought to be testable since each student will take several subjects at A-level.

JDUB: reporting class mean or median grades on the transcript may help a bit. But if some classes attract only the smartest students, a high class average may not mean it's an easy grade. That was the problem I was trying to get around.

Just to add more possible confusion: you are assuming a student's smartness is stable. Note that observed smartness is made up of both intrinsic smartness and effort, and effort is likely to vary over time. A Software Engineering prof who was trying to determine the effects of various organizational techniques on software productivity remarked that those effects were swamped by boyfriend trouble or girlfriend trouble.

Jim: yep. But in my case I don't think that would normally cause a big problem. If the data set is large enough (which it was), those omitted variables ought to act as random errors that roughly cancel out on average. (Unless taking a particular course happens to be correlated with boyfriend/girlfriend trouble!)

Nick, I think you have described precisely why American colleges are so attached to SAT, GMAT, LSAT, and GRE tests. Canadian universities seem unusual in having selective admissions without any uniform testing to supplement the "noisy signals" from course grades.

Has the Ontario government ever commissioned a major study on grade inflation at the high school level? I think it is a serious problem based on my experiences as a student. It seems to be a widespread phenomenon but is happening in different ways at different schools. How much can universities actually know about individual high schools?

Nick, all of your guesses seem reasonable, but ... as Stephen noted, there really are too many parameters in the model. And if you are going to try to reduce the number of parameters, wouldn't it be better to focus on i (number of students) than on j (number of courses or departments or whatever?) Surely, no matter how you measure j, i must have been bigger?

The approach, then, would be to rewrite your model to remove the dummy parameters for student intelligence and substitute something observable, such as relative course rank. The model would then be:

Student i's grade in course j = F(student i's rank in j, birdyness of j) + Error

F would be chosen according to taste, such as (rank_i + birdyness_j) or possibly (rank_i * m_birdyness_j + a_birdyness_j).
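A sketch of the second functional form, assuming each observation already carries a relative rank scaled to [0, 1] (how that rank is built is left open, as in the comment). An ordinary least-squares line per course gives a_j (intercept) and m_j (slope); the additive form rank_i + birdyness_j is the special case that fixes m_j = 1. `fit_course_lines` and the data are hypothetical.

```python
from collections import defaultdict

def fit_course_lines(records):
    """For each course j, fit grade = a_j + m_j * rank by ordinary least squares.

    records: (student, course, rank, grade), rank being a relative rank in [0, 1].
    Returns {course: (a_j, m_j)}.
    """
    by_course = defaultdict(list)
    for _student, course, rank, grade in records:
        by_course[course].append((rank, grade))

    fits = {}
    for course, points in by_course.items():
        n = len(points)
        mean_r = sum(r for r, _ in points) / n
        mean_g = sum(g for _, g in points) / n
        s_rr = sum((r - mean_r) ** 2 for r, _ in points)
        s_rg = sum((r - mean_r) * (g - mean_g) for r, g in points)
        slope = s_rg / s_rr if s_rr else 0.0  # m_j: how steeply grades rise with rank
        fits[course] = (mean_g - slope * mean_r, slope)  # (a_j, m_j)
    return fits

records = [("x", "ECON", 0.0, 60), ("y", "ECON", 0.5, 75), ("z", "ECON", 1.0, 90),
           ("x", "PSYC", 0.0, 70), ("y", "PSYC", 1.0, 90)]
fits = fit_course_lines(records)  # ECON: a=60, m=30; PSYC: a=70, m=20
```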

Kevin: "Has the Ontario government ever commissioned a major study on grade inflation at the high school level?"

I don't know. But if it hasn't been done, it would be a very good idea to do one. Each school sends students to many different universities, and each university accepts students from many different schools. And there are lots more students than schools and universities. So it ought to be possible from that data to get an accurate estimate of each school's birdiness and each university's birdiness.

Damn good policy research proposal. And it should be done at the Provincial government level.

Phil: I'm going to have to think about that. I can't get my head around it yet.

Wikipedia has an interesting entry on grade inflation:


Or, maybe the Ontario universities themselves should band together and do it. They all have the data, after all. (But maybe universities, like the schools, would be worried about the results being public!)

The results could be scary for many universities and high schools. At the same time, they could highlight some cases where great teaching is being done (like at my high school). I imagine the Ministry of Education and the teachers' unions would not like such a study being done, though.

If universities were able to factor the quality of one's high school into admissions decisions, it would create a nice incentive (in the form of parental pressure) for high schools to improve.

Unfortunately, even with all this analysis of grades and distributions, one thing will always remain true to me: inputs matter. If you put garbage in, you usually get garbage out, even if the garbage follows a nice bell curve!

Interesting Wikipedia article. I am primarily interested in the heterogeneity between schools in Ontario, for example:

rural vs. urban
high vs. low income

I am not surprised. I usually found differences between professors to matter more than differences between subjects. Some were better and others worse. Some wanted to teach, and others were more intent on shrinking the size of their classes. Some were enthusiastic and supportive, and others dogmatic and rigid.

Nick: "maybe universities, like the schools, would be worried about the results being public!"

Many of them surely would.  So they should be forced to participate as a condition of their provincial funding.  And while we are at it, we should track student future earnings and throw that into the model too.  Not that unprofitable (for the students) programs/universities should be cut off, but students should at least know their realistic prospects before they decide to drown themselves in student loans.  If they still choose to enroll, we can choose as a society whether that program/university has other socially redeeming qualities that make it worth funding.  

You're definitely right, Nick... a lot of departments/universities don't want this information out there.

Do econometricians use Rasch analysis? It's often used in psychometrics and seems like a good fit for this problem.

Never heard of it. But a quick check of Wikipedia says that it's what we call a logit model. And yes, we do use it. My binary latent variable idea was a slightly more generalised version of this approach.
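For binary outcomes such as pass/fail, the Rasch model sets P(pass) = 1 / (1 + exp(-(ability_i - difficulty_j))), i.e. a logit with a student effect and a course effect. A sketch of joint maximum likelihood by gradient ascent; the small ridge penalty is an added assumption so that students who pass everything don't push estimates off to infinity, and `fit_rasch`, its tuning values and the data are hypothetical.

```python
import math

def fit_rasch(responses, iters=2000, lr=0.1, penalty=0.1):
    """Penalised joint maximum likelihood for a Rasch (logit) model.

    responses: (student, course, passed) with passed in {0, 1}.
    Model: P(passed) = 1 / (1 + exp(-(ability[student] - difficulty[course]))).
    """
    ability = {s: 0.0 for s, _, _ in responses}
    difficulty = {c: 0.0 for _, c, _ in responses}
    for _ in range(iters):
        # Gradient of log-likelihood minus penalty/2 * (sum of squared parameters).
        grad_ability = {s: -penalty * a for s, a in ability.items()}
        grad_difficulty = {c: -penalty * d for c, d in difficulty.items()}
        for s, c, passed in responses:
            p = 1.0 / (1.0 + math.exp(-(ability[s] - difficulty[c])))
            grad_ability[s] += passed - p
            grad_difficulty[c] -= passed - p
        for s in ability:
            ability[s] += lr * grad_ability[s]
        for c in difficulty:
            difficulty[c] += lr * grad_difficulty[c]
    return ability, difficulty

# Made-up data: everyone passes EASY; only s1 passes HARD.
responses = [(s, "EASY", 1) for s in ("s1", "s2", "s3", "s4")]
responses += [("s1", "HARD", 1), ("s2", "HARD", 0), ("s3", "HARD", 0), ("s4", "HARD", 0)]
ability, difficulty = fit_rasch(responses)
```

A birdier course shows up as a lower difficulty, after allowing for who took it.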

I believe some/all universities do this. The Math Faculty at Waterloo compares high school transcripts with performance in mandatory first-year courses such as calculus and classical algebra to detect relative grade inflation between schools, and then uses this in selecting prospective students. This really is a problem, as some high schools are pumping out students with ~100% averages at highly suspect rates.

They do some neat analytics there. They may be worth contacting.
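A sketch of the kind of comparison described for Waterloo, assuming one record per student of (high school, high school average, first-year grade in a common course). The actual Waterloo procedure isn't described here, so this only illustrates the idea: a school whose students' high school averages exceed their first-year grades by more than the overall average gap looks relatively inflated. `relative_inflation` and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def relative_inflation(records):
    """records: (school, high_school_average, first_year_grade) per student.

    Returns each school's mean gap (high school minus first-year grade)
    minus the overall mean gap across all students; positive means the
    school's marks look more inflated than average.
    """
    gaps = defaultdict(list)
    for school, hs_average, first_year in records:
        gaps[school].append(hs_average - first_year)
    overall = mean(g for school_gaps in gaps.values() for g in school_gaps)
    return {school: mean(g) - overall for school, g in gaps.items()}

records = [("A", 95, 80), ("A", 90, 75), ("B", 85, 80), ("B", 80, 75)]
inflation = relative_inflation(records)  # {"A": 5, "B": -5}
```

Using a common mandatory course keeps the university side of the comparison fixed, so the gap mostly reflects the schools.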

I wonder if you could use data from, say, law school applicants (or MBA or medical school applicants), since they are typically required to write a standardized exam to be admitted (be it the LSAT, the GMAT or the MCAT). That would give you a common measure of "smarts" across students (setting aside the debate about the merits of such tests, although presumably any error in the LSAT score as a measure of smartness wouldn't be correlated with course birdyness), so you wouldn't have to rely solely on the individual student dummy variables to control for "smarts". You probably couldn't estimate course-level birdyness, but a large law school like Osgoode or UofT probably gets enough applicants (typically a few thousand a year, and you might be able to collect a few years' worth of data) that you could get a reasonably robust estimate of department-level birdyness (though I guess you'd need to add a university-specific dummy variable).

Moreover, from a practical perspective, law school applicants might be a useful dataset because (a) the admissions committee would have all the raw data you'd need (indeed, they may have already semi-processed it for the admissions process, i.e. redacted identifying information to avoid allegations of racism, sexism or nepotism in admissions), (b) a law school typically has one or two law and economics guys (or girls) who could be co-authors (and would probably be thrilled to be associated with a "real" economist) and who might make it easier to get the data (since they're "in" the faculty), and (c) the question you are trying to answer could be of real value to the admissions committee. The problem of comparing "artsy" students with "sciency" students might also be mitigated, since it's the nature of law students that, whatever their undergrad major, they've probably taken courses that require them to write an essay or two (i.e., artsy courses). For the same reasons, a med school dataset would also work, since med schools typically have minimum science prerequisites.
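A sketch of using a standardized score as the "smarts" control, assuming records of (undergrad department, LSAT score, GPA) and one common slope across departments. The slope comes from the within-department (fixed effects) OLS estimator, and each department's intercept is its birdyness conditional on the score. The university dummy mentioned above is left out to keep it short; `birdiness_given_score` and the data are hypothetical.

```python
from collections import defaultdict

def birdiness_given_score(records):
    """records: (department, test_score, gpa) per applicant.

    Fits gpa = birdiness[department] + beta * test_score + error with a
    common beta, using within-department deviations from the means.
    Needs some within-department variation in scores.
    Returns (beta, {department: birdiness}).
    """
    by_dept = defaultdict(list)
    for dept, score, gpa in records:
        by_dept[dept].append((score, gpa))

    means = {}
    s_xy = s_xx = 0.0
    for dept, rows in by_dept.items():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        means[dept] = (x_bar, y_bar)
        s_xy += sum((x - x_bar) * (y - y_bar) for x, y in rows)
        s_xx += sum((x - x_bar) ** 2 for x, _ in rows)
    beta = s_xy / s_xx
    return beta, {d: y_bar - beta * x_bar for d, (x_bar, y_bar) in means.items()}

records = [("ECON", 150, 2.5), ("ECON", 160, 2.6),
           ("PSYC", 155, 3.05), ("PSYC", 165, 3.15)]
beta, bird = birdiness_given_score(records)  # beta about 0.01; PSYC about 0.5 birdier
```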

A maximum likelihood approach can supposedly distinguish the relative strengths of each class in a school. All grades everywhere are compared with all classes everywhere, so that people taking easy-A courses will get As, but a C in Physics might still be given more relative worth. It can certainly identify the top students, even if they take harder courses. I don't know much about the process, but I know it's used in chess to determine the relative strengths of players over 150 years (even though they obviously could not all have played each other). Keene and Divinsky's "Warriors of the Mind" explains its use for chess players. The process is also used by a lot of sports computer programs to determine the relative rankings of basketball or football teams. All it requires is enough overlap of teams playing each other that relative comparisons can be made. The Sagarin and Pomeroy sites are examples. I'd check some of these.
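Bradley-Terry is one standard maximum-likelihood model for ratings built from pairwise comparisons; whether it matches the method in Keene and Divinsky's book isn't established here. A sketch under a hypothetical mapping: each student who took two courses stages a "contest" between them, and the course that gave the higher grade wins. Strengths are fitted with the minorize-maximize (MM) update, and higher strength means birdier. It needs the overlap described above: every course must both win and lose against a connected set of others, or the estimates don't exist. All names and grades are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def course_contests(transcripts):
    """transcripts: {student: {course: grade}}.

    For every pair of courses a student took, the course that gave the
    higher grade 'wins'; a tie counts as half a win each.
    Returns {(winner, loser): count}.
    """
    wins = defaultdict(float)
    for grades in transcripts.values():
        for a, b in combinations(sorted(grades), 2):
            if grades[a] > grades[b]:
                wins[(a, b)] += 1.0
            elif grades[b] > grades[a]:
                wins[(b, a)] += 1.0
            else:
                wins[(a, b)] += 0.5
                wins[(b, a)] += 0.5
    return dict(wins)

def bradley_terry(wins, iters=500):
    """MM update: strength_i <- W_i / sum_j n_ij / (strength_i + strength_j)."""
    courses = sorted({c for pair in wins for c in pair})
    total_wins = defaultdict(float)
    for (winner, _loser), n in wins.items():
        total_wins[winner] += n
    strength = {c: 1.0 for c in courses}
    for _ in range(iters):
        updated = {}
        for i in courses:
            denom = 0.0
            for j in courses:
                n_ij = wins.get((i, j), 0.0) + wins.get((j, i), 0.0)
                if j != i and n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins[i] / denom if denom else strength[i]
        scale = len(courses) / sum(updated.values())  # only ratios matter
        strength = {c: s * scale for c, s in updated.items()}
    return strength

transcripts = {  # made-up grades
    "s1": {"PSYC": 85, "STAT": 70, "HIST": 78},
    "s2": {"PSYC": 80, "STAT": 75, "HIST": 72},
    "s3": {"PSYC": 90, "STAT": 60},
    "s4": {"STAT": 82, "HIST": 70},
    "s5": {"PSYC": 65, "STAT": 70, "HIST": 68},
}
strength = bradley_terry(course_contests(transcripts))
```

Turning grades into win/loss throws away the size of the gap, which is the price of borrowing the sports-ranking machinery unchanged.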

The problem, it seems to me, is complicated by the fact that this could very easily be gamed by any department wanting to avoid bird designation: fit your output (the grade curve) to the expected parameters (the bell curve). As long as some reasonable proxy can be found for grades that correlates to performance elsewhere - and a pop quiz on almost anything would work - Professor Bird would probably never be identified by this method.

As another commenter pointed out, if the bird designation is more about not demanding much of students than about giving higher grades, a bird hunt would be almost impossible to run as a stats model.

Best identification method might be looking for distributions that fit too well, or profs who never justify being out of the curve.


One fundamental source of noise I can see in this model is that most students will have courses they "don't care about". Their grades in those courses will be essentially random, since they will only try to avoid failing. So how do you filter out this kind of 'crap data' from 'good data', i.e. separate the courses a student has put a lot of energy into from the courses the student does not care about? The students' preferences are dummy variables as well.

I think that if you want a real result, there's a much better, partly statistical and partly psychological approach to identifying birds more reliably: calculate the distribution of the students who get the worst grades in general and look for courses that these weak students are (much) less likely to fail. These are typically the students who will search hardest for bird courses.
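A sketch of that check, assuming records of (student, course, grade), a pass mark of 50, and "weak" meaning the bottom share of students by average grade (all three are assumptions, and `weak_student_fail_rates` and the data are hypothetical). It returns each course's failure rate among weak students only; courses where that rate is unusually low are the bird candidates.

```python
from collections import defaultdict
from statistics import mean

def weak_student_fail_rates(records, weak_share=0.25, pass_mark=50):
    """records: (student, course, grade).

    'Weak' students are the bottom `weak_share` of students by average grade.
    Returns {course: failure rate among weak students who took it}.
    """
    grades_by_student = defaultdict(list)
    for student, _course, grade in records:
        grades_by_student[student].append(grade)
    ranked = sorted(grades_by_student, key=lambda s: mean(grades_by_student[s]))
    weak = set(ranked[:max(1, int(len(ranked) * weak_share))])

    attempts, fails = defaultdict(int), defaultdict(int)
    for student, course, grade in records:
        if student in weak:
            attempts[course] += 1
            if grade < pass_mark:
                fails[course] += 1
    return {course: fails[course] / attempts[course] for course in attempts}

records = [("s1", "BIRD", 55), ("s1", "HARD", 40),
           ("s2", "BIRD", 60), ("s2", "HARD", 45),
           ("s3", "BIRD", 80), ("s3", "HARD", 75),
           ("s4", "BIRD", 85), ("s4", "HARD", 90)]
rates = weak_student_fail_rates(records, weak_share=0.5)  # {"BIRD": 0.0, "HARD": 1.0}
```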

And a course can be a 'bird' for many reasons: it might require less effort to learn its material, the test might be artificially easy or predictable, or the test environment might allow cheating ...

Sometimes not even the lecturer is aware of a course's birdiness ...

One thing is sure: badly performing students will map out these courses sometimes with as much investment of effort as it takes to pass an exam :-)

Note that the reverse is not true: well-performing students will not map out "non-bird" courses. Even they balance their workloads and will consciously select known bird courses that they deem less important for their future careers.

Nick - I think you ought to look at programs that accept students from many backgrounds: say, professional education programs that people take after a Bachelor's degree in a subject area. Check the acceptance criteria for that program: is it a common grade level, or do they accept so many future math teachers, so many future science teachers and so many future English teachers? How do those students do?

This will impose a selection bias on your results, because you will only get students good enough to complete one program well enough to get into another. But if you are trying to find out whether a B- is becoming a B+, this may be a good place to look.

... I see Bob got there first with this idea...

Some schools would be much more suitable for this sort of study than others. For instance, I go to a liberal arts college (in the US) where EVERYONE is required to complete a series of courses on the history of Western thought. Here's the really crazy part: Every professor is required to teach those classes as well. There are no professors who teach only those courses. If you were looking for a simple dataset to model, it would be an ideal place to look.

On the anecdotal side, I've wondered why it is that almost all the honors students in the School of Sciences seem to be Psychology majors...

On further thought, the other advantage of using professional schools is that they might impose a common first-year curriculum (law schools generally do). So you might be able to use first-year grades in common courses as your "smarts" control variable, since those courses are presumably equally "birdy" for all students (generally, at least at UofT, they had common 100% exams) and should be uncorrelated with undergrad bird courses.

Two modest suggestions:

1) You argue that the large standard errors are caused in part by the fact that some comparisons are hard, since it is very rare to find students who took both courses. Okay... give up on that. From a student's point of view, such comparisons are the least interesting, since very few would consider taking both. This has a lot of potential value for departmental administrators and students if you can just get it working at a departmental level. (Given that most programs require all students to take a set of "core" courses, comparisons within a department should be well observed.)

2) I'm trying to think of other applications that solve the same problem. High-school rankings, sports rankings (e.g. college teams or professional players of different eras), and publication rankings seem to face similar problems: there's a limited number of observations on each unit, N is very large, and the observations compare a small number of units using an idiosyncratic standard. What do they do?

The comments to this entry are closed.
