
Nick, excellent post.

If I had access to the university econometrician, and I was trying to figure out if there was something in the water, I would go on a fishing trip, and here's what I'd look for.

I'd begin by looking for multi-section courses, and see if boys who have male instructors in the multi-section courses do better than the ones who have female instructors. I'd compare multi-section courses with/without common exams, and see if instructor gender makes more/less difference with common exams.

I'd look at the % male students in the classes, and see if that predicts success/failure. Interact gender of instructor with % male students in the class.

Can you separate local students from students from elsewhere? Perhaps being in residence and being away from home has more of a negative effect on male performance than on female?

There's lots of fish in the Rideau, along with the odd snapping turtle, so I hope you catch something.

Also see if boys who take a gap year do better. That would be interesting.

Frances: thanks! That sample selection/imperfect proxy stuff is what I was trying (and failing) to get my head around yesterday. I was also trying to get my head around what would happen if students have some information on their own ability at university that is not captured in high-school grades ("Was I slacking off at school and will I work harder at university?"). So we had two sorts of sample-selection effects interacting. But it was too hard for me to think about.

The fishing trips you suggest might be interesting. My hunch would be on number of assignments vs exams though. And yes, lots of different fish may be in the river, and outside job opportunities are probably a big one. But the only fish I would go hunting are the fish we could eat. If there is no policy that could reasonably be implemented to do something about it, there's no point in catching it. (Except for scientific curiosity, of course, which is fine, but I've got to think with my policy hat on nowadays.)

Also talk to Louis-Philippe Morin at U of O who has done some work looking at the performance of boys and girls in the double-cohort, and argues that boys do better in more competitive environments.

"My hunch would be on number of assignments vs exams though."

Correlated with whether the course is quantitative or qualitative? Male students may also tend to select out of essay courses, and highly motivated students may select into sections with more assignments (and students who figure they're smart enough to get by without the assignments may select into ones with fewer). Wonder if there's any way of getting exogenous variation in sections? Could always just look at # of assignments v. exams on multi-section required courses, controlling for instructor quality and gender. Of course there are some really annoying instructors, like the prof for ECON 1000 section A, who don't specify the number of assignments on their course outline.


Another thing to try: single section required courses, ideally ones that are offered only once a year. See if male students do better/worse if the course is offered at 8:30 a.m.


"But the fish I would go hunting would only be the fish we could eat."

That doesn't sound like a great strategy. If your dependent variable is entirely explained by fish you can't eat (i.e. there are no edible fish in the pond), then that's a really useful piece of information. Even discovering that you have an R-squared of 80% on useless variables (at least 80% of the fish are inedible) tells you whether it's still *worthwhile* to keep fishing. Trying to find out exactly what's in the lake is going to be a lot more efficient than wading around for months trying to catch salmon that may not even exist because it's all catfish.

If I were doing your job, I would go after the most likely fish first in order to find out exactly what's in the pond. It'll be faster, and even catfish can be pretty tasty if you are hungry enough and you get a bit creative about cooking them.

Frances: I see via email the university econometrician has been very quick to test that. Looks to me like the "8.30am effect" is roughly the same on both males and females! Or maybe just slightly worse for males.

K: Hmm. Fair point. Maybe.

(Catfish taste very good to me, btw.)

Nick (or the university econometrician) - is the 8:30 a.m. effect positive/negative/statistically significant?

Ignore that last comment, just checked my email.

From the abstract, it sounds like they do not have data on parents' income and education level. That is not relevant for the gender issue, but those two should be the important factors that cause both good grades at high school and low dropout rates at university.

hix: You can download an ungated earlier version of the paper here

They don't know parents' income and education, but they do know parents' neighbourhood, and they know income and education for each neighbourhood, so they can proxy for parents' income and education.

It isn't directly relevant to the gender issue, but indirectly it is very relevant, because the sort of effect I am talking about here for gender could apply equally well for parents' income and education.

I don't think neighbourhood can work as a proxy, at least not without further corrections (which they may well have made and I just missed). One would expect a much higher percentage of the age cohort to go to college in the richer neighbourhoods, and the paper even suggests as much. Those who do go to college from the poorer ones could have parents who are just as well-off or highly educated as the others. Regarding the gender issue, men are heavily underdiagnosed for mental illnesses compared to women. That is directly relevant since, put bluntly, maybe 2 or 3% more men might have graduated if someone had given them antidepressants, and it also hints at a broader problem with gender roles. College can cause lots of anxiety, and the traditional male gender role expectation is to not talk about it or admit it. That does not help.

hix: I'm not very familiar with the use of neighbourhood proxies, but here's how I think it's supposed to work:

You take average income in a neighbourhood as a proxy for the income of a family living in that neighbourhood. (And average education as a proxy for education.) Obviously (unless all families in a neighbourhood have the same income) the proxy will be imperfect. And this means that the estimated coefficient on income will be smaller than the true coefficient, because of the errors in variables bias. But as long as the proxy is correlated with the true income, the estimated coefficient should still have the right sign. The danger is if the errors in the proxy are correlated with some other variables in the regression, because that would bias the estimated coefficients on those other variables.
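A minimal sketch of the attenuation argument above, in Python. It assumes classical measurement error (proxy = true income + independent noise), which may not be exactly how a neighbourhood average behaves; the names `family_income`, `proxy_income`, `true_coef` and all the numbers are made up for illustration and do not come from the paper.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000
true_coef = 1.0          # assumed true effect of family income on the outcome

family_income = [rng.gauss(0, 1) for _ in range(n)]
# Assumption: the proxy is the true value plus independent noise.
proxy_income = [inc + rng.gauss(0, 1) for inc in family_income]
outcome = [true_coef * inc + rng.gauss(0, 1) for inc in family_income]

slope_true = ols_slope(family_income, outcome)
slope_proxy = ols_slope(proxy_income, outcome)
print(f"slope using true income: {slope_true:.3f}")
print(f"slope using noisy proxy: {slope_proxy:.3f}")
```

With equal variances for income and noise, the proxy slope should come out at roughly half the true one: smaller in size, but with the same sign.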

Interesting hunch on mental illness/males.

This is somewhat off-topic…but doesn’t the apparently close empirical relationship between high school and university performance thoroughly undermine the idea of university education as a signal? The evidence seems to suggest the same abilities, traits and habits make for success in both high school and university. Why would employers ever ascribe signalling value to (and pay premiums to people possessing) university degrees when they can obtain pretty much the same information about a person’s intrinsic characteristics from their high school record?

Giovanni: off-topic, but nevertheless a very interesting point. So it's allowed. If you can predict beforehand who will signal, is the signal still a signal? Hmmmm. dunno. I need to think about that one.

The idea that the boys who get into college are "luckier" than the girls is not numerate enough for me. The boys are further into the tail of the distribution, and I don't know what the tail looks like, for either boys or girls. (I suppose that we are assuming that the shape of the distributions is the same, something that I doubt in real life. At the very least, having only one X chromosome should make for more genetic variability among the boys.)

For simplicity, let's suppose that the effect of the water is the same, and that the distributions are triangular. Suppose that 10 boys get into college (1 + 2 + 3 + 4) and 21 girls get in (1 + 2 + . . . + 5 + 6). Let's say that half of those who barely made it don't really belong in college, and drop out. That's 2/10, or 1/5 of the boys, and 3/21, or 1/7 of the girls. So a greater percentage of boys than girls drop out, but that is an artifact.
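The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the toy setup described above; the function name `dropout_share` and its parameters are hypothetical.

```python
from fractions import Fraction

def dropout_share(rows, marginal_dropout=Fraction(1, 2)):
    """Share of admitted students who drop out in the triangular toy example.

    The tail above the cutoff is a triangle with rows of 1, 2, ..., `rows`
    students; the widest row sits right at the cutoff, and a fraction
    `marginal_dropout` of that row drops out.
    """
    admitted = sum(range(1, rows + 1))   # 1 + 2 + ... + rows
    marginal = rows                      # students who just made the cutoff
    return marginal * marginal_dropout / admitted

boys = dropout_share(4)    # 10 admitted, 4 at the margin
girls = dropout_share(6)   # 21 admitted, 6 at the margin
print("boys:", boys, "girls:", girls)   # boys: 1/5 girls: 1/7
```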

Giovanni: "doesn’t the apparently close empirical relationship between high school and university performance thoroughly undermine the idea of university education as a signal?"

Depends upon what is being signaled. In my father's day, and to some extent in mine, there were a lot of highly competent people who did not go to college or did not finish it. In his day a B. A. was a signal of class. In my day scholarships were more available, and college was expected of good students. College was considered as time before entering the work world in which to "find yourself", a kind of luxury for many students. The practice of using the lack of a college degree to eliminate job applicants was decried. Nowadays it is widely accepted, and scholarships are fewer, so the class signal is stronger again.

Min: if the boys' and girls' distributions of Ai are the same (which is what I was semi-explicitly assuming) then the boys who get into university are (on average) "luckier" than the girls who get into university.

If instead we assume that boys have a higher variance but the same mean as girls (which AFAIK is roughly realistic, but which I wouldn't say if I wanted to be university president), then I'm less sure about that. I *think* it would still be true that boys would perform worse at university, controlling for high school grades, but I would have to do some math to check. I can't quite follow your triangular distribution example.

On the signalling. OK, if high school grades were a perfect predictor of success at university, conditional on going to university, but were an imperfect predictor of who would go to university, then I think you are right: going to university could still be a signal. Good point.

@ Nick

About the triangular distribution example:

I hope this shows up right. ;

Boys Girls
* ***
** ****
*** *****
**** ******
------------------------------ Admissions cutoff

The triangles are the tails of the boys' and girls' distributions. 10 boys get admitted, 21 girls get admitted. 4 boys just make the cutoff, 6 girls just make the cutoff. If half of those who just make the cutoff drop out, without regard to gender, that's 2 boys and 3 girls, or 1/5 of the boys and 1/7 of the girls. A smaller proportion of girls drop out, even though dropping out is gender blind.

No, the triangles did not show up right. Let's try again.

Boys . . . . . . . . . . Girls
. . . . . . . . . . . . . . *
. . . . . . . . . . . . . . **
* . . . . . . . . . . . . . ***
** . . . . . . . . . . . . ****
*** . . . . . . . . . . . *****
**** . . . . . . . . . . ******
---------------------- Admissions cutoff

Thanks Min. Got it. Pictures help!

A uniform distribution, though even less realistic than your triangular distribution, makes it even clearer. If boys had the same mean but a higher variance than girls, and if there were nothing in the water at either schools or universities, and if Hi were an imperfect proxy for Ui, we would expect to see boys having better retention than girls, because there would be a smaller percentage of boys near the cutoff.

But it will depend on the shape of the distribution, and where the cutoff is. For example, if it were normal distributions, and the cutoff were very low, so more than half went to university, we would have more boys nearer the cutoff.

For a unimodal distribution, I think the direction of the effect will depend on whether the cutoff is above or below the mode.

There are then two effects: the one you are talking about, which is about the relative sizes of f(Hbar) and F(Hbar) and how that is affected by variance; and the "something in the water effect", which shifts boys' whole distribution f(U) to the left.
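Here is a rough sketch of the first effect for normal distributions, in Python. It measures "near the cutoff" by f(Hbar) divided by the admitted share 1 - F(Hbar); that ratio, the standard deviations (1.0 and 1.2) and the two cutoffs are all assumptions picked for illustration.

```python
from statistics import NormalDist

def near_cutoff_share(dist, cutoff):
    """Density at the cutoff relative to the mass admitted above it: f(c) / (1 - F(c))."""
    return dist.pdf(cutoff) / (1.0 - dist.cdf(cutoff))

girls = NormalDist(mu=0.0, sigma=1.0)
boys = NormalDist(mu=0.0, sigma=1.2)   # assumed: same mean, higher variance

for cutoff in (1.0, -1.0):             # a high cutoff and a very low one
    g = near_cutoff_share(girls, cutoff)
    b = near_cutoff_share(boys, cutoff)
    print(f"cutoff {cutoff:+.1f}: girls {g:.3f}, boys {b:.3f}")
```

With these particular numbers, boys are relatively less concentrated near the high cutoff and slightly more concentrated near the low one. Where exactly the sign flips depends on the shapes assumed.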

Nick, you say: "If a random subset of boys and girls went on to university, this would make the estimate for b smaller (errors in variables), but it wouldn't affect the estimate of c, which would still be zero."

That's not generally true. Consider the case you go on to present with U=A-S and H=A-S+e, and we regress U on a constant, H, and S. It is true that the coefficient on H is biased towards zero, but it isn't generally true that the coefficient c is still centered on zero. One way to see this is to consider an extreme case in which we make the variance of e so large that almost all of the variance in H is due to variance in e, so H is almost just noise. Then we are regressing A-S on irrelevant noise and S, so c will be centered on something very close to -1. If on the other hand we let the variance of e approach zero, we'd recover your case where we get b=1 and c=0, since we are regressing A-S on A-S and something else.

In the more realistic case in which U and H are both imperfect proxies, the bias in the coefficients depends on the covariance structure of the noise terms.
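The two extreme cases can be tied together with a short derivation. This is a sketch under added assumptions that are not spelled out in the thread: A, S and e are mutually uncorrelated, A and e have mean zero, and \lambda is shorthand introduced here for the share of signal in H.

```latex
\begin{align}
U &= A - S, \qquad H = A - S + e, \qquad
  \lambda \equiv \frac{\sigma_A^2}{\sigma_A^2 + \sigma_e^2} \\
% best linear predictor of A given H and S (note H + S = A + e)
L(A \mid H, S) &= \lambda\,(H + S) \\
L(U \mid H, S) &= \lambda\,(H + S) - S = \lambda H - (1 - \lambda)\,S \\
\Rightarrow\quad b &= \lambda, \qquad c = -(1 - \lambda)
\end{align}
```

As \sigma_e^2 grows without bound, \lambda goes to 0 and c goes to -1; as \sigma_e^2 goes to 0, \lambda goes to 1 and c goes to 0. With \sigma_A^2 = \sigma_e^2 the formula gives b = 0.5 and c = -0.5.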

Here's a little simulation of the case in which H is noisy and U is not. Draw n=100,000 observations on a random dummy S, a standard normal ability term A, and noise in high school e. Then let U=A-S and H=A-S+e.

set obs 100000
gen s=uniform()>0.5
gen a=invnorm(uniform())
gen e=invnorm(uniform())
gen u=a-s
gen h=a-s+e

And now regress U on H and S:

. reg u h s

      Source |       SS       df       MS              Number of obs =  100000
-------------+------------------------------           F(  2, 99997) =31762.65
       Model |  126072.803     2  63036.4013           Prob > F      =  0.0000
    Residual |  198454.844 99997  1.98460798           R-squared     =  0.3885
-------------+------------------------------           Adj R-squared =  0.3885
       Total |  324527.647 99999  3.24530892           Root MSE      =  1.4088

------------------------------------------------------------------------------
           u |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           h |   .5001726   .0015813   316.31   0.000     .4970734    .5032719
           s |  -.5028246   .0047261  -106.39   0.000    -.5120877   -.4935614
       _cons |   .0049026   .0031496     1.56   0.120    -.0012706    .0110758
------------------------------------------------------------------------------

So in this little model the true effect of S on H and on U is -1, but this (misspecified) regression recovers an estimate of S on U, conditional on H, of about -0.5.

It turns out in this case that selection into university has little effect on the estimates, although I haven't sat down and figured out why. If we suppose that only kids with H>0 go to university,

gen dropout=h<0

and estimate using only the subsample who attend university, we get very little change in the estimated parameters:

. reg u h s if ~dropout

      Source |       SS       df       MS              Number of obs =   25234
-------------+------------------------------           F(  2, 25231) = 5119.52
       Model |  5125.49736     2  2562.74868           Prob > F      =  0.0000
    Residual |  12630.2238 25231   .50058356           R-squared     =  0.2887
-------------+------------------------------           Adj R-squared =  0.2886
       Total |  17755.7212 25233  .703670636           Root MSE      =  .70752

------------------------------------------------------------------------------
           u |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           h |   .4959091   .0061616    80.48   0.000      .483832    .5079862
           s |  -.4910323   .0099647   -49.28   0.000    -.5105636    -.471501
       _cons |    .008722   .0104105     0.84   0.402    -.0116832    .0291271
------------------------------------------------------------------------------
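For readers without Stata, here is a rough Python re-creation of the same exercise. It is a sketch, not the original code: the seed, the sample size and the helper `ols_two_regressors` are my own choices. One likely reason selection matters so little is that the sample is selected on H, which is itself a regressor, and the conditional mean of U given H and S is linear here, so truncating on H should leave the coefficients roughly unchanged.

```python
import random

def ols_two_regressors(y, x1, x2):
    """OLS of y on a constant, x1 and x2; returns (coef_x1, coef_x2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((p - m1) ** 2 for p in x1)
    s22 = sum((q - m2) ** 2 for q in x2)
    s12 = sum((p - m1) * (q - m2) for p, q in zip(x1, x2))
    s1y = sum((p - m1) * (v - my) for p, v in zip(x1, y))
    s2y = sum((q - m2) * (v - my) for q, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    # Solve the two normal equations for the slope coefficients.
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

rng = random.Random(1)   # arbitrary seed, for repeatability
n = 100000
s = [1 if rng.random() > 0.5 else 0 for _ in range(n)]   # random dummy S
a = [rng.gauss(0, 1) for _ in range(n)]                  # ability A
e = [rng.gauss(0, 1) for _ in range(n)]                  # high-school noise e
u = [ai - si for ai, si in zip(a, s)]                    # U = A - S
h = [ai - si + ei for ai, si, ei in zip(a, s, e)]        # H = A - S + e

b_all, c_all = ols_two_regressors(u, h, s)

keep = [i for i in range(n) if h[i] >= 0]                # same sample as "if ~dropout"
b_sel, c_sel = ols_two_regressors(
    [u[i] for i in keep], [h[i] for i in keep], [s[i] for i in keep]
)
print(f"full sample:      b = {b_all:.3f}, c = {c_all:.3f}")
print(f"selected (H>=0):  b = {b_sel:.3f}, c = {c_sel:.3f}")
```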

Chris: "That's not generally true."

Hmmm. You are right.

Damn. Someone did a blog post on that very topic just a few weeks back. It wasn't you, was it? I was sort of getting the intuition for it from that blog post, but couldn't fully understand it intuitively, and was too shy to ask dumb questions. Let's see:

Suppose you have Y = a + bX + cZ + e

Now suppose we start measuring X with error. That biases b towards zero. OK so far. What about c? Does c get biased away from zero if X and Z are positively correlated? (That sort of feels right to me.) And in my example, S and H *are* correlated.

Am I on the right track?

Nick: wasn't my blog post. I think it's correct to say that measurement error in X but not Z yields biased estimates of both coefficients if X and Z are correlated. But here I phrase the problem the issue differently: high school grades are an imperfect proxy for ability because the water affects boys in high school too, so the estimate on the coefficient on the sex dummy is still biased after even conditioning on high school grades.

oops, I meant, "But here I would phrase the problem differently...."


In the errors-in-variables case you describe OLS will produce an estimate of c such that:

(1) E(c^) = [b - E(b^)]cov(X,Z)/var(Z)

where b^,c^ = OLS estimate of b,c. This comes directly from the sum-of-squares minimization condition for c:

(2) cov(X,Z)b^ + var(Z)c^ = cov(Z,Y).

Suppose X = X* + n, where n = measurement error, and that the true relationship governing Y is:

(3) Y = a + bX* + s

Assuming n, s are independent of Z:

(4) cov(Z,Y) = cov(X*,Z)b = cov(X*+ n,Z)b= cov(X,Z)b

Using (4) in (2):

(5) cov(X,Z)b^ + var(Z)c^ = cov(X,Z)b

which gives (1) after taking expectations and rearranging.

So, yes, if X and Z are positively correlated then c^ will be biased away from zero to an extent proportional to the bias of b^. Intuitively, when X is large because X* is large OLS will tend to see Y, X and Z all large together. But when X is large because the measurement error n is large then OLS will tend to see X large with Y and Z close to their means. OLS will account for these latter instances in part by overestimating the influence of Z on Y.
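Formula (1) can be sanity-checked numerically. The sketch below uses made-up parameter values (b = 2, a loading of 0.6 of X* on Z, unit-variance noise) and compares the OLS estimate c^ with what (1) predicts from the sample moments; none of these numbers come from the thread.

```python
import random

def cov(x, y):
    """Sample covariance (dividing by n is fine here; the scale cancels)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

rng = random.Random(2)   # arbitrary seed
n_obs = 50000
a, b = 1.0, 2.0          # assumed true intercept and slope on X*

z = [rng.gauss(0, 1) for _ in range(n_obs)]
x_star = [0.6 * zi + rng.gauss(0, 1) for zi in z]        # true X*, correlated with Z
x = [xs + rng.gauss(0, 1) for xs in x_star]              # X = X* + measurement error
y = [a + b * xs + rng.gauss(0, 1) for xs in x_star]      # Y depends on X*, not on Z

# OLS of Y on a constant, X and Z via the two normal equations.
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
sxy, szy = cov(x, y), cov(z, y)
det = sxx * szz - sxz ** 2
b_hat = (sxy * szz - szy * sxz) / det
c_hat = (szy * sxx - sxy * sxz) / det

predicted_c = (b - b_hat) * sxz / szz                    # right-hand side of (1)
print(f"b_hat = {b_hat:.3f} (true b = {b})")
print(f"c_hat = {c_hat:.3f}, predicted by (1) = {predicted_c:.3f}")
```

In this setup b^ is pulled towards zero and c^ comes out positive even though Z has no true effect on Y, which is the pattern the derivation describes.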
