Frances: Nice post. I knew we weren't that far apart!

BTW: I used to play in a rock band about 45 years ago. In the unlikely event that we re-form, I'll be pushing for calling us "mfx".

Frances, thanks for these posts. It has been so long since I have done any empirical work that I don't even really know how to start...these help.

"It is a social norm: a husband should be taller than his wife."

Really? The last time someone told me that was my grandmother, when I was in grade school. I thought it was old-fashioned when I first heard it. (I am not Canadian, though. :))

Suppose that a woman is 5'4" tall. A man who is 6'1" is taller than she, but so is a man who is 5'6". Yet she may well prefer the taller man. (In fact, tallness in men is generally a favored trait.) Even if our only explanation is in terms of physical characteristics, tall men may be favored in the "marriage market", even without a social norm that says that husbands should be taller than their wives. Maybe the point is that they are taller than other men. Any comparison of men to men misses that distinction.

Good post Frances. A few thoughts:

- Logit, probit, and LPM are all generally misspecified models. The correct question is not which one is "correct" (we know the answer to that question), the question is which is most useful, or perhaps least misleading, in some specific context.

- In any context in which LPM is clearly the wrong model it is probably also true that probit and logit are also no good. You want to use a semiparametric or other flexible approach if you're really worried about this aspect of the specification. Such estimators are now routine in estimating propensity scores, for example, a context in which LPM is usually no good.

- Using a robust covariance matrix estimator gets around the heteroskedasticity problem introduced by LPM.

- Dealing with interaction terms in nonlinear models is a pain. I think I'll put something on my blog about this issue later today.

- I think the example I put up of how to easily produce nice tables in Stata was a little overly complex. Try removing the looping over estimators, i.e., just write in "regress" or whatever you like where I've got `estimator'. A block of code to produce a nice table in .html looks like:

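* Fit two nested specifications and store each set of estimates
regress y x1
estimates store e1
regress y x1 x2
estimates store e2

* Tabulate all stored estimates as HTML (esttab is part of the user-written estout package)
esttab *, b(%8.3f) t(%7.2f) html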
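
On the robust covariance point above, here is a minimal sketch, in Python rather than Stata, of what a heteroskedasticity-robust standard error does for a one-regressor LPM. The data are made up and the function name lpm_with_robust_se is just for illustration; in Stata the usual route would be the vce(robust) option on regress.

```python
import math

def lpm_with_robust_se(x, y):
    """OLS of a 0/1 outcome on one regressor, with classical and robust slope SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical variance assumes one common error variance for everyone.
    classical_var = sum(e ** 2 for e in resid) / (n - 2) / sxx
    # HC0 "sandwich" variance weights each squared residual by its own leverage term.
    hc0_var = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    # HC1 adds a small-sample degrees-of-freedom correction.
    hc1_var = hc0_var * n / (n - 2)

    return {
        "intercept": intercept,
        "slope": slope,
        "resid": resid,
        "se_classical": math.sqrt(classical_var),
        "se_hc0": math.sqrt(hc0_var),
        "se_hc1": math.sqrt(hc1_var),
    }

# Made-up illustration data: height in cm and a married (1) / not married (0) flag.
heights_cm = [160, 165, 170, 170, 175, 180, 180, 185, 190, 195]
married = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

fit = lpm_with_robust_se(heights_cm, married)
print("slope per cm:", round(fit["slope"], 4))
print("classical SE:", round(fit["se_classical"], 4))
print("robust (HC1) SE:", round(fit["se_hc1"], 4))
```

The point estimate is the same either way; only the standard error changes, which is why the robust option fixes the inference problem but not the out-of-range predictions.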

Dave, Linda, Chris - thank you for the comments, and for the code too. It's really valuable to get your reaction.

Min - a couple of students in class today had reactions similar to yours.

Yes, there is a literature suggesting that, within a relatively homogeneous population, height is correlated with health and other good outcomes, presumably via nutrition and early environment. So a preference for height could be just a preference for health etc. This tends to be the way that the biological literature approaches the issue - women prefer taller men (up to a point) because height signals health.

And it is also true that, when you look at women, you don't pick up a strong inverse height/marriage relationship, though the biological literature that I've read finds that very tall women have a lesser chance of "reproductive success" (as the biologists put it), and the women with the maximum fertility are those with slightly below average height; see, for example, http://rspb.royalsocietypublishing.org/content/269/1503/1919.short

Great post Frances.

Before running the regression, I like to have a quick look at what the "descriptive data" looks like. In this case, I would probably have created height categories (1.50 to 1.55 m, 1.55 to 1.60 m, etc.) and graphed the actual proportion of married men by height category.
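
A rough sketch of that descriptive step, in Python with made-up numbers (the function name married_share_by_height and the data are hypothetical, and the bars are plain text rather than a proper graph):

```python
from collections import defaultdict

def married_share_by_height(heights_m, married, width_cm=5):
    """Share married within each height bin [lower, lower + width_cm), keyed by lower bound in cm."""
    counts = defaultdict(lambda: [0, 0])  # lower bound -> [number married, number of men]
    for h, m in zip(heights_m, married):
        # Work in whole centimetres so bin edges are not hit by float rounding.
        lower = int(round(h * 100)) // width_cm * width_cm
        counts[lower][0] += m
        counts[lower][1] += 1
    return {lower: mar / tot for lower, (mar, tot) in sorted(counts.items())}

# Made-up illustration data: height in metres and a married (1) / not married (0) flag.
heights_m = [1.52, 1.58, 1.63, 1.64, 1.71, 1.73, 1.78, 1.82, 1.84, 1.91]
married = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1]

shares = married_share_by_height(heights_m, married)
for lower, share in shares.items():
    label = f"{lower / 100:.2f} to {(lower + 5) / 100:.2f} m"
    print(f"{label}  {'#' * round(share * 20)}  {share:.0%}")
```

Empty bins simply do not appear, so with real data it is worth checking the cell counts before reading much into the tails.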

I think it adds to the story to be able to say "Men who are 1.80m or taller are married 10 percentage points more often than men who are 1.60m or shorter. What explains this?" Then a Blinder–Oaxaca decomposition would (if this were an OLS model) allow us to say that half of this difference is explained by income or age and the other half is explainable only by the "height difference".

Also, if you're interested in computing the "average marginal effect" instead of the "marginal effect at the mean", you might want to consider the *margeff* Stata command. I'm not sure if it is still up to date, as I haven't used Stata in a while. It is presented here:
http://econpapers.repec.org/software/bocbocode/s445001.htm
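
To make the distinction concrete, here is a small Python sketch for a one-variable logit. The coefficients B0 and B1 and the heights are invented for illustration, not estimated from anything. In a logit the marginal effect of height is B1 * p * (1 - p), so it depends on where it is evaluated: the "marginal effect at the mean" plugs in the average height, while the "average marginal effect" computes the effect for each person and then averages.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit coefficients (assumed, not estimated): intercept and height in cm.
B0, B1 = -17.0, 0.10

# Made-up sample of heights in cm.
heights_cm = [160, 165, 170, 170, 175, 180, 180, 185, 190, 195]

def marginal_effect(height):
    """Derivative of P(married) with respect to height at a given height."""
    p = logistic(B0 + B1 * height)
    return B1 * p * (1.0 - p)

mean_height = sum(heights_cm) / len(heights_cm)

# Marginal effect at the mean: evaluate once, at the average height.
mem = marginal_effect(mean_height)
# Average marginal effect: evaluate for every person, then average.
ame = sum(marginal_effect(h) for h in heights_cm) / len(heights_cm)

print("marginal effect at the mean:", round(mem, 4))
print("average marginal effect:", round(ame, 4))
```

The two numbers differ because the logistic curve is nonlinear; the more spread out the sample is along the curve, the bigger the gap tends to be. In Stata 11 and later, the built-in margins command reports either version.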

SimonC, yes, that would be a very good idea. I was thinking of adding height squared to capture non-linearities in the effect of height, but I just got so carried away with those diagrams. Alternatively one could include dummy variables indicating that a person is in either tail of the height distribution.

As someone whose wife thinks she's taller than I am, I found this interesting. Maybe my higher income makes up for it.

Now can you explain what I fairly often observe, the pairing up of overly tall men with relatively short women?

Jim - in response to your second point - perhaps there's a marriage market penalty to being overly tall. When I updated the table to include a height squared term it fit the data much better, suggesting that the height/marital status relationship is indeed non-linear.

Just a nit-pick - In Quebec, cohabitation is most certainly not legally the same as marriage:

http://www.justice.gouv.qc.ca/english/publications/generale/union-a.htm

In much of the rest of Canada cohabitation does, in the limit, approach being 'real' marriage.

But I get what you meant - as a practical matter, people in QC who think of themselves as married are, more often than not, not in fact married.

Patrick -

Yes, you're right, and I should have been clearer on that, and I've changed the post. What I had in my mind is that in terms of economic and other variables, cohabiting couples in Quebec are more like married couples in the rest of Canada. E.g. if you have a model and you have "married" and "cohabiting" as two different explanatory variables, then it's a good idea to include an interaction variable cohabiting*Quebec to capture the fact that cohabiting couples in Quebec are different from cohabiting couples elsewhere.
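
A tiny Python sketch of what the interaction variable does. The coefficient values are invented for illustration and the names are hypothetical. The interaction column is 1 only for cohabiting people in Quebec, so the cohabiting "effect" is allowed to differ there.

```python
def add_interaction(rows):
    """Add the cohabiting*Quebec column to each observation."""
    return [dict(r, cohab_x_quebec=r["cohabiting"] * r["quebec"]) for r in rows]

# Made-up observations: one cohabiting person in Quebec, one elsewhere, one married in Quebec.
rows = add_interaction([
    {"married": 0, "cohabiting": 1, "quebec": 1},
    {"married": 0, "cohabiting": 1, "quebec": 0},
    {"married": 1, "cohabiting": 0, "quebec": 1},
])

# Hypothetical coefficients (assumed for illustration, not estimated).
COEF = {"married": 0.30, "cohabiting": 0.10, "quebec": -0.02, "cohab_x_quebec": 0.15}

def cohabiting_effect(quebec):
    """Cohabiting vs single, holding province fixed: base effect plus the interaction in Quebec."""
    return COEF["cohabiting"] + COEF["cohab_x_quebec"] * quebec

print("cohabiting effect outside Quebec:", cohabiting_effect(0))
print("cohabiting effect in Quebec:", round(cohabiting_effect(1), 2))
```

With these made-up numbers the cohabiting coefficient in Quebec comes out closer to the married coefficient, which is the pattern the interaction term is meant to pick up.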

But they're different in part because people choose cohabitation in Quebec in order to avoid particular property regimes.

Thanks Frances, I wish I had read this on Wednesday when you wrote it! I was working with one of our policy guys who had run a logit regression, and he was asking me to help him with the specification and the problem of collinearity amongst his variables (the parallel in this case is the height-income relationship). I have to admit I was getting myself stuck a step before that, with fully understanding the coefficients and how to present the results. It's one thing to conclude the model specification is correct, but it still needs to be explained in plain language!

As you say: who or what does my data apply to, and how is that understood by my audience? Choose the model carefully (grounded in theory) and be able to explain it in a way that makes sense. At the end of the day, he went with a set of scenarios to present as comparators, e.g. Tall & Rich in Ontario vs Short & Rich in Ontario vs Tall & Rich in PEI.

Key point, and worth stressing numerous times: there needs to be theory first (otherwise use OLS and allow predicted probabilities of 115%; without theory, why not? The coefficients would even look nice: y = .43 + .02*Height + 0.00005*AnnualIncome + .35*Maritimes - .2*Prairies + .005*Age).
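
Plugging numbers into that illustrative equation shows how quickly a linear model leaves the 0 to 1 range. This Python sketch uses the coefficients exactly as written above; the person's characteristics are hypothetical, and since the units of Height are not stated, inches are assumed here.

```python
# Coefficients copied from the illustrative equation in the comment.
COEF = {
    "const": 0.43,
    "height": 0.02,
    "annual_income": 0.00005,
    "maritimes": 0.35,
    "prairies": -0.2,
    "age": 0.005,
}

def fitted_probability(height, annual_income, maritimes, prairies, age):
    """Linear 'probability' of being married; nothing keeps it between 0 and 1."""
    return (
        COEF["const"]
        + COEF["height"] * height
        + COEF["annual_income"] * annual_income
        + COEF["maritimes"] * maritimes
        + COEF["prairies"] * prairies
        + COEF["age"] * age
    )

# Hypothetical person: 70 inches tall (assumed units), $50,000 income, Maritimes, age 40.
fitted = fitted_probability(height=70, annual_income=50_000, maritimes=1, prairies=0, age=40)
print("fitted 'probability':", round(fitted, 2))
print("inside [0, 1]?", 0.0 <= fitted <= 1.0)
```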
