[Update: I'm getting an awful feeling this post may be totally wrong. Does (1/b)/[1+Var(V(i))/Var(Y(i))] = b if the Lamarck model is true? If so, my proposal won't work. Wish I could do math.] [Update2. Nope, I think it will work. But Majromax in comments has come up with a simpler way to do the same thing. See my 01:19 pm comment.]
Economists normally run a regression of son's income on father's income, and interpret the estimate beta as a measure of "intergenerational mobility". (High beta means low "mobility"). I think we could learn something interesting by reversing the regression -- run a regression of father's income on son's income -- and comparing the results of the two regressions.
This post is about: econometrics (errors in variables bias); genetics (regression toward the mean); and mobility/inequality. I don't know much about any of those three areas, so I will probably get things wrong. I am not 100% confident my proposal will work, but I would like people with better technical skills than me to take a look at it.
Ideally, we should all agree on how we would interpret the results before we run any regressions.
Forget economics for a minute. Consider two models of the relationship between son's height H(i) and father's height H(i-1). (Yes, I should have mother's height in there too, but then I would have to think about assortative mating, and I would get muddled.) All variables are defined as deviations from the population mean (so I can get rid of the constant terms).
1. "Darwin" model
H(i) = G(i) + V(i)
G(i) = G(i-1)
H(i-1) = G(i-1) + V(i-1)
Where G(i) is some unobserved characteristic shared by father and son, and V(i) is a random error that is uncorrelated with G and uncorrelated across generations.
2. "Lamarck" model
H(i) = b.H(i-1) + V(i)
Where 0 < b < 1 is a structural parameter.
If the variance of V(i) is constant over generations, and if each father has one son, both models would give a stationary distribution of height in a given population.
If an econometrician estimates the regression H(i) = B.H(i-1) + U(i), the estimate for B will be positive but less than one in both models. Both models will exhibit "regression toward the mean". And both models will generate a "Great Gatsby Curve", in which the econometrician's estimate of B will be positively correlated with the population variance of height when we look at different populations.
If the Lamarck model is true, the econometrician's estimate B will be an unbiased estimate of the true parameter b.
But if the Darwin model is true, the regression suffers from an errors in variables bias. The econometrician wants to estimate H(i) = B.G(i-1) + U(i), but G(i-1) is an unobserved variable. And so the econometrician is forced to use H(i-1) instead, but H(i-1) is an imperfect proxy for G(i-1). So the estimate B will be less than one, and will be biased towards zero. The amount of downward bias depends on the ratio between the variance of V(i-1) and the variance of G(i-1): the bigger that ratio, the bigger the bias.
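The errors in variables bias under the Darwin model can be checked with a small simulation. This is only a sketch: the standard deviations `SD_G` and `SD_V`, the sample size, and the helper names `ols_slope` and `simulate_darwin` are illustrative assumptions, not anything estimated from data. It compares the estimated B with the textbook attenuation ratio Var(G)/[Var(G)+Var(V)].

```python
import random


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_darwin(n, sd_g, sd_v, rng):
    """Father and son share G; each adds his own independent V."""
    fathers, sons = [], []
    for _ in range(n):
        g = rng.gauss(0.0, sd_g)                  # shared unobserved G
        fathers.append(g + rng.gauss(0.0, sd_v))  # H(i-1) = G + V(i-1)
        sons.append(g + rng.gauss(0.0, sd_v))     # H(i)   = G + V(i)
    return fathers, sons


rng = random.Random(0)
SD_G, SD_V = 2.0, 1.0  # assumed values, purely illustrative
fathers, sons = simulate_darwin(100_000, SD_G, SD_V, rng)

b_hat = ols_slope(fathers, sons)              # regress son on father
b_predicted = SD_G**2 / (SD_G**2 + SD_V**2)   # Var(G) / (Var(G) + Var(V))
print(f"estimated B = {b_hat:.3f}, attenuation ratio = {b_predicted:.3f}")
```

With these assumed variances the true coefficient on G(i-1) is one, so the whole gap between the estimate and one comes from using H(i-1) as a proxy.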
There are two ways an econometrician can distinguish between these two models:
1. Look at the grandson too. (I think this is roughly what Gregory Clark did, and it's a pity Hemingway never wrote a book with the title "The grand sun also rises".) If the Darwin model is true, regression to the mean stops after the first generation. If the father is 10cm taller than average, and the son is on average 8cm taller than average, the grandson (and great-grandson) will on average also be 8cm taller than average. But if the Lamarck model is true, the grandson will on average be 6.4cm taller than average, and the great-grandson 5.1cm taller than average. (I'm ignoring mothers, of course, so this would only be strictly true under perfect assortative mating.)
But we need data on three generations to do this.
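The three-generation test can be sketched with simulated data. Everything here is an assumption chosen to match the 10cm/8cm example: b = 0.8 for Lamarck, and Var(G)/Var(V) = 4 for Darwin, so that both models give a father-to-son slope of about 0.8. The function and variable names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_three_generations(model, n, rng, b=0.8, sd_g=2.0, sd_v=1.0):
    """Return (fathers, sons, grandsons) as deviations from the mean."""
    fathers, sons, grandsons = [], [], []
    for _ in range(n):
        if model == "darwin":
            g = rng.gauss(0.0, sd_g)  # same G in every generation
            father = g + rng.gauss(0.0, sd_v)
            son = g + rng.gauss(0.0, sd_v)
            grandson = g + rng.gauss(0.0, sd_v)
        elif model == "lamarck":
            # Draw the father from the stationary distribution.
            sd_h = sd_v / (1.0 - b * b) ** 0.5
            father = rng.gauss(0.0, sd_h)
            son = b * father + rng.gauss(0.0, sd_v)
            grandson = b * son + rng.gauss(0.0, sd_v)
        else:
            raise ValueError(f"unknown model: {model}")
        fathers.append(father)
        sons.append(son)
        grandsons.append(grandson)
    return fathers, sons, grandsons


rng = random.Random(1)
results = {}
for model in ("darwin", "lamarck"):
    f, s, gs = simulate_three_generations(model, 100_000, rng)
    # (slope of son on father, slope of grandson on father)
    results[model] = (ols_slope(f, s), ols_slope(f, gs))
    one_gen, two_gen = results[model]
    print(f"{model}: son on father = {one_gen:.3f}, "
          f"grandson on father = {two_gen:.3f}")
```

Under these assumptions the grandson-on-father slope stays near the son-on-father slope in the Darwin case and falls to roughly b squared in the Lamarck case, which is the contrast the three-generation test relies on.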
2. Reverse the regression. Regress father's height on son's height. If the Darwin model is true, and if the variance of V(i) is constant over time, you should get exactly the same estimate B whether you regress son's height on father's height, or regress father's height on son's height. Because you get exactly the same errors in variables bias in both directions.
But if the Lamarck model is true, you should get a biased estimate for 1/b if you reverse the regression, because the regression is correctly specified if you run it the normal way, but has an errors in variables bias if you reverse the regression. If the Lamarck model is true, the estimated coefficient R in the reverse regression should equal (1/b)/[1+Var(V(i))/Var(Y(i))]. And the econometrician will already have estimated b and Var(V(i))/Var(Y(i)) when running the regression the right way round.
So, by reversing the regression of son's height on father's height, the econometrician can look at both equations and learn which of the two models fits the facts. (Or better, we could construct a hybrid model which is a weighted average of Darwin and Lamarck, and estimate the weights on Darwin and Lamarck.)
And we only need data on two generations to do this.
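The question raised in the first update at the top of this post can be checked numerically by running both regressions on simulated data. This is a sketch under loud assumptions: b = 0.8, Var(G)/Var(V) = 4, and, importantly, a stationary population in which fathers and sons have the same variance. The names `simulate_pairs` and `slopes` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_pairs(model, n, rng, b=0.8, sd_g=2.0, sd_v=1.0):
    """Return (fathers, sons) drawn from a stationary population."""
    fathers, sons = [], []
    for _ in range(n):
        if model == "darwin":
            g = rng.gauss(0.0, sd_g)
            father = g + rng.gauss(0.0, sd_v)
            son = g + rng.gauss(0.0, sd_v)
        elif model == "lamarck":
            # Stationary variance of H is Var(V) / (1 - b^2).
            sd_h = sd_v / (1.0 - b * b) ** 0.5
            father = rng.gauss(0.0, sd_h)
            son = b * father + rng.gauss(0.0, sd_v)
        else:
            raise ValueError(f"unknown model: {model}")
        fathers.append(father)
        sons.append(son)
    return fathers, sons


rng = random.Random(2)
slopes = {}
for model in ("darwin", "lamarck"):
    fathers, sons = simulate_pairs(model, 100_000, rng)
    forward = ols_slope(fathers, sons)  # son regressed on father
    reverse = ols_slope(sons, fathers)  # father regressed on son
    slopes[model] = (forward, reverse)
    print(f"{model}: forward = {forward:.3f}, reverse = {reverse:.3f}")
```

One thing to watch when reading the output: a regression slope is the correlation times the ratio of the two standard deviations, so in a stationary setup like this one, where both generations have the same variance, the forward and reverse slopes both come out near the correlation in either model. Any gap between the two slopes in real data would therefore have to come from the variance differing across generations, which this sketch deliberately assumes away.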
What's this got to do with economics? Just replace height H(i) with income Y(i). What causes son's income to be positively correlated with father's income? What causes (lack of) "mobility"?
One theory ("Lamarck") is that father's income, plus "luck", determines son's income. Sons inherit their father's luck. If the father wins the lottery, or gets a good job just by being in the right place at the right time, that makes the son richer too. Son's income is determined only by his own luck, his father's luck, his grandfather's luck, and so on. Ability is not inherited.
The second theory ("Darwin") says that both son's and father's income are determined by some shared genotype, plus their own luck. (Sons are not clones of their fathers, given sexual reproduction, but I don't want to get into the complications of assortative mating, and what I said about regression toward the mean working the same in both directions, for just one generation, still holds true.) If the father wins the lottery, or gets a good job just by being in the right place at the right time, that has no effect on the son's income.
Or maybe it's a bit of both theories. But how much of each?
Why does it matter? Actually, I'm not sure it does matter. If "Darwin" were 100% true, that still leaves open the question whether the shared genotype is "ability to produce useful goods" or "ability to rip off others". If "Lamarck" were 100% true, that still leaves open the question whether fathers use their income to invest in their son's "ability to produce useful goods" or "ability to rip off others".
But maybe it has some policy implications, of what policies would or would not change the variance of the population distribution, and how much. Banning lotteries, for example, would cause a bigger reduction in inequality if the Lamarck model were true than if the Darwin model were true.
I thank Miles Curtis for his comments via email on a draft. Miles tells me that researchers in this area normally interpret Y(i-1) as father's lifetime income, and if we interpret it as father's income in any particular year then estimates of the Lamarck model will also have an errors in variables bias, if it is father's lifetime income that causes son's income. That sounds right to me, but I am unsure how it would affect my proposal here. Miles is not responsible for my opinions and mistakes in this post.
I really wish I could get my head around the mothers question.