[Update: I'm getting an awful feeling this post may be totally wrong. Does (1/b)/[1+Var(V(i))/Var(Y(i))] = b if the Lamarck model is true? If so, my proposal won't work. Wish I could do math.] [Update2. Nope, I think it will work. But Majromax in comments has come up with a simpler way to do the same thing. See my 01:19 pm comment.]
Economists normally run a regression of son's income on father's income, and interpret the estimate beta as a measure of "intergenerational mobility". (High beta means low "mobility"). I think we could learn something interesting by reversing the regression -- run a regression of father's income on son's income -- and comparing the results of the two regressions.
This post is about: econometrics (errors in variables bias); genetics (regression toward the mean); and mobility/inequality. I don't know much about any of those three areas, so I will probably get things wrong. I am not 100% confident my proposal will work, but I would like people with better technical skills than me to take a look at it.
Ideally, we should all agree on how we would interpret the results before we run any regressions.
Forget economics for a minute. Consider two models of the relationship between son's height H(i) and father's height H(i-1). (Yes, I should have mother's height in there too, but then I would have to think about assortative mating, and I would get muddled.) All variables are defined as deviations from the population mean (so I can get rid of the constant terms).
1. "Darwin" model
H(i) = G(i) + V(i)
G(i) = G(i-1)
H(i-1) = G(i-1) + V(i-1)
Where G(i) is some unobserved characteristic shared by father and son, and V(i) is an uncorrelated random error.
2. "Lamarck" model
H(i) = b.H(i-1) + V(i)
Where 0 < b < 1 is a structural parameter.
If the variance of V(i) is constant over generations, and if each father has one son, both models would give a stationary distribution of height in a given population.
If an econometrician estimates the regression H(i) = B.H(i-1) + U(i), the estimate for B will be positive but less than one in both models. Both models will exhibit "regression toward the mean". And both models will generate a "Great Gatsby Curve", in which the econometrician's estimate of B will be positively correlated with the population variance of height when we look at different populations.
If the Lamarck model is true, the econometrician's estimate B will be an unbiased estimate of the true parameter b.
But if the Darwin model is true, the regression suffers from an errors in variables bias. The econometrician wants to estimate H(i) = B.G(i-1) + U(i), but G(i-1) is an unobserved variable. And so the econometrician is forced to use H(i-1) instead, but H(i-1) is an imperfect proxy for G(i-1). So the estimate B will be less than one, and will be biased towards zero. The amount of downward bias depends on the ratio between the variance of V(i-1) and the variance of G(i-1).
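The attenuation described above can be illustrated with a small simulation. This is only a sketch: the sample size, the variances of G and V, and the value b = 0.8 are made-up assumptions, and `ols_slope` is a hypothetical helper, not anything from the literature. Under the Darwin model the forward slope should come out near Var(G)/[Var(G)+Var(V)], and under the Lamarck model near b.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000               # number of father-son pairs (illustrative)
var_g, var_v = 1.0, 0.5   # assumed Var(G) and Var(V)
b = 0.8                   # assumed Lamarck parameter


def ols_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


# "Darwin": father and son share G, and each draws his own V.
g = rng.normal(0.0, np.sqrt(var_g), n)
father_d = g + rng.normal(0.0, np.sqrt(var_v), n)
son_d = g + rng.normal(0.0, np.sqrt(var_v), n)

# "Lamarck": fathers are drawn from the stationary distribution,
# whose variance solves Var(H) = b^2 Var(H) + Var(V).
father_l = rng.normal(0.0, np.sqrt(var_v / (1 - b**2)), n)
son_l = b * father_l + rng.normal(0.0, np.sqrt(var_v), n)

B_darwin = ols_slope(father_d, son_d)
B_lamarck = ols_slope(father_l, son_l)

attenuation = var_g / (var_g + var_v)
print(f"Darwin:  B = {B_darwin:.3f} (Var(G)/[Var(G)+Var(V)] = {attenuation:.3f})")
print(f"Lamarck: B = {B_lamarck:.3f} (true b = {b})")
```

Both estimates are below one, so the forward regression on its own does not tell the two models apart.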
There are two ways an econometrician can distinguish between these two models:
1. Look at the grandson too. (I think this is roughly what Gregory Clark did, and it's a pity Hemingway never wrote a book with the title "The grand sun also rises".) If the Darwin model is true, regression to the mean stops after the first generation. If the father is 10cm taller than average, and the son is 8cm taller than average, the grandson (and great grandson) will also be 8cm taller than average. But if the Lamarck model is true (with b = 0.8, as these numbers imply), the grandson will be 6.4cm taller than average, and the great grandson 5.1cm taller than average. (I'm ignoring mothers, of course, so this would only be strictly true under perfect assortative mating.)
But we need data on three generations to do this.
2. Reverse the regression. Regress father's height on son's height. If the Darwin model is true, and if the variance of V(i) is constant over time, you should get exactly the same estimate B whether you regress son's height on father's height, or regress father's height on son's height. Because you get exactly the same errors in variables bias in both directions.
But if the Lamarck model is true, you should get a biased estimate for 1/b if you reverse the regression, because the regression is correctly specified if you run it the normal way, but has an errors in variables bias if you reverse the regression. If the Lamarck model is true, the estimated coefficient R in the reverse regression should equal (1/b)/[1+Var(V(i))/Var(Y(i))]. And the econometrician will already have estimated b and Var(V(i))/Var(Y(i)) when running the regression the right way round.
So, by reversing the regression of son's height on father's height, the econometrician can look at both equations and learn which of the two models fits the facts. (Or better, we could construct a hybrid model which is a weighted average of Darwin and Lamarck, and estimate the weights on Darwin and Lamarck.)
And we only need data on two generations to do this.
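A quick numerical check of the reversal idea is sketched below. It simulates father-son pairs under each model and runs the regression in both directions. The sample size, the variances and b = 0.8 are illustrative assumptions, and the Lamarck fathers are drawn from the stationary distribution so that Var(H(i)) = Var(H(i-1)). One thing to keep in mind when reading the output: the forward slope is Cov(H(i),H(i-1))/Var(H(i-1)) and the reverse slope is Cov(H(i),H(i-1))/Var(H(i)), so the two can only differ to the extent that the two generations have different variances.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000               # number of father-son pairs (illustrative)
var_g, var_v = 1.0, 0.5   # assumed Var(G) and Var(V)
b = 0.8                   # assumed Lamarck parameter


def ols_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


def simulate_darwin():
    """Father and son share G; each has his own independent V."""
    g = rng.normal(0.0, np.sqrt(var_g), n)
    father = g + rng.normal(0.0, np.sqrt(var_v), n)
    son = g + rng.normal(0.0, np.sqrt(var_v), n)
    return father, son


def simulate_lamarck():
    """Son's height is b times father's height plus his own V."""
    father = rng.normal(0.0, np.sqrt(var_v / (1 - b**2)), n)
    son = b * father + rng.normal(0.0, np.sqrt(var_v), n)
    return father, son


results = {}
for name, simulate in [("Darwin", simulate_darwin), ("Lamarck", simulate_lamarck)]:
    father, son = simulate()
    forward = ols_slope(father, son)   # son regressed on father
    reverse = ols_slope(son, father)   # father regressed on son
    results[name] = (forward, reverse)
    print(f"{name}: forward = {forward:.3f}, reverse = {reverse:.3f}")
```

Changing the assumed variances, or starting the Lamarck fathers away from the stationary distribution, shows how sensitive the comparison is to the equal-variance assumption.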
What's this got to do with economics? Just replace height H(i) with income Y(i). What causes son's income to be positively correlated with father's income? What causes (lack of) "mobility"?
One theory ("Lamarck") is that father's income, plus "luck", determines son's income. Sons inherit their father's luck. If the father wins the lottery, or gets a good job just by being in the right place at the right time, that makes the son richer too. Son's income is determined only by his own luck, his father's luck, his grandfather's luck, and so on. Ability is not inherited.
The second theory ("Darwin") says that both son's and father's income are determined by some shared genotype, plus their own luck. (Sons are not clones of their fathers, given sexual reproduction, but I don't want to get into the complications of assortative mating, and what I said about regression toward the mean working the same in both directions, for just one generation, still holds true.) If the father wins the lottery, or gets a good job just by being in the right place at the right time, that has no effect on the son's income.
Or maybe it's a bit of both theories. But how much of each?
Why does it matter? Actually, I'm not sure it does matter. If "Darwin" were 100% true, that still leaves open the question whether the shared genotype is "ability to produce useful goods" or "ability to rip off others". If "Lamarck" were 100% true, that still leaves open the question whether fathers use their income to invest in their son's "ability to produce useful goods" or "ability to rip off others".
But maybe it has some policy implications, of what policies would or would not change the variance of the population distribution, and how much. Banning lotteries, for example, would cause a bigger reduction in inequality if the Lamarck model were true than if the Darwin model were true.
I thank Miles Curtis for his comments via email on a draft. Miles tells me that researchers in this area normally interpret Y(i-1) as father's lifetime income, and if we interpret it as father's income in any particular year then estimates of the Lamarck model will also have an errors in variables bias, if it is father's lifetime income that causes son's income. That sounds right to me, but I am unsure how it would affect my proposal here. Miles is not responsible for my opinions and mistakes in this post.
I really wish I could get my head around the mothers question.
No, I don't think this will work.
Your question is "can we gain anything new by time-reversing the system?" Unfortunately, without introducing a third generation, the answer is 'no'.
Let's look at your Darwin and Lamarck equations, but drop the time indices:
A = cG + [V]
B = cG + [V]
and
A = cB + [V]
B = B
The first set is obviously time-independent, and the second set isn't, but for Lamarck:
cB = A - [V] ->
B = (1/c) A + [-V/c] = (1/c) A + [V']
and for Darwin:
cG = A - [V] ->
G = (1/c)A + [-V/c] ->
B = (c)(1/c)A + c[-V/c] + [V] ->
B = A + [V] + [V] (keeping in mind that random variables don't add neatly; sqrt(2)[V] if the process is Gaussian)
These processes look nearly identical; the only way to distinguish Lamarck from Darwin is that the latter has a coefficient of unity on the slope of the intergenerational regression, whereas the former may not.
Ultimately, your question is whether income-over-generations is a random walk (Lamarckian model, where transient shocks persist) or independently distributed (Darwinian model). Unfortunately, this is not at all easy when looking at pairs.
In the worst case, the Lamarckian model is "right", but with a strong intergenerational correlation. Here, the only way to discern the difference will be by the amount of residual variance -- a Darwinian model will produce seemingly more variance as outlier results revert to the "genetic" mean.
Posted by: Majromax | November 04, 2014 at 10:22 AM
Well, biologists learned something by reversing the regression. They learned that regression towards the mean is an error. It is an error because Galton thought that it was a biological phenomenon instead of a mathematical one. Reversing the regression revealed that, while tall fathers tended to have sons who were less tall, tall sons also tended to have fathers who were less tall.
I expect that the same thing would be true about, say, incomes at age 40, but if it were not, that would be a real find. :)
Posted by: Min | November 04, 2014 at 10:40 AM
Thanks for checking on this Majro. But I'm not sure I understand you, and I'm not sure you're right:
"the only way to distinguish Lamarck from Darwin is that the latter has a coefficient of unity on the slope of the intergenerational regression, whereas the former may not."
When we reverse regress the Darwin model, we are estimating:
H(i-1) = H(i) - V(i) + V(i-1), observing only H(i-1) and H(i)
But V(i) will be positively correlated with H(i), so the estimated coefficient will be biased towards zero, and so will be less than unity. Just like if we estimate Y = bX + e, if e is correlated with X, our estimate of b will be biased.
Posted by: Nick Rowe | November 04, 2014 at 10:46 AM
Min: "Well, biologists learned something by reversing the regression. They learned that regression towards the mean is an error. It is an error because Galton thought that it was a biological phenomenon instead of a mathematical one."
Bingo! That's what I hoped to hear. Because what I'm saying is that if my "Darwin" model of income is true, then regression of income to the mean is a mathematical (econometric) phenomenon, and not an economic phenomenon. But if my "Lamarck" model is true, then regression to the mean is an economic phenomenon.
I didn't know that biologists had already done the same thing (reversed the regression). (And probably did it 100 years ago too.) If biologists can use it to distinguish Darwin from Lamarck, so can economists. (But I don't yet fully understand the econometrics.)
Posted by: Nick Rowe | November 04, 2014 at 10:56 AM
> But V(i) will be positively correlated with H(i), so the estimated coefficient will be biased towards zero, and so will be less than unity. Just like if we estimate Y = bX + e, if e is correlated with X, our estimate of b will be biased.
Hmm, maybe we can get somewhere with differences?
For the Darwinian model, H(i) - H(i-1) = V(i) - V(i-1) = V(i) + G - H(i-1). For a single generation but across families, we have many observations of H pairs, but V(i) and G are unknown. This suggests that we should get a correlation of slope -1 between income increment and father's income, but that correlation strength should be relatively weak since it depends on inter-family and individual random factors.
For the Lamarckian model, H(i) - H(i-1) = V(i) + (b-1) H(i-1). Here, we'll still get a negative correlation (b<1), but the slope will be flatter than -1. For absolute Lamarckian inheritance (b=1), then there would be no correlation at all between intergenerational differences and the father's income.
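The difference regression described above is easy to try on simulated pairs. The sketch below uses made-up values (Var(G) = 1, Var(V) = 0.5, b = 0.8) and a hypothetical helper `ols_slope`. It regresses H(i) - H(i-1) on H(i-1) under each model. Note that this OLS slope is always the forward slope minus one, because both regressions use the same regressor, so the sketch prints the two side by side.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000               # number of father-son pairs (illustrative)
var_g, var_v = 1.0, 0.5   # assumed Var(G) and Var(V)
b = 0.8                   # assumed Lamarck parameter


def ols_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


# Darwinian pairs: shared G plus independent V for father and son.
g = rng.normal(0.0, np.sqrt(var_g), n)
father_d = g + rng.normal(0.0, np.sqrt(var_v), n)
son_d = g + rng.normal(0.0, np.sqrt(var_v), n)

# Lamarckian pairs, with fathers drawn from the stationary distribution.
father_l = rng.normal(0.0, np.sqrt(var_v / (1 - b**2)), n)
son_l = b * father_l + rng.normal(0.0, np.sqrt(var_v), n)

slopes = {}
for name, father, son in [("Darwin", father_d, son_d), ("Lamarck", father_l, son_l)]:
    forward = ols_slope(father, son)              # H(i) on H(i-1)
    diff = ols_slope(father, son - father)        # H(i) - H(i-1) on H(i-1)
    slopes[name] = (forward, diff)
    print(f"{name}: forward slope = {forward:.3f}, difference slope = {diff:.3f}")
```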
Posted by: Majromax | November 04, 2014 at 11:47 AM
Of course, your models have nothing to do with Darwinian theory or Lamarckian theory. ;)
Main text: "1. "Darwin" model
H(i) = G(i) + V(i)
G(i) = G(i-1)
H(i-1) = G(i-1) + V(i-1)
Where G(i) is some unobserved characteristic shared by father and son, and V(i) is an uncorrelated random error."
Things would be clearer if you simply used G, since it is a constant. And you are also assuming that the distribution of V is the same, so simply using V would also be clearer.
That gives us this model:
H = G + V
for both father and son.
Main text: "2. "Lamarck" model
H(i) = b.H(i-1) + V(i)
Where 0 < b < 1 is a structural parameter."
Again, it would be clearer to write
H(i) = b.H(i-1) + V.
Note that this actually is a model of regression towards the mean. H approaches 0 in the long run. Then we should expect that the variance of H will approach the variance of V in the long run. If b is small enough, then an F test of the variances of H(i) and H(i-1) should reveal this "actual regression".
Posted by: Min | November 04, 2014 at 11:57 AM
Nick Rowe: "I didn't know that biologists had already done the same thing (reversed the regression). (And probably did it 100 years ago too.) If biologists can use it to distinguish Darwin from Lamarck, so can economists."
I don't know who discovered Galton's error. I wish that the term would have died out by now. It clouds men's minds and leads people to think that Galton was right. But the discovery of Galton's error did distinguish Darwinian theory from Lamarckian theory. It distinguished biology from mathematics.
As for deciding between your models, a good ole F test will do, as I indicated in my previous note. :)
Posted by: Min | November 04, 2014 at 12:05 PM
I mean, I wish that "regression to the mean" had died out. We are stuck with "regression" but that's not so bad.
Posted by: Min | November 04, 2014 at 12:23 PM
Majromax: Hmmm. I *think* you might be onto something there. Changing your proposed "difference" method just slightly:
First we estimate H(i)= B.H(i-1) + residual
Then we estimate [H(i-1)-H(i)] = C.H(i-1) + residual
If we find that B+C=1 then Lamarck is right.
If we find that B < C = 1 then Darwin is right.
Which is a lot simpler than reversing the regression.
Is that right?
Posted by: Nick Rowe | November 04, 2014 at 01:19 PM
I wrote: "H approaches 0 in the long run. Then we should expect that the variance of H will approach the variance of V in the long run."
I suspect that that is wrong, that the variance of H in the limit will depend upon both V and b.
But the variance of H will still diminish in the long run, and the F test for equality of variances will be appropriate.
BTW, it is easy to see that H -> 0 in the limit in model 2. Just run the model backwards.
Posted by: Min | November 04, 2014 at 01:31 PM
Min: "I wish that the term ["regression to the mean"] would have died out by now. It clouds men's minds and leads people to think that Galton was right. But the discovery of Galton's error did distinguish Darwinian theory from Lamarckian theory. It distinguished biology from mathematics."
Totally agreed. It clouded my mind for over 40 years, ever since I read about it as a teenager. I bet it still clouds the minds of many economists (and not just economists).
"Of course, your models have nothing to do with Darwinian theory or Lamarckian theory. ;)"
Well, I wouldn't say "nothing". My "Lamarck" theory does have (partial) inheritance of acquired characteristics/wealth, while my "Darwin" theory does not! That's a good enough metaphor for me.
"Things would be clearer of you simply used G, since it is a constant. And you are also assuming that the distribution of V is the same, so simply using V would also be clearer."
We have pooled cross-section/time series (panel) data. G does not differ between father and son, but it does differ between father and father. And the distribution of V is the same, but each individual gets a different draw from that distribution.
"If b is small enough, then an F test of the variances of H(i) and H(i-1) should reveal this "actual regression"."
You lost me there. If the Lamarck model has been running for a long time, so we can ignore initial conditions, the variance of H(i) should be the same as the variance of H(i-1).
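The stationary variance follows from the model itself: taking variances of H(i) = b.H(i-1) + V(i) and setting Var(H(i)) = Var(H(i-1)) gives Var(H) = Var(V)/(1 - b^2). The sketch below illustrates this under assumed values (b = 0.5, Var(V) = 1, and a starting population that is deliberately far too spread out). It iterates the Lamarck equation and records the variance in each generation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000     # number of family lines (illustrative)
b = 0.5         # assumed Lamarck parameter
var_v = 1.0     # assumed Var(V), constant over generations

# Stationary variance: solves Var(H) = b^2 Var(H) + Var(V).
stationary_var = var_v / (1 - b**2)

# Start away from the stationary distribution (standard deviation 5).
h = rng.normal(0.0, 5.0, n)
variances = [float(h.var())]

for _ in range(15):
    # Each son's height is b times his father's height plus a fresh draw of V.
    h = b * h + rng.normal(0.0, np.sqrt(var_v), n)
    variances.append(float(h.var()))

for generation, v in enumerate(variances):
    print(f"generation {generation:2d}: Var(H) = {v:.3f}")
print(f"stationary value Var(V)/(1 - b^2) = {stationary_var:.3f}")
```

With these values the variance shrinks in the early generations and then settles, so a test for unequal variances between generations only has something to detect while the population is still away from the stationary distribution.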
Posted by: Nick Rowe | November 04, 2014 at 01:35 PM
Nick Rowe: "My "Lamarck" theory does have (partial) inheritance of acquired characteristics/wealth, while my "Darwin" theory does not! That's a good enough metaphor for me."
OK, bueno! ;)
Moi: "Things would be clearer if you simply used G, since it is a constant. And you are also assuming that the distribution of V is the same, so simply using V would also be clearer."
Nick Rowe: "We have pooled cross-section/time series (panel) data. G does not differ between father and son, but it does differ between father and father. And the distribution of V is the same, but each individual gets a different draw from that distribution."
Yeah, but your subscripts are temporal.
Moi: "If b is small enough, then an F test of the variances of H(i) and H(i-1) should reveal this "actual regression"."
Nick Rowe: "You lost me there. If the Lamarck model has been running for a long time, so we can ignore initial conditions,"
Ah! the economist's penchant for assuming that we have reached the limit! (Or close enough.)
Nick Rowe: "the variance of H(i) should be the same as the variance of H(i-1)."
OK, a couple of toy examples. Suppose that we have a binomial distribution, without the random effects, of four mothers (who reproduce asexually), with values 1, 0, 0, -1. And the random effects are +/- 0.1, yielding observed values of 1.1, -0.1, -0.1, -0.9. And suppose that b = 0.5.
Then for the daughters we have values before the random effects of 0.55, -0.05, -0.05, -0.45. Then say that the observed values are 0.65, 0.05, -0.15, -0.55. (I have cheated a little to make the variance large.) Obviously, the variance for the daughters is smaller than the variance for the mothers.
Now let's assume that the model has been running for a long time, and that we have values for the mothers without random effects of 0.1, 0, 0, -0.1, with random effects of +/- 1.0, yielding observed values of 1.1, -1, -1, 0.9.
Then for the daughters we have values without random effects of 0.55, 0.45, -0.5, -0.5. Then say that the observed values are 1.55, -0.55, 0.5, -1.5.
It looks like the variance for the daughters is greater than that of the mothers. So in this case the F test is not going to be a good guide. But the situation is dominated by the random effects. I do not believe that that is the kind of model that you are interested in. The random effects should be small relative to other effects, so that we can consider them to be errors, and in that case the variance of H should decrease over time.
Posted by: Min | November 04, 2014 at 05:01 PM
Oh, yes, remember my proviso that 'b' be small enough. If it is close to 1, then the between generation effect on the variance should be small.
Posted by: Min | November 04, 2014 at 05:05 PM
OK, let b = 1, so that model 2 is a kind of random walk. In that case the next generation will look like the current generation in either case. However, both the mean and the variance of the random walk will drift, and that drift should be perceptible over time. Like a century or two. :)
Now, in the case of the U.S., the variance has increased over the past several decades, but the difference does not look like drift. It looks like something else is going on that is not in either model, but is not random, either.
Posted by: Min | November 06, 2014 at 03:11 AM
I'd be grateful if you would clarify whether this relates to "total least squares," which I think minimizes the sum of the squared residuals from the distance point-line, rather than the distance point-projection. It's something I've been meaning to look into. ;-)
Also of relevance, I think, is the linear regression that goes under the name of Deming regression, which is based on the distance point-horizontal_projection (as opposed to the standard vertical projection).
Posted by: annoporci | November 07, 2014 at 12:46 PM
annoporci: Hmmm. I would love to clarify. But I know even less than you about "total least squares" (which sounds like halfway between running it forwards and in reverse) and "Deming" (which sounds like running it in reverse).
Posted by: Nick Rowe | November 07, 2014 at 01:06 PM
Slightly off topic, but one thing I took away from Clark's book is that if the true status is merely proxied by income (aka G(i) in this post), then the Great Gatsby curve will provide (almost?) no useful information. Because the lower the income inequality, the worse a predictor income is of status, and the higher the income inequality, the better a predictor it is of status. The Great Gatsby curve becomes just an indicator of how much income correlates with true social status.
Posted by: Felipe | November 07, 2014 at 01:59 PM