Econometricians spend their lives trying to come up with new and better estimation techniques. Some of the ideas are excellent but impractical ("Just find a suitable instrumental variable"), and some complicate matters for minimal benefit (some argue that using logit or probit rather than ordinary least squares estimation falls into this category).
So when I stumble across some well-intentioned econometric advice, one of the first questions I ask myself is: does it matter?
Yesterday I was working on a paper where we estimate a model like this:
where X are continuous variables and D are dummy variables (variables that take on a value of either zero or one, e.g. male=0, female=1).
People usually interpret the coefficients in such models as percentages. For example, if X was age and b was 0.02, I would write something like, "Income increases by 2 percent with each additional year of age."
Dave Giles, in a post entitled Dummies for Dummies, argues that, while this approach is fine for continuous variables like age, it is inappropriate for dummy variables like gender. As he puts it:
So, according to Giles, if I have a variable that takes on a value of 0 for male and 1 for female, and the coefficient in my regression is something like -0.1, I shouldn't say "females earn ten percent less than males, all else being equal." I should get out my spreadsheet, type the formula =exp(-0.1)-1, and let it tell me that females actually earn 9.5 percent less than males.
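For anyone who would rather script the spreadsheet step, here is a minimal Python sketch. It assumes the dependent variable is in natural logs, which is what the percentage interpretation implies; the function name `dummy_pct_effect` is made up for illustration and is not from Giles's post.

```python
import math

def dummy_pct_effect(coef):
    """Exact percentage change in the outcome when a dummy switches from 0 to 1.

    Assumes the dependent variable is in natural logs (hypothetical helper).
    """
    # math.expm1(x) computes exp(x) - 1, and stays accurate for small x
    return 100.0 * math.expm1(coef)

# The example from the text: a coefficient of -0.1 on the female dummy
print(round(dummy_pct_effect(-0.1), 1))  # -9.5
```

The same function can be mapped over every dummy coefficient in a model, which takes most of the hassle out of it.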
It's a hassle to do this calculation for every variable in the model. So, honestly, is it worth it? (After all, what are the odds of getting a referee who will catch you out on this?)
I did a few calculations to work it out.
(This was generated by going to www.wolframalpha.com, typing in plot x from x=-1 to 1 and plot exp(x)-1 from x=-1 to 1, and then using the screen capture macro command-shift-4 on my Mac.)
That picture shows that, for coefficients between -0.2 and 0.2, it really doesn't make much difference if one takes the extra step and calculates the percentage changes directly. For coefficient values above 0.5, it makes quite a difference.
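To put rough numbers on that comparison, here is a small sketch that tabulates the coefficient-as-percentage reading against exp(b)-1 for a handful of coefficient values. The values chosen and the helper name `approximation_gap` are my own picks for illustration, and the log dependent variable is again assumed.

```python
import math

def approximation_gap(coef):
    """Percentage points by which the exact effect, exp(coef) - 1,
    exceeds the coefficient read directly as a percentage change."""
    return 100.0 * (math.expm1(coef) - coef)

# Illustrative coefficient values, not taken from any estimated model
for coef in (-0.5, -0.2, -0.1, 0.1, 0.2, 0.5):
    naive = 100.0 * coef               # coefficient read as a percentage
    exact = 100.0 * math.expm1(coef)   # exact percentage change
    gap = approximation_gap(coef)
    print(f"b = {coef:+.1f}   naive {naive:+6.1f}%   exact {exact:+6.1f}%   gap {gap:5.1f} points")
```

Within plus or minus 0.2 the gap stays at about two percentage points or less; at 0.5 it is close to fifteen.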
Note, too, that interpreting coefficients on dummy variables as percentages causes asymmetric errors. If the coefficient is positive, and we're talking about a switch in the dummy variable from 0 to 1, then the coefficient-as-percentage-change approach causes one to underestimate the true impact of a change in the dummy variable. On the other hand, if the coefficient is negative, or we're talking about a switch from 1 to 0, the coefficient-as-percentage-change approach overestimates the true effect.
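The asymmetry is easy to check numerically. The sketch below assumes a log dependent variable and a coefficient of 0.5 (an arbitrary value chosen to make the gap visible); `switch_effects` is a hypothetical helper, not a library function.

```python
import math

def switch_effects(coef):
    """Exact percentage changes for a dummy switching on and off.

    Returns (on, off): on is the change when the dummy goes 0 -> 1,
    off is the change when it goes 1 -> 0. Assumes a log dependent variable.
    """
    on = 100.0 * math.expm1(coef)    # exp(coef) - 1
    off = 100.0 * math.expm1(-coef)  # exp(-coef) - 1
    return on, off

on, off = switch_effects(0.5)
# Reading the coefficient as a percentage gives +50% and -50% respectively
print(f"0 -> 1: exact {on:+.1f}% versus +50.0% from the coefficient")
print(f"1 -> 0: exact {off:+.1f}% versus -50.0% from the coefficient")
```

Switching the dummy on gives an exact change larger than 50 percent, so the coefficient understates it; switching it off gives a drop smaller than 50 percent, so the coefficient overstates it.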
Here's another picture, this one showing how much the coefficient-as-percentage approach under- or over-estimates the true impact of a change in a dummy variable:
No. But if the coefficient is big enough - if it's above 0.3 or 0.5, say - I'll at least think about doing so.
(The hat tip goes to my co-author Casey Warman, who insists on calculating the marginal effects properly, even for coefficients of 0.06.)