


Yup. After several diagnostic tests are executed and reported and perhaps a graph of residuals is added to the paper, I'm all in favour of getting the full distribution of marginal effects with dummy variables right.

But I would prefer not to see sophisticated dummy-variable coefficient estimation used as an excuse to hand-wave and ignore the fundamental statistical modelling issues, given that applied inference implicitly and almost always assumes a normal distribution.

But then the reputation of applied econometricians appears best explained by an open access model (no rules) that inevitably converges to a social dilemma of untrustworthy output. The incentives to cheat and hand wave are simply overwhelming.

westslope - the thing is, there often isn't an easy test for fundamental modelling issues - statistical or otherwise - something that tells you whether you've done it wrong or right. Things like sample selection and treatment of missing observations often make a huge difference to the results of an analysis, but because they're more of an art than a precise science, they receive little treatment in econometrics courses.

There is a solution to untrustworthy output, though, isn't there? By which I mean requirements that people make their data available to other researchers, more replication studies, more academic points for replication studies, and presentation of results for multiple different specifications/samples - with referee appendices and on-line appendices if necessary.

Frances - Nice post; and the second para. of your response to westslope is right on target!

My only gripe relates to your comment: "After all, what are the odds of getting a referee who will catch you up on this?" I'm sure you wouldn't dream of driving under the influence! :-)


Isn't there the same problem with continuous variables as with dummy variables? The percentage-change interpretation only works for infinitesimal changes, so "a one-year increase in age leads to an x% change" isn't technically correct either. The only difference is that it's more practical to do the proper analysis for a two-valued dummy variable than for an infinitely-valued continuous variable. Technically we should calculate the marginal effects at different values, but we never do; so if we don't take proper care with continuous variables, why should we necessarily do so with dummy variables?

Joseph, you're right. Suppose the 'continuous' variable is years of education, and one is interested in the impact of getting an undergraduate degree on earnings - dx is not necessarily a good approximation of the impact of a change from 12 to 16 years of education.
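To make the point concrete, here is a quick numeric sketch of how much the infinitesimal approximation can miss for a discrete change like 12 to 16 years of education (the coefficient value is made up purely for illustration):

```python
import math

# Hypothetical log-earnings model: log(earnings) = a + b * years_of_education
b = 0.08  # assumed return per year of education (illustrative only)

# Marginal (infinitesimal) approximation applied to a 4-year change (12 -> 16):
approx_pct = 100 * b * 4                   # 32.0%

# Exact proportional change implied by the log-linear model:
exact_pct = 100 * (math.exp(b * 4) - 1)    # about 37.7%

print(f"approximate: {approx_pct:.1f}%, exact: {exact_pct:.1f}%")
```

The gap widens as the discrete change gets larger, which is exactly the dummy-variable problem in another guise.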

The problem I see with calling it a probability in some cases is that it suggests that all individuals have that lower probability. Say you're looking at employment. If AD is lower, you could do a bunch of math and say "the probability of employment (or of being employed) decreases by x% for each percentage decline in AD" when the effects are at the margin. Some people are almost guaranteed to be laid off if they are in marginal positions/companies in the relevant sectors, but other workers can count on keeping their jobs. Often, it's best just to say "the level of employment is x% lower." I think selection models are an interesting way around this, but labour markets aren't perfect in that way either. Of course, if the entire market is said to pay labour at its productivity level, then things can be made more complicated, with both wage and labour effects across various sectors and groups of workers... Similarly, you could try to say that a policy increases the probability of employment, but the effects are targeted rather than being an average probability across the population (a fact that can be exploited). Not sure if it's the best example, but sometimes I read stuff that makes me wonder how the causal effects are viewed as expressing themselves in the actual economy (or sometimes even about the implied direction of causality).

As for the difference between the 10% and 9.5%: being aware of that, and also knowing that things are not perfect in other ways, you could say "a quick calculation shows that..." or perhaps "for a relatively small effect we can interpret this as..." - but then where's the cutoff where it's important? 5%? 25%? I'd say if you plan on using decimal places of any sort, and you know a method will give a more precise result, then you should always use the more precise method. Imagine this: you say the effect is 6.1% but you know it's 5.7% (guessing) using the other approach. Saying "around 6%" is still fine, but saying "6.1%" is both imprecise and misleading about the precision of your results, due to the use of decimal places. Also important is whether the number is being used simply to show the direction of change or whether it is fed into other parts of other models to reach further conclusions. It sometimes seems as though folks will carry out numerous such estimations and then present their results to three or four decimal places, with no apparent awareness of how significant digits communicate precision. Fine enough for a pedagogical example showing the direction of change, or as a side note about direction and a loose indication of magnitude, but not as an input for further procedures...

Lots of the time, probably not a big deal.

Surely what matters is the size of the uncertainty in the coefficient. If the coefficient is 10% +/- 8%, then getting it correct at 10.5% is meaningless. OTOH, if the data nail the coefficient precisely, then you have to calculate it at least as precisely. Far more important is the error estimation on the parameter.

Chris J, Nathan: Your general point about reporting an appropriate number of significant digits is a good one. One of my pet peeves is people who are estimating a dependent variable that is in dollars, and then report coefficients to three decimal places (you're talking 1/10th of a cent here - no, it doesn't matter).

It's also the case that, if one is transforming the reported coefficient as suggested by Dave Giles above, it's probably a good idea to transform the error terms as well. E.g. a coefficient that's between 0.2 and 0.3 with 95% confidence represents a proportional change of X to Y with 95% confidence.
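As a sketch of that endpoint transformation (the interval values are the illustrative ones from the comment above; because exp() is monotonic, the endpoints map over directly):

```python
import math

# Hypothetical 95% confidence interval for a dummy coefficient in a
# log-earnings regression (illustrative values only):
lo, hi = 0.2, 0.3

# Transforming the endpoints gives an interval for the proportional effect.
pct_lo = 100 * (math.exp(lo) - 1)  # about 22.1%
pct_hi = 100 * (math.exp(hi) - 1)  # about 35.0%

print(f"95% CI for the percentage effect: ({pct_lo:.1f}%, {pct_hi:.1f}%)")
```

Note how the transformed interval is no longer symmetric around the transformed point estimate - another reason to report it explicitly rather than leave the reader to eyeball it.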

At the same time, it's not an either/or - it's possible to do a better job in terms of reporting marginal effects accurately *and* pay attention to reporting the appropriate number of significant digits.

I don't see why finding an instrumental variable is impractical. Surely we agree there is no dearth of economic data, especially financial data. How spoiled can we be? Sure, one needs to think outside the box, be creative and clever, and tell a compelling story, but those are the same criteria for good theoretical research. Finding a good instrument is akin to writing a good theory: both require significant thought and effort.

Endogeneity abounds, so finding a suitable instrument - aside from running a barrage of regressions that increasingly isolate the partial effect of interest and that tell the same story - is really the only correct way to isolate the exogenous variation in a partial effect and get nearer to the true coefficient. In light of today's increasingly complex microfounded macro models, which oftentimes use parameter estimates from empirical micro studies as inputs, I think taking the time to get the coefficient right is worth it.

The discussion is about how close log(s1/s2) is to (s1/s2) - 1. The two are close enough when the proportional difference is within about +0.01 and -0.01, but they grow apart beyond that range.
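A few values make the quality of that approximation concrete:

```python
import math

# How well does log(s1/s2) approximate the proportional change (s1/s2) - 1?
for ratio in [1.01, 1.05, 1.10, 1.25]:
    prop = ratio - 1
    log_diff = math.log(ratio)
    print(f"s1/s2 = {ratio}: proportional change = {prop:.4f}, "
          f"log difference = {log_diff:.4f}")
```

At a 1% difference the two agree to four decimal places; at a 25% difference the log understates the proportional change by nearly three percentage points.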

But things are more complicated than this. The estimates from the model using log earnings as the outcome will differ from the estimates in the model using earnings as the outcome, because the log transformation changes the distribution of the outcome variable. Taking the exponential of both sides does not give the same results as using (un-logged) earnings as the outcome. So in the end, no matter what you do with the coefficients from the logged model, we still cannot be certain to what extent they represent the 'real' group differences in earnings.
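A small simulation illustrates this retransformation problem, under the assumption that earnings are log-normally distributed (all numbers here are invented): exponentiating the mean of log earnings recovers roughly the median, not the mean, of the earnings distribution.

```python
import math
import random

random.seed(0)
# Simulated log-normal earnings: log(y) ~ Normal(mu, sigma), invented parameters
mu, sigma = 10.0, 0.8
logs = [random.gauss(mu, sigma) for _ in range(100_000)]
ys = [math.exp(v) for v in logs]

mean_of_logs = sum(logs) / len(logs)
mean_y = sum(ys) / len(ys)

# By Jensen's inequality exp(E[log y]) < E[y]; for a log-normal,
# exp(E[log y]) is approximately exp(mu) (the median), while
# E[y] is approximately exp(mu + sigma**2 / 2).
print(f"exp(mean of logs): {math.exp(mean_of_logs):,.0f}")
print(f"mean of y:         {mean_y:,.0f}")
```

The gap grows with the variance of the log, which is why naively exponentiating coefficients from a log model can misstate differences in levels.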

I think a perfectly legitimate although underused alternative interpretation could be as follows (assuming female=1 and its coefficient is -0.06): Females earn exp(-0.06) times as much as males, holding constant other independent variables (provided there are no interaction terms).

On that note, I think it is very useful to think even harder about the interpretation of a dummy variable in a model with a log dependent variable when there is an interaction term as is often the case for such models. 100*[(exp(b1+b2*X))-1]
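With an interaction term, the proportional effect of the dummy varies with X, as the formula above says. A sketch using made-up coefficients (these are not from any real study):

```python
import math

# Hypothetical log-earnings model with a female dummy interacted with X:
#   log(earnings) = a + b1*female + b2*(female * X) + ...
b1, b2 = -0.06, 0.01   # illustrative coefficients only

def pct_effect_of_female(x):
    """Percentage difference in earnings, female vs male, at a given X.
    Implements 100*[exp(b1 + b2*X) - 1]."""
    return 100 * (math.exp(b1 + b2 * x) - 1)

for x in (0, 5, 10):
    print(f"X = {x:2d}: {pct_effect_of_female(x):+.2f}%")
```

Note that at X = 0 the effect is not -6% but about -5.8%, and that the sign of the effect can flip as X grows - a single reported number cannot capture this.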

Colin: "I don't see why finding an instrumental variable is impractical."

Sure, there are some papers out there, e.g. Nathan Nunn's work on aid and conflict in Africa, that use really smart and convincing instrumental variables. But I would say that 80%+ of the papers that I see with instrumental variable techniques use some instrument that leaves me scratching my head and saying "really?" because the connection between the instrument and the variable being instrumented is so tenuous I just can't believe the results.

Feng - absolutely. It's difficult to overstress the importance of doing exploratory analysis - just plotting the distribution of the dependent variable and asking yourself "what does this look like? Does it look normal? If I take a log, does it look vaguely normal?" Often it's worth considering some kind of quantile regression.

primedprimate - very true.

I wasn't so much going on about sig figs as I was about uncertainty estimation. Data are only so good and hearing a discussion about how to calculate parameters without a discussion of how to calculate errors is a bit odd to a physicist.

If you do a chi-square estimation you have a clear way to define goodness of fit and to get the errors on the derived parameters and their covariances.

Discussion of errors in parameters is trickier. And if the coefficients undergo a nonlinear transformation to the meaningful values, then so do the uncertainties.
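For a nonlinear transformation like exp(b) - 1, the delta method gives a first-order approximation to the transformed standard error (the numbers below are purely illustrative):

```python
import math

# Delta-method sketch: if b_hat has standard error se, then the transformed
# quantity g(b) = exp(b) - 1 has approximate standard error
#   |g'(b_hat)| * se = exp(b_hat) * se.
b_hat, se = 0.10, 0.04   # illustrative estimate and standard error

g = math.exp(b_hat) - 1        # point estimate of the proportional effect
g_se = math.exp(b_hat) * se    # delta-method standard error

print(f"effect: {100*g:.2f}% +/- {100*1.96*g_se:.2f}% (approx. 95%)")
```

This is only a first-order linearization; for strongly nonlinear transformations or large standard errors, transforming the interval endpoints directly (or bootstrapping) is safer.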

Frances - Making data available for replication is an excellent idea. It's been talked about for decades.

But with all due respect, I don't buy the suggestion that simple diagnostic testing or a graphical representation of residuals is all that difficult to do. Nor is it mere "exploratory analysis" that can go unreported in the final paper. An explicit description of the data-mining process might also be helpful.

Agreed on instrumental variables.

Modeller choices between different configurations of essentially the same model is another complication.

Part of the problem is that even within the economics profession, most economists, including academics, are not deeply schooled in econometrics, and so their ability to assess modelling and inference exercises is limited. Then there is the widespread cynicism that tends to reject statistical modelling results out of hand.
