People make elementary errors when they run a regression for the first time. They inadvertently drop large numbers of observations by including a variable, such as spouse's hours of work, that is missing for over half their sample. They include every single observation in their data set, even when it makes no sense to do so. For example, individuals who are below the legal driving age might be included in a regression that is trying to predict who talks on a cell phone while driving. People create specification bias by failing to control for variables that are almost certainly going to matter in their analysis, like the presence of children or marital status.
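One way to catch the first of those errors is to count how many observations survive once every variable in the model is required to be non-missing. Here is a minimal Python sketch of that check. The data and the variable names (`wage`, `hours`, `spouse_hours`) are made up for illustration, as are the helper functions `complete_cases` and `report_sample_loss`.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"wage": 21.5, "hours": 40, "spouse_hours": 35},
    {"wage": 18.0, "hours": 38, "spouse_hours": None},
    {"wage": 30.2, "hours": 45, "spouse_hours": None},
    {"wage": 15.7, "hours": 20, "spouse_hours": 40},
    {"wage": 25.0, "hours": 40, "spouse_hours": None},
    {"wage": 19.9, "hours": 32, "spouse_hours": None},
]


def complete_cases(rows, variables):
    """Keep only the rows where every listed variable is non-missing."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def report_sample_loss(rows, variables):
    """Print and return how many rows a regression on `variables` would use."""
    kept = len(complete_cases(rows, variables))
    print(f"{variables}: {kept} of {len(rows)} observations kept")
    return kept


# Compare the usable sample with and without the mostly-missing variable.
report_sample_loss(rows, ["wage", "hours"])
report_sample_loss(rows, ["wage", "hours", "spouse_hours"])
```

A check like this takes a minute, and it shows at a glance when adding one regressor has quietly thrown away most of the sample.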

But it is rare that someone comes to my office hours and asks, "Have I chosen my sample appropriately?" Instead, year after year, students are obsessed with learning how to use probit or logit models, as if their computer would explode, or the god of econometrics would smite them down, if they were to try to explain a 0-1 dependent variable by running an ordinary least squares regression.

I try to explain: "Look, it doesn't matter. It doesn't make much difference to your results. It's hard to come up with an intuitive interpretation of what logit and probit coefficients mean, and it's a hassle to calculate the marginal effects. You can run logit or probit if you want, but run a linear probability model as well, so I can tell whether or not anything weird is going on with the regression."

But they just don't believe me.
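For anyone who would rather see it than take it on faith, here is a minimal sketch in plain Python. It simulates a 0-1 outcome, fits a linear probability model by ordinary least squares, fits a logit, and compares the OLS slope with the logit's average marginal effect. Everything here is an assumption made for illustration: the data are simulated, there is a single regressor, and the function names are hypothetical. The logit is fitted with a hand-written Newton-Raphson loop only so that the example needs nothing beyond the standard library; in practice you would use your statistics package.

```python
import math
import random


def simulate(n=2000, seed=0):
    """Simulated data (an assumption): y is 0-1, generated from a logit in x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def lpm_slope(xs, ys):
    """Linear probability model: OLS slope of y on x (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def logit_fit(xs, ys, iters=25):
    """Logit with intercept a and slope b, fitted by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # score with respect to the intercept
            g1 += (y - p) * x    # score with respect to the slope
            h00 += w             # entries of the information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def logit_average_marginal_effect(xs, a, b):
    """Average over the sample of dP/dx = b * p * (1 - p)."""
    total = 0.0
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        total += b * p * (1.0 - p)
    return total / len(xs)


xs, ys = simulate()
slope = lpm_slope(xs, ys)
a_hat, b_hat = logit_fit(xs, ys)
ame = logit_average_marginal_effect(xs, a_hat, b_hat)
print(f"LPM slope:                     {slope:.4f}")
print(f"Logit coefficient:             {b_hat:.4f}")
print(f"Logit average marginal effect: {ame:.4f}")
```

The raw logit coefficient is on a different scale and is not comparable to the OLS slope. The number to compare is the average marginal effect, and on well-behaved data like these it typically lands very close to the linear probability model's slope. Real data with extreme probabilities or awkward regressors can behave differently, which is one more reason to run both.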
