Right now I'm handling most of the wealth papers submitted to Review of Economics of the Household.
Wealth data is, almost invariably, messy.
The distribution of wealth has a long, thick, right-hand tail - a good number of people have wealth holdings in the million dollar range (most owners of mortgage-free single detached homes in Vancouver), and a non-trivial number have wealth holdings in the multi-million dollar range.
The standard solution to the problem of skewed data - the solution most commonly used for wage or earnings regressions - is to take a log transformation. That brings all of the extreme values closer to the middle, so they don't have such a large effect on the results. (Other solutions are to drop Oprah-like outliers, or run quantile regressions, which look at each part of the distribution separately.)
Unfortunately log transformations don't work well for wealth data, because a substantial fraction of the population has no wealth at all, and ln(0) is undefined. Sure, it would be possible to drop people with no wealth, but that's not a satisfying solution, because it involves throwing away information, and ignoring a significant segment of the population. Another solution is to assume that everybody has some wealth, even if it is only a quarter down the back of the sofa, and 75 cents in a jacket pocket, and recode all the zeros to ones. But that feels like cheating.
Happily, there's an easy solution to this problem: the inverse hyperbolic sine transformation. It sounds intimidating and impressive; it isn't.
The inverse hyperbolic sine transformation is defined as:
log(yi + (yi^2 + 1)^(1/2))
Except for very small values of yi, the inverse hyperbolic sine is approximately equal to log(2yi) or log(2)+log(yi), and so it can be interpreted in exactly the same way as a standard logarithmic dependent variable. For example, if the coefficient on "urban" is 0.1, that tells us that urbanites have approximately 10 percent higher wealth than non-urban people.
But unlike a log variable, the inverse hyperbolic sine is defined at zero.
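If you want to see this for yourself, here is a minimal Python sketch. It relies on the standard definition of the inverse hyperbolic sine, log(y + sqrt(y^2 + 1)), which is what math.asinh computes. The wealth levels and the helper name ihs are made up for illustration; they are not taken from any of the papers mentioned here.

```python
import math

def ihs(y):
    """Inverse hyperbolic sine: log(y + sqrt(y**2 + 1))."""
    return math.asinh(y)

# Hypothetical wealth levels in dollars, chosen only for illustration.
wealth_levels = [0, 1, 100, 10_000, 1_000_000]

for y in wealth_levels:
    # log(2y) is undefined at zero, so report NaN there instead.
    approx = math.log(2 * y) if y > 0 else float("nan")
    print(f"y={y:>9,}  ihs={ihs(y):8.4f}  log(2y)={approx:8.4f}")

# Exact percentage difference implied by a coefficient of 0.1 on a dummy
# variable, valid where the log approximation holds (y not close to zero).
coefficient = 0.1
percent_difference = 100 * (math.exp(coefficient) - 1)
print(f"{percent_difference:.1f} percent")
```

The two columns should agree closely once wealth is more than a few dollars, while the zero row shows the IHS returning a perfectly ordinary number where the log has nothing to offer. The last line is a reminder that "10 percent" is itself an approximation to exp(0.1) - 1.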
So why don't people use it? Why did I find myself this morning, once again, writing a revise-and-resubmit letter along the lines of "and re-do the estimation using an inverse hyperbolic sine transformation"?
It's not that the inverse hyperbolic sine is fancy and new - John Burbidge, Lonnie Magee and Les Robb wrote a nice paper on it back in 1988, and that paper cites a 1949 piece by Johnson.
I think it's just a matter of ignorance. Most of the time, a log transformation will do the job, so that's what most people are familiar with. Plus now there are newer and sexier alternatives to the IHS, like quantile regression.
And I have no problem with the new and sexy alternatives.
But, please, if you're thinking about writing a paper using wealth data, and feel an uncontrollable urge to use nominal values of wealth as your dependent variable, or find yourself longing to just drop the zeros and log everything, just stop, pause, and remember - there is a better way.
Update: if you're interested in learning more about the IHS, here are some references -
MacKinnon, James G & Magee, Lonnie, 1990. "Transforming the Dependent Variable in Regression Models," International Economic Review, vol. 31(2), pages 315-39, May.
Pence, Karen M. 2006. "The Role of Wealth Transformations: An Application to Estimating the Effect of Tax Incentives on Saving," Contributions to Economic Analysis & Policy: Vol. 5: Iss. 1, Article 20.
Available at: http://www.bepress.com/bejeap/contributions/vol5/iss1/art20