Right now I'm handling most of the wealth papers submitted to Review of Economics of the Household.
Wealth data is, almost invariably, messy.
The distribution of wealth has a long, thick, right-hand tail - a good number of people have wealth holdings in the million dollar range (most owners of mortgage-free single detached homes in Vancouver), and a non-trivial number have wealth holdings in the multi-million dollar range.
The standard solution to the problem of skewed data - the solution most commonly used for wage or earnings regressions - is to take a log transformation. That brings all of the extreme values closer to the middle, so they don't have such a large effect on the results. (Other solutions are to drop Oprah-like outliers, or to run quantile regressions, which look at each part of the distribution separately.)
Unfortunately log transformations don't work well for wealth data, because a substantial fraction of the population has no wealth at all, and ln(0) is undefined. Sure, it would be possible to drop people with no wealth, but that's not a satisfying solution, because it involves throwing away information, and ignoring a significant segment of the population. Another solution is to assume that everybody has some wealth, even if it is only a quarter down the back of the sofa, and 75 cents in a jacket pocket, and recode all the zeros to ones. But that feels like cheating.
Happily, there's an easy solution to this problem: the inverse hyperbolic sine transformation. It sounds intimidating and impressive; it isn't.
The inverse hyperbolic sine transformation is defined as:
log(y_i + (y_i^2 + 1)^(1/2))
Except for very small values of y, the inverse hyperbolic sine is approximately equal to log(2y_i) or log(2)+log(y_i), and so it can be interpreted in exactly the same way as a standard logarithmic dependent variable. For example, if the coefficient on "urban" is 0.1, that tells us that urbanites have approximately 10 percent higher wealth than non-urban people.
But unlike a log variable, the inverse hyperbolic sine is defined at zero.
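To make that concrete, here is a minimal sketch in Python. It assumes NumPy is available; the wealth values are made up for illustration. It computes the transformation with `np.arcsinh`, which is the same function as log(y + (y^2 + 1)^(1/2)), and sets it beside log(2y).

```python
import numpy as np


def ihs(y):
    """Inverse hyperbolic sine: log(y + sqrt(y^2 + 1))."""
    return np.arcsinh(y)


# Hypothetical wealth values in dollars, including a zero.
wealth = np.array([0.0, 1.0, 10.0, 1_000.0, 250_000.0, 5_000_000.0])

transformed = ihs(wealth)

# log(2y) is the large-y approximation; it is undefined (-inf) at zero.
with np.errstate(divide="ignore"):
    log_2y = np.log(2.0 * wealth)

for w, t, l in zip(wealth, transformed, log_2y):
    print(f"{w:>12,.0f}  ihs={t:8.4f}  log(2y)={l:8.4f}  gap={t - l:.2e}")
```

The zero row is the point of the exercise: the IHS returns 0 there while the log heads off to minus infinity, and by the time wealth reaches the thousands the two columns agree to several decimal places.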
So why don't people use it? Why did I find myself this morning, once again, writing a revise-and-resubmit letter along the lines of "and re-do the estimation using an inverse hyperbolic sine transformation."
It's not that the inverse hyperbolic sine is fancy and new - John Burbidge, Lonnie Magee and Les Robb wrote a nice paper on it back in 1988, and that paper cites a 1949 piece by Johnson.
I think it's just a matter of ignorance. Most of the time, a log transformation will do the job, so that's what most people are familiar with. Plus now there are newer and sexier alternatives to the IHS, like quantile regression.
And I have no problem with the new and sexy alternatives.
But, please, if you're thinking about writing a paper using wealth data, and feel an uncontrollable urge to use nominal values of wealth as your dependent variable, or find yourself longing to just drop the zeros and log everything, just stop, pause, and remember - there is a better way.
Update: if you're interested in learning more about the IHS, here are some references -
MacKinnon, James G & Magee, Lonnie, 1990. "Transforming the Dependent Variable in Regression Models," International Economic Review, vol. 31(2), pages 315-39, May.
Pence, Karen M. 2006. "The Role of Wealth Transformations: An Application to Estimating the Effect of Tax Incentives on Saving," Contributions to Economic Analysis & Policy: Vol. 5: Iss. 1, Article 20.
Available at: http://www.bepress.com/bejeap/contributions/vol5/iss1/art20
Yes, but couldn't we just substitute j for i? Much better, thanks.
This nitpick was brought to you by my engineering undergraduate education. ;)
Posted by: Determinant | July 05, 2011 at 01:33 PM
Determinant, I could see you finding fault with my failure to define y_i, just assuming the reader can somehow guess that y_i represents the observed value of the dependent variable, y, for the ith individual.
But what's the difference between y_i and y_j, i.e. the value of the dependent variable for the ith individual or the jth individual?
y_i is faster to type, and it's easy to remember i=individual.
Posted by: Frances Woolley | July 05, 2011 at 01:56 PM
I would be much more concerned with negative wealth than zero wealth when using a log transform. After all, log(assets + $1) is not really adding a lot of misclassification relative to normal measurement error (as there are day to day changes in estimates of wealth larger than this due to market changes and consumption patterns). But what if debt > assets?
Or is wealth defined only as assets in economics?
Income, on the other hand, is rarely negative and it is the variable that I use far more often. Of course, the challenge for epidemiology is to get investigators to measure it as a continuous variable . . . Income > $100,000 per year is a profoundly irritating category to include in an analysis unless this just happens to be how you'd like to bucket wealth into a series of indicator variables.
Posted by: Joseph | July 05, 2011 at 03:26 PM
I must confess that I have never seen this transformation before. Cool.
Posted by: Stephen Gordon | July 05, 2011 at 03:34 PM
Joseph brings up a really good point. After an asset/credit bubble bursts negative wealth is probably common and should be handled.
On the other hand, if you're talking only assets then nobody should be at zero since we're all endowed with a minimum level of potential labour.
Posted by: Adam P | July 05, 2011 at 03:51 PM
Trying to think through the underlying implicit theory.
It's not obvious to me that Oprah should be excluded, or downweighted. If we are really sure, for example, that the "true model" has wealth as a *linear* function of X, then one observation on someone like Oprah tells you a lot more about the slope than one observation near the middle of the distribution. I expect our intuitive sense that we ought not give that much weight to Oprah is our intuition's way of telling us that we aren't really sure that the true relationship is linear.
And the fact that some people have zero wealth is telling us that it is mathematically impossible that the true relationship is Log(Yi) = BXi.
So, ultimately, what we want is some way of estimating an approximate relationship which reflects our uncertainty about the functional form, and is reasonably robust?
Alternatively, isn't there some version of the probit/tobit/whatever thingy that could be used here? Truncated distributions or something? Like, when, given Xi, Yi "wants" to go negative, but can't? Does anybody understand what I'm trying to say here?
Posted by: Nick Rowe | July 05, 2011 at 03:52 PM
Joseph: "wealth" can be defined many ways in economics. "Net wealth" is assets minus liabilities. "Present Value of future income minus future liabilities" would be a broader definition of wealth. From the context, the papers Frances is talking about must be defining wealth as assets.
Posted by: Nick Rowe | July 05, 2011 at 03:57 PM
Actually, I'd probably be inclined to follow Nick's line of reasoning. If your econometrics requires you to throw away data, you're doing it wrong. The model should be re-specified so that it conforms with the available data.
Posted by: Stephen Gordon | July 05, 2011 at 04:11 PM
Joseph: "I would be much more concerned with negative wealth than zero wealth when using a log transform."
The IHS is well-defined for negative wealth values, but it is symmetric about zero: stick a negative value into the IHS transformation and it will spit out minus whatever it gives for the same positive value, so a large debt ends up looking like log(2|y|) with a minus sign in front. Which is not necessarily helpful; you're squeezing lots of variation out of both tails, and Nick's point (does this transformation capture the underlying economic reality) becomes important.
So that does argue for using some kind of quantile-type regression on the nominal values. But IHS is, at least, an improvement over things like 'drop the zeros and take the log.'
Nick, yes, you could take a log and do a tobit, but again you're throwing away information. It's not that the zero values are unobserved, they're zero, you know they're zero. I think that taking a log and doing a tobit would be better than just dropping the zeros, but I once had a co-author who disagreed with me strongly on this point, so perhaps I just don't understand.
Your point, though, about the underlying functional form is a good one.
The problem is that people in the tails, especially when it comes to wealth or income, are in the tails because they are people who are special in some unobservable way. Because they inherited money or have family connections or are just incredibly unbelievably creative/talented/hard working/lucky.
What happens in the regression analysis, however, is that those people's wealth or income gets attributed to their observed characteristics - so Oprah's wealth would be attributed to her being a self-employed single African American woman.
On the other hand, if what is happening in the tails is some kind of unobservable phenomenon, then perhaps we should ignore it, which is what the quantile approach does.
This is a paper by Karen Pence that's fairly recent and talks about the appropriate transformation to use for wealth data.
Posted by: Frances Woolley | July 05, 2011 at 04:19 PM
Frances:
Now you threw me.
I was thinking "i" as in i=sqrt(-1). Electrical Engineers prefer to use j instead where j is simply defined as a 90 degree counterclockwise phase shift.
Hyperbolic and trigonometric functions turn on i. The math is intimately related. Economists likely don't give much thought to complex numbers.
If you are going to use a series, n is more customary in my world.
Posted by: Determinant | July 05, 2011 at 04:37 PM
I thought the "standard" wealth distribution was a shifted log-logistic distribution. E.g., If you want negative wealth up to 3 standard deviations, you can use a parameter to shift the log-logistic distribution to the left. Given the data, you can estimate what the parameters should be. http://en.wikipedia.org/wiki/Log-logistic_distribution
Re: "wealth", maybe use market wealth and human wealth? They should be different, because next period, your market wealth may grow by (1+r)M, but your human wealth will not grow by the same amount. In fact, your human wealth has an expiration tag in a way that your market wealth does not, and the variations in return on human wealth due to changing economic conditions are not identical to the variations in the short rate.
Posted by: RSJ | July 05, 2011 at 04:39 PM
RSJ - "I thought the "standard" wealth distribution was a shifted log-logistic distribution. E.g., If you want negative wealth up to 3 standard deviations, you can use a parameter to shift the log-logistic distribution to the left."
Not used much in applied labour economics - though that might be for the same reasons that the IHS transformation isn't used a lot - not everyday-bread-and-butter-useful enough to be part of the standard toolkit, not sexy and novel enough to be taught in grad econometrics courses.
Posted by: Frances Woolley | July 05, 2011 at 04:46 PM
Determinant: the i subscript for observations is standard notation in statistics as well as econometrics.
Posted by: Stephen Gordon | July 05, 2011 at 04:57 PM
Frances,
Perhaps a good way to think about this is to just step back and see what is going on with the IHS (or logistic) or linear distribution. The reason why the probability is close to zero at zero is because that is the end of the support. If you don't have this, then you have a uniform distribution, or something with a jump discontinuity at the beginning of the support. Unless there is a reason for this, then you don't want such discontinuities.
You don't want to fight this continuity at the start of the support -- you want to use it for your advantage, because the slope of the distribution at the start of the support is valuable information, and similarly the rate at which people's wealth increases as you go from the smallest (negative) level of wealth is also important, and can be used to calibrate your distribution.
If you actually have people with negative wealth, you will want to shift the distribution so that the start of the support of the distribution is the start of the support of your sample. Then, instead of A = BX, you have A = BX - C.
Posted by: RSJ | July 05, 2011 at 07:48 PM
How meaningful is the concept of negative wealth in an environment with limited recourse loans or, more generally, easy access to bankruptcy? In that scenario, there isn't a world of difference between the fellow with no assets and the fellow with negative net wealth.
Interesting discussion, Frances.
Posted by: Bob Smith | July 05, 2011 at 07:57 PM
Why not just use a generalized linear model rather than transforming? This is increasingly the standard approach in my own field of ecology, where we often have to deal with non-Gaussian data, including highly-skewed and heavy-tailed data.
Posted by: Jeremy Fox | July 05, 2011 at 08:09 PM
@Nick and Frances: Thank you for the clarifications.
@Bob: Some forms of debt can't be discharged easily by bankruptcy (consider US student loans, or a US resident with income above the median). But I can see the argument for treating these people as being close to zero in wealth.
The real issues seem to be the problems with defining the underlying functional form of wealth for the model in question and the issue of unobserved confounders. In epidemiology we have some really dramatic examples of outliers being different due to unobserved factors resulting in some quite misleading inferences.
Posted by: Joseph | July 05, 2011 at 08:13 PM
Frances:
John Burbidge was one of my thesis advisers and he introduced me both to non-parametric estimation as well as the inverse hyperbolic sine as ways of dealing with data with outliers. I took to the non-parametric estimation but aside from puttering around on my own with the inverse hyperbolic sine I have generally submitted wealth work using the log of wealth transform. That has been the convention and conventions are hard to shake. Moreover, the wealth data I have used has very few zero observations. Nevertheless, I feel inspired to again try the inverse hyperbolic sine. Thank you.
Posted by: Livio Di Matteo | July 05, 2011 at 09:16 PM
I'd agree with Jeremy on this one... some form of generalized linear model, such as a gamma distribution could work for the data skew. I would say, however, that it might be worth it to model the zeros separately from the rest of the data; if there's some process that you'd predict would lead to "negative" wealth on average, but how you measure wealth means zero is the lowest possible value, it'll introduce bias no matter how you transform zeros. I'd turn your data into ones (non-zeros) and zeros, and run a logistic (or probit or robit, whatever your preference is) on that, then run the GLM on the strictly positive numbers. This method gets recommended a lot by ecologists when you have data with both zeros and continuous response data.
[ps: it looks like you're becoming the economics blog of choice among ecologists! A sample size of 2 is enough to prove that, right?]
Posted by: Eric Pedersen | July 05, 2011 at 09:53 PM
Jeremy, Eric: I confess I had to Google "generalized linear model" (I'm not an econometrician). Found the Wikipedia entry: http://en.wikipedia.org/wiki/Generalized_linear_model
Not sure I properly understand it. But seems to be a way of dealing with heteroskedasticity? Sounds like it should handle the Oprah problem? We don't want to throw away information on Oprah, but we do want to recognise that there is probably some important omitted variable that explains Oprah. And the effects of that omitted variable get included in the error term, but it does mean that the error term could have a very big variance at the Oprah end of the spectrum. So the Oprah observation gets weighted lower than in regular LS.
There are very probably more ideas we could steal/learn from the ecologists. Lots of similarities to economics. I can only understand barely half of what they are talking about though. http://oikosjournal.wordpress.com/
Posted by: Nick Rowe | July 05, 2011 at 10:26 PM
Generalized linear models (GLMs) are an extension of the standard linear model, where the response variable is allowed to follow some distribution other than normal (logistic regression, for instance, is a form of GLM where the data is assumed to follow a binomial distribution). In some cases, it can be used for heteroskedastic data, but it's not its only purpose.
I agree there's a lot of ecology/economics similarities; I did my undergrad degree in economics and biology, and my econ courses gave me a lot of insight into how to understand ecological processes. When you get right down to it, they're both disciplines trying to understand what actions actors take with limited resources and information.
Posted by: Eric Pedersen | July 05, 2011 at 11:14 PM
And I always thought that ecologists were fluffy bunny types ;-)
Eric: "When you get right down to it, they're both disciplines trying to understand what actions actors take with limited resources and information."
Agreed. But more importantly, I think, we both try to model the *interaction* between those actors, in some sort of "equilibrium".
Posted by: Nick Rowe | July 05, 2011 at 11:27 PM
Oh, a lot of us are fluffy bunny types, we just like to back it up with quantitative data analysis and dynamic system theory. :)
I'd definitely agree about modeling interactions[1], but I'd quibble about equilibrium; there's a lot of ecological theory about what can cause stable predator prey cycles, for instance. Even if we expand equilibrium to mean "on some sort of attractor", there's a growing body of work that shows that transient dynamics can last a long time, and can play a large role in ecological problems, such as conservation.
[1] With the caveat that sometimes the interacting actor can be part of the non-living environment
Posted by: Eric Pedersen | July 06, 2011 at 12:01 AM
The hyperbolic sine transformation seems to me to introduce more problems than it solves. It complicates interpretation of the coefficients, you have to (well, you should) deal with the retransformation problem, and it seems very arbitrary, a functional form chosen purely for convenience. An alternate approach which has become common when dealing with data with lots of zeros and skewness is to estimate a finite mixture model, which deals in an appealing manner with the properties of the dependent variable at the cost of having to estimate a more complicated model.
Posted by: Chris Auld | July 06, 2011 at 01:00 AM
Frances, to me it seems like your fundamental "problem" is that Oprah (and others like her) are outliers. Is that correct? If so, I think the proper way to address that question is with deeper questions. What makes Oprah an outlier and why is that significant in this situation?
On a somewhat less opaque note, how much of the problem here stems from a default assumption of normality (or at least moments existing)? Is that a piece of this at all? If so, perhaps the problem here indicates that the statistical framework being applied is not really helpful/suitable.
I could easily be totally off base with both questions.
Posted by: Blikktheterrible | July 06, 2011 at 04:00 AM
Neat!
Why is this preferable to log(1+y_i) though? It seems to me that because of the +1, neither is truly scale invariant, and log(1+y_i) approaches log for large values, as well as being simpler and more obvious in what it's doing, without needing to plot and ponder its limit.
Are there any good articles that consider the alternatives and how well they work?
Posted by: Joe | July 06, 2011 at 04:42 AM
Joe - "Why is this preferable to log(1+y_i)" The inverse hyperbolic sine can handle values less than zero also. On the various alternatives - there's that 2006 article by Karen Pence on the IHS transformation but, sadly, I don't know of many "cookbook econometrics" articles. It would be a good topic for a blog post, though.
Chris - I don't think the interpretation of the coefficients or the arbitrariness of the transformation is really an issue. An IHS transformation is no less arbitrary, and no harder to interpret, than a log transformation, and people do log transformations all the time. It's just a less familiar procedure than logging the dependent variable, which is why it seems strange.
On finite mixture models - do you think these are appropriate when the zeros are, in some sense, similar to the ones, i.e. drawn from the same population?
And as a practical matter: you have a paper before you where the authors have done a regression using the nominal value of wealth as the dependent variable. You know that, as likely as not, their interesting and unusual results would disappear if they dampened down the extreme values by logging the dependent variables. The authors don't appear to be particularly strong econometricians. What do you do - reject, even if the authors have interesting ideas or data? Ask them to use some econometric techniques that they don't particularly understand/might not be able to do? Give them appropriate references and tell them to learn how to do finite mixture models? As a practical matter, it's hard to beat the robustness of ordinary least squares regression - it works pretty well, most of the time, and alternatives often add a lot of complexity without explaining the data much better.
Blikktheterrible, I don't think you're off base, but I don't know if I have answers to your questions. There are two ways that people get really large quantities of wealth. The first is when a whole series of advantages compound: people who have rich parents *and* are in the right part of the country *and* are male *and* get married/stay married *and* have an education *and*... can start accumulating really large quantities of wealth. The second way of getting a really large quantity of wealth is by having some attribute that wouldn't be measured in a standard data set like the Survey of Financial Security, e.g., the ability to stop 99% of the shots on goal during the Stanley Cup finals. The two types of extreme values call for different solutions. The first suggests that the underlying model should capture the interaction between various explanatory variables; the second suggests that the model needs an error structure that's flexible enough to deal with super-rich Stanley Cup goalies - and, yes, that might mean dropping the assumption of normality.
Posted by: Frances Woolley | July 06, 2011 at 07:56 AM
Eric - oddly enough, I'm a bit of a frustrated ecologist myself. Arguably a lot of economics works better in explaining ecosystems than economic systems.
"Generalized linear models (GLMs) are an extension of the standard linear model, where the response variable is allowed to follow some distribution other than normal (logistic regression, for instance, is a form of GLM where the data is assumed to follow a binomial distribution). In some cases, it can be used for heteroskedastic data, but it's not its only purpose."
True, but one still needs to find a functional form that will handle lots of zeros and skewedness. Suggestions?
Posted by: Frances Woolley | July 06, 2011 at 08:04 AM
I'm skeptical about the idea of zero or negative wealth. If you have more debt than assets (where assets include human capital) then the debt just isn't worth its nominal value. The most relevant definition of wealth is the expected present value of your future ability to consume. (This is like the value of a stock being always positive, regardless of the fact that the nominal amount of debt may be far greater than the value of assets). Even a slave whose human capital is owned by someone else can be thought of as "owning" his future consumption (food, shelter). So I agree with the commenters who propose to add some additional wealth to make the zeros/negatives go away. In Canada, for example, you can't do much worse than a life on welfare with full healthcare and old age security. That must be worth at least a couple hundred thousand dollars on average.
And once we stick to positive wealth, I don't see much of an argument against log wealth, also a good proxy for utility.
Posted by: K | July 06, 2011 at 08:33 AM
Well, the skewedness aspect can be handled by a gamma distribution with a suitably large shape parameter (which can also be estimated). There's also a distribution called the Tweedie that's essentially a mixture of a gamma distribution with positive mass on zero. I've never tried it, though I know a GLM based on it has been implemented in R.
My personal preference in these cases though, as I said above, is two-stage modeling, with the zeros modeled with a logistic regression, and the positives with a GLM (alternatively, just log-transform the positives and run a linear regression on them). This technique is a kind of fast-and-loose mixture model, and the coefficients are pretty easy to understand; for every set of predictor variables, you can say: "This is the predicted probability of observing zero wealth, and conditional on positive wealth, this is the expected wealth distribution".
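For anyone who wants to see the two-stage idea in code, here is a rough sketch in Python using only NumPy. Everything in it is an assumption made for illustration: the data are simulated, there is a single predictor `x`, the logistic regression is fitted with a hand-rolled Newton-Raphson loop (`fit_logit` is a hypothetical helper, not a library function), and the second stage uses the simpler log-then-OLS alternative rather than a gamma GLM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (purely illustrative): an intercept plus one predictor.
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Stage 1 truth: the probability of holding any wealth rises with x.
true_gamma = np.array([0.5, 1.0])
p_positive = 1.0 / (1.0 + np.exp(-X @ true_gamma))
has_wealth = rng.random(n) < p_positive

# Stage 2 truth: log wealth is linear in x among those with positive wealth.
true_beta = np.array([10.0, 0.4])
log_w = X @ true_beta + rng.normal(scale=1.0, size=n)
wealth = np.where(has_wealth, np.exp(log_w), 0.0)


def fit_logit(X, y, n_iter=25):
    """Logistic regression fitted by Newton-Raphson iterations."""
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ coef))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        coef = coef + np.linalg.solve(hessian, gradient)
    return coef


# Stage 1: model the zero / non-zero split.
gamma_hat = fit_logit(X, (wealth > 0).astype(float))

# Stage 2: OLS of log wealth on the strictly positive observations only.
pos = wealth > 0
beta_hat, *_ = np.linalg.lstsq(X[pos], np.log(wealth[pos]), rcond=None)

print("logit coefficients:     ", gamma_hat)
print("log-wealth coefficients:", beta_hat)
```

The two coefficient vectors are read separately: the first describes who has any wealth at all, the second describes how much wealth the holders have.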
Posted by: Eric Pedersen | July 06, 2011 at 08:53 AM
Eric: "My personal preference in these cases though, as I said above, is two-stage modeling, with the zeros modeled with a logistic regression, and the positives with a GLM (alternatively, just log-transform the positives and run a linear regression on them)."
A person, throughout their life, goes through a fairly predictable wealth pattern - starting off with no financial wealth, and gradually accumulating it. People without wealth aren't fundamentally different from people with wealth. (Here wealth accumulation models are different from, say, models of time spent gardening; we can't assume that the no gardening people would suddenly become gardening people if their observed characteristics changed. But we can safely assume that a student with zero wealth might become a person with positive wealth if their age and employment status changed).
If you were to do an IHS transformation on the data and then estimate a tobit with a zero lower bound you could estimate in one model three marginal effects: the change in the probability of having non-zero assets, the change in expected wealth conditional upon having non-zero assets, and the combined effect - the change in expected wealth associated with, say, higher education, taking into account both the effect of education on the probability of having any assets, and the effect of education on the amount of assets held, given the individual has positive assets. (My co-author, Marcel Voia, calculates these three types of marginal effects in our paper on hotness - if necessary I could dig through my files and find the STATA code.)
Wouldn't that be nicer than trying to look at two sets of results and mentally try to combine them?
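To show what those three marginal effects look like, here is a small sketch in Python of the standard decomposition for a tobit censored at zero with normal errors. It is not the Stata code from the paper mentioned above; the function name `tobit_marginal_effects` and the numbers plugged in (`xb`, `beta_k`, `sigma`) are hypothetical, and all three effects are on the IHS-transformed scale, not in dollars.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_marginal_effects(xb, beta_k, sigma):
    """Marginal effects of regressor k in a tobit censored at zero.

    xb     : linear index x'beta at the evaluation point
    beta_k : tobit coefficient on regressor k
    sigma  : standard deviation of the latent (normal) error
    """
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)  # inverse Mills ratio
    return {
        # change in the probability of a non-zero outcome, P(y > 0)
        "prob_positive": beta_k * norm_pdf(z) / sigma,
        # change in the expected outcome given it is non-zero, E[y | y > 0]
        "conditional_mean": beta_k * (1.0 - lam * (z + lam)),
        # combined effect on the expected outcome, zeros included, E[y]
        "unconditional_mean": beta_k * norm_cdf(z),
    }


# Hypothetical estimates, for illustration only.
effects = tobit_marginal_effects(xb=4.0, beta_k=0.8, sigma=5.0)
for name, value in effects.items():
    print(f"{name:>20}: {value:.4f}")
```

The combined effect is the conditional effect weighted by the probability of being non-zero, plus the change in that probability times the conditional mean, so the three numbers come out of one set of estimates.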
K "So I agree with the commenters who propose to add some additional wealth to make the zeros/negatives go away." In an academic paper, if you use an inverse hyperbolic sine transformation, people will say "that's a nice little transformation." If you add $1 to make the zeros go away, people will say "ad hoc, unscientific, unrigorous, reject." Although in fact, unless your y values are in the <10 range or large and negative, log (y_i + (y_i^2+1)^(1/2)) is indistinguishable from log(y_i+1).
The fact that one is acceptable and not the other might tell you as much about academic economics as anything else.
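A quick numerical check of how close the two transformations are, as a sketch in Python (NumPy assumed, the y values are arbitrary). Strictly speaking the two differ by a constant for large y: the gap settles at log(2), which a regression intercept would absorb, so it is the slope coefficients that should come out much the same.

```python
import numpy as np

# Arbitrary non-negative values spanning small to large.
y = np.array([0.0, 1.0, 10.0, 100.0, 10_000.0, 1_000_000.0])

ihs = np.arcsinh(y)    # log(y + sqrt(y^2 + 1))
log1p = np.log1p(y)    # log(y + 1)
gap = ihs - log1p

for yi, g in zip(y, gap):
    print(f"y={yi:>12,.0f}  ihs - log(1+y) = {g:.4f}")

print("log(2) =", np.log(2.0))
```

Below about 10 the gap is still moving, which is where the choice between the two could matter.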
Posted by: Frances Woolley | July 06, 2011 at 09:14 AM
Frances: "If you add $1 to make the zeros go away, people will say 'ad hoc, unscientific, unrigorous, reject.'"
I don't want to add $1 to make the zeros go away; I want to add a couple of hundred thousand to the low end of the distribution to make the wrongness go away.
I'm sure you are right about the preference for fancy ad hoc mathematical transforms over simple ad hoc mathematical transforms. But what about well founded and properly justified transforms? Real wealth is more like an option (call) payoff: There's a fixed baseline threshold from government programs/charity etc. Wealth is expected value of the greater of the baseline and personal human capital. You could model it (e.g. Black-Scholes), but the principal effect would be to put a floor on wealth at a couple hundred thousand dollars (declining with age). I can't imagine that a careful, valid estimation of real wealth would be considered ad hoc or unpublishable. If someone is using zero or negative values, I wonder if they have given real thought to what it is they are measuring. If your wealth is literally zero, it means to me that you are going to be dead within a few days.
Posted by: K | July 06, 2011 at 09:45 AM
K - "There's a fixed baseline threshold from government programs/charity etc." Yes, and part of the messiness of wealth data is that it's really hard to measure some of the most important kinds of wealth, e.g. entitlements to government programs.
A slightly different, but related issue is that government programs create incentives to have lots of wealth or none at all. If your only retirement savings (apart from government programs) are $50,000 in an RRSP then all that those savings do for you is decrease the amount of Guaranteed Income Supplement that you receive from the government. Never worth it. Unless you can accumulate serious amounts of cash you're better off with none.
But this is an argument for using some kind of mixture model, or modelling the decision to hold wealth separately from the decision about how much wealth to have.
Posted by: Frances Woolley | July 06, 2011 at 10:51 AM
"There are very probably more ideas we could steal/learn from the ecologists. Lots of similarities to economics."
No kidding. At the end of the day, they both involve the study of complex systems with a virtually infinite number of variables interacting with one another without (generally) the ability to engage in experimental research. By comparison physics or chemistry (the traditional "hard" sciences) are easy (which comment, no doubt, will spur a flurry of invective from physicists asking me to explain string theory).
In undergrad, economics students had to take the same lightweight calculus course as the biology majors (as opposed to "hard core" calculus required for physics or engineering students). Thank god!
Posted by: Bob Smith | July 06, 2011 at 11:20 AM
Frances: "But this is an argument for using some kind of mixture model, or modelling the decision to hold wealth separately from the decision about how much wealth to have."
I don't see it that way. By wealth I don't mean financial assets. I mean expected value of future available consumption. So you can't choose to have more wealth, since wealth already assumes optimizing behaviour (i.e. you *never* choose to have less: if it's less optimal to hold $50000 of RRSP then why would you do it? - just buy 10 years worth of canned food or something). That's why I discussed wealth as the value of an option. You have choices and the value of your wealth includes your strategy for optimizing the expected outcome of those choices.
But however difficult it may be to carry out such a calculation (and I don't think a first order stab at it would be that tough), my main point is that *however* you do it, non-positive values of wealth don't make sense. Even in the most destitute parts of the world, expected future consumption is positive and adds up to tens of thousands of dollars per capita.
Posted by: K | July 06, 2011 at 11:42 AM
K - I don't think my response was very clear - the mixture model addresses the issue of a bimodal wealth distribution created by incentives in government programs, not the issue you're thinking about - i.e. we all have wealth (e.g. kidneys and eggs fetch a pretty good price).
Posted by: Frances Woolley | July 06, 2011 at 12:18 PM
John Burbidge sent me a message via email suggesting that responses to some of the comments raised in the post (e.g. Chris Auld's) can be found in:
MacKinnon, James G. & Magee, Lonnie, 1990. "Transforming the Dependent Variable in Regression Models," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 31(2), pages 315-39, May.
Posted by: Frances Woolley | July 06, 2011 at 12:20 PM
Hi Frances: I don't agree that all transformations are equally arbitrary. I can state in plain English what I'm assuming if I log the dependent variable, but I cannot if I use the IHS. As you say when you praise OLS, there is value in simplicity.
IHS is more flexible than log(1+y), and allows the data to partially determine functional form, so I think there's good reason to be more skeptical of log(1+y) than of IHS.
I don't understand your comment that the IHS doesn't make interpretation of the coefficients more difficult. The coefficients when the dependent variable is in levels or in logs are readily interpretable, but the coefficients under the IHS transformation are not: I learn dH(w)/dx, yet I want to know dw/dx, and I have to work more. The paper you cite, for example, simply reports the coefficients from the model in levels, but then reports the IHS results by numerically evaluating derivatives at several wealth levels (and I'd have to read the paper and the background papers more carefully, but my spidey sense tells me the paper's method doesn't deal with the retransformation problem correctly). Standard errors also have to be bootstrapped. To be clear, I'm not suggesting these disadvantages necessarily overwhelm the benefits, but they are in my view disadvantages.
Finite mixtures are definitely appropriate when zeros are not driven by some other process. If you had two components, for example, you'd get something that could be interpreted as "low wealth type" and "high wealth type," and zeros are not special. You can also use both finite mixtures and transformations of the dependent variable, for example, you could use mixtures of gamma models for wealth.
If I were refereeing a paper in which the authors just ran OLS on levels of wealth, and I didn't like that, and I thought the authors could not do something fancy that I did like, I suppose I'd reject the paper. That said, I'm not comfortable with thinking about transformations as a method to deal with outliers. If outliers were my major concern with the paper, I'd ask the authors to show me how robust the results are when highly influential observations are dropped, or if some fairly simple canned routine that is less sensitive to outliers like, as you say, median regression, is used. And I'd ask the authors to tell me where they think the outliers came from.
Tobit=evil, and two-part models fail to deal properly with selection.
Posted by: Chris Auld | July 06, 2011 at 12:39 PM
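The gap between dH(w)/dx and dw/dx raised in the comment above can be written out explicitly. The following is a sketch, assuming the one-parameter form of the IHS with scale parameter θ (the formula in the post corresponds to θ = 1) and a linear index xβ + ε; the symbols θ, β and ε are notation introduced here for illustration, not taken from the post.

```latex
% Assumed one-parameter IHS transformation (theta = 1 gives the form in the post)
H_\theta(w) = \frac{\sinh^{-1}(\theta w)}{\theta}
            = \frac{\log\!\left(\theta w + \sqrt{\theta^2 w^2 + 1}\right)}{\theta},
\qquad
\frac{dH_\theta}{dw} = \frac{1}{\sqrt{1 + \theta^2 w^2}} .

% Assumed model: H_\theta(w) = x\beta + \varepsilon. Holding \varepsilon fixed,
\frac{\partial w}{\partial x}
  = \frac{\beta}{dH_\theta / dw}
  = \beta \sqrt{1 + \theta^2 w^2}
  \;\approx\; \beta\,\theta\, w \quad \text{for large } \theta w .
```

So the effect in levels depends on the wealth level at which it is evaluated, which is why it has to be computed at several points. For large wealth and θ = 1 it reduces to roughly β times w, the log-like reading described in the post. This holds ε fixed and says nothing about averaging over a heteroskedastic error, which is the retransformation issue.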
Frances: I don't distinguish between consumption from government programmes and other forms of wealth (e.g. my RRSP, my house, my left kidney). My future consumption of Medicare is every bit as valuable and totally fungible with the amount of money required to purchase an equal insurance policy from a private provider. It's all just plain and simple wealth. First thing you have to do is add it all up. Then decide how you transform it. And if it's positive (which it is), IHS is not appropriate.
And what is this definition of wealth that can be zero or negative? If it doesn't measure expected consumption, what use is it?
Posted by: K | July 06, 2011 at 12:57 PM
K - "And if it's positive (which it is), IHS is not appropriate. "
Whether a transformation is appropriate or not depends upon the underlying structure of the model.
Using a linear model with no transformation of the y variable will generally not give a very good fit for wealth data. E.g. if you're looking at the differences in wealth between someone who is married and someone who is living common-law, you're more likely to find that, say, married people have, all else being equal, 10% higher wealth, than that married people have, all else being equal, $15,000 higher wealth.
Once one has accepted that it is a good idea to transform dependent variables (and of course a number of the commentators here completely reject the idea of transforming dependent variables, arguing instead for general linear models or other solutions) I don't see how one can say IHS is a worse transformation than a log transformation.
I can see your point that imputing the value of, say, a kidney and adding it to everyone's wealth might be a good idea. This is similar to RSJ's idea of shifting the whole distribution up. I am thinking of doing a follow-up cookbook econometrics post and looking into it.
I think one worry with the line of reasoning you suggest is the possibility of adding measurement error. E.g. your kidney is probably more valuable than mine, given that I spent time in England during the BSE crisis, and Determinant might have higher expected value from health care entitlements than you do, given some health issues that he's mentioned on the blog before. And a person's expected value of future Canada Pension Plan benefits depends upon a whole load of things, e.g. gender, health status, marital status, previous marital history, spouse's labour force participation history, etc.
Posted by: Frances Woolley | July 06, 2011 at 01:38 PM
Frances, suppose I run a wealth regression and transform wealth using logs. My statistical package spews out some OLS estimates, and I see the coefficient on "married" is 0.154. The robust t-ratio is 3.4. I conclude that, all else equal, being married is associated with exp(0.154) - 1 ≈ 16.6% higher wealth, and that that effect is statistically significant. Then I go lie in the sun and read a novel.
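The arithmetic in this example can be checked in a couple of lines. This is only a sketch of the standard conversion from a dummy-variable coefficient in a log model to a percentage difference; the variable names are made up for illustration.

```python
import math

# Coefficient on "married" from the log-wealth example above.
beta_married = 0.154

# Exact percentage difference implied by a dummy coefficient in a log model.
exact_pct = (math.exp(beta_married) - 1) * 100

# The common shortcut reads the coefficient directly as a percentage.
shortcut_pct = beta_married * 100

print(f"exact: {exact_pct:.1f}%  shortcut: {shortcut_pct:.1f}%")
```

The two readings diverge as the coefficient grows, which is why the exponentiated version is used here.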
I want to estimate the same model but using IHS. I can't just run OLS, as I need to estimate the parameter in the IHS function. I have to either concentrate the likelihood with respect to that parameter, then use OLS, and accept that my second-stage standard errors are inconsistent, or do full-blown maximum likelihood imposing distributional assumptions. After I actually have the estimated parameter vector, I cannot interpret the parameter on "married" in an economically meaningful fashion: the coefficient tells me how H(w) varies with marital status, but I don't care about that. When the error is heteroskedastic I can't just invert H(w) and learn the effect in levels; I probably need to do some sort of numerical simulation to recover an average partial effect on levels or percentage changes (having looked again, I am now willing to state that the cited paper by Pence gets this wrong: the argument on page 6 implicitly assumes the errors are homoskedastic). And finally to do valid inference I need to resample for my covariance matrix estimate. No lying in the sun reading a novel for me now.
I'm sorry to partially repeat myself here, but there really are very good reasons to avoid a transformation such as IHS. I think perhaps you're focusing on the fact that economic theory usually doesn't give us much guide as to whether we should prefer w, ln(w), or H(w), but that's not the only issue. We would need to think the benefits of using IHS trump all the hassles in the preceding paragraph, and the example of the Pence paper shows there are traps we need to avoid, too. I would offer that these reasons and similar pragmatic issues explain why the ratio of papers using levels or logs to those using IHS or Box-Cox or other nastier transformations is roughly a zillion to one.
Posted by: Chris Auld | July 06, 2011 at 05:10 PM
I haven't heard of the Tobit model before, and it definitely looks interesting for dealing with just this sort of truncation problem (I wish I had known about it a few months ago!); I'd be just as happy to see someone using that as a two-stage or mixture model.
Also, I'm definitely not someone who'd completely reject transformation (that'd make me extraordinarily hypocritical)! I like the log-transform for cases where variables are likely to act in a multiplicative way on the dependent (like with your married/unmarried example) and errors are log-normally distributed. I'd just say that it's important to try and determine what your distribution of errors might be, and test afterwards to see if it's a reasonable approximation. I still don't think I'd end up using the inverse hyperbolic sine though... it seems a bit overly complex, considering how little difference there is between it and log(2x+1), and it would take a lot of extra time to explain to ecology reviewers.
Posted by: Eric Pedersen | July 06, 2011 at 05:29 PM
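The claim that the IHS and log(2x+1) differ only a little can be checked numerically. A minimal sketch, assuming the unscaled IHS from the post (the standard library's `math.asinh`); the grid of values is arbitrary and chosen only for illustration.

```python
import math

def ihs(x):
    # Inverse hyperbolic sine: log(x + sqrt(x**2 + 1)).
    return math.asinh(x)

def log_2x_plus_1(x):
    # The simpler alternative mentioned in the comment above.
    return math.log(2 * x + 1)

# Arbitrary illustrative grid, from zero up to a million.
values = [0, 1, 10, 1_000, 1_000_000]
gaps = [log_2x_plus_1(v) - ihs(v) for v in values]

for v, gap in zip(values, gaps):
    print(f"x = {v:>9}: log(2x+1) - asinh(x) = {gap:.6f}")
```

Both functions are zero at zero and converge on log(2x) for large x; on this grid the gap is largest around x = 1.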
Let's say you have a positive random variable with a complex distribution that you don't really know. Is that licence to estimate it to be zero? I'd bet that there are almost no Canadians under the age of 75 for whom the expected value of healthcare consumption is less than $50K. If you add half that much to the wealth distribution I don't see how it is conceivable that you could be increasing rather than decreasing the potential for error. Of all the straightforward estimates of the value of someone's expected healthcare consumption, zero seems like about the worst, and the mean seems like a pretty good idea (at least it has the right *mean*).
As for IHS, its shape is determined by the unit of account. If we lopped a few zeros off the currency it would change everything. And why would you use a transformation on the real line if your variable is constrained to the positive half? And as you point out, it's no different from log for positive values that are big enough to make any difference. So the real question remains, what is that useful definition of wealth that can be zero or negative? Economists have employed log wealth utility for ages, exactly because wealth is positive, and zero wealth is a very horrible condition (i.e. the end of consumption), so horrible in fact that it cannot be counterbalanced by any probability of any finite amount of wealth.
Posted by: K | July 06, 2011 at 05:47 PM
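The point about the unit of account can be illustrated with a small sketch. It assumes the unscaled IHS from the post and compares wealth measured in dollars with the same wealth measured in thousands of dollars; the two wealth levels are made-up values for illustration.

```python
import math

def ihs_shift(wealth_dollars, scale=1_000):
    # Change in the transformed value when the same wealth is
    # re-expressed in units of `scale` dollars.
    return math.asinh(wealth_dollars) - math.asinh(wealth_dollars / scale)

# Under a log transformation, changing units shifts every observation
# by the same constant, which the regression intercept absorbs.
log_shift = math.log(1_000)

small_shift = ihs_shift(100)        # hypothetical low-wealth household
large_shift = ihs_shift(1_000_000)  # hypothetical high-wealth household

print(f"log shift:        {log_shift:.4f}")
print(f"IHS shift, $100:  {small_shift:.4f}")
print(f"IHS shift, $1m:   {large_shift:.4f}")
```

For large values the IHS shift matches the log shift, but for small values it does not, so rescaling the currency changes the shape of the transformed variable at the low end instead of just moving the intercept.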
Eric: The Tobit model has been steadily falling out of favour in applied econometrics for, oh, the last two decades or so. Summing up the many reasons people don't like it: it imposes extreme assumptions, and it's notoriously fragile to all of them. For example, the estimates are inconsistent if the errors are not normal and/or not homoskedastic, in stark contrast to OLS. Also note that if you're interested in the effect of some variable on the conditional mean of the dependent variable, OLS is the right estimator even if the dependent variable is censored.
If the sample is reasonably large, it isn't important to worry about the distribution of the residuals, as you can invoke a central limit theorem and make valid inferences under pretty minimal assumptions on the distribution of the errors. Trying to transform things so the residuals look more normal is only worthwhile in tiny samples, and even then there are better approaches.
Posted by: Chris Auld | July 06, 2011 at 06:24 PM
Chris - right now I'm just keeping quiet hoping that someone else will leap to the defence of the IHS.
But for my problem - i.e. people doing OLS on levels, when the underlying model is non-linear - what do you propose? You said earlier that you'd recommend a canned median regression routine, and that's helpful advice when the problem is simply that there's lots of outliers, but when there's underlying non-linearity?
Or would you respond that underlying non-linearity and lots of zeroes just doesn't happen very often, so there's no need to worry about it?
Frances
Posted by: Frances Woolley | July 06, 2011 at 07:26 PM
Hi Frances - I confess I haven't been following the recent literature on wealth, but exactly the same issues arise in analysis of health care expenditures, perhaps even more severely. Health care expenditures are a big deal in the U.S. and there's lots of papers on dealing with a high proportion of zeros and extreme skewness in the non-zero observations in these data. Not precisely my area, but when I manage to stay awake when someone is giving a methodological paper on health expenditures I gather that finite mixtures are favored these days, followed by double hurdle or other selection-type models. GLM approaches are also quite common, which is uncommon in econometrics.
Posted by: Chris Auld | July 06, 2011 at 08:50 PM
Chris, thanks, Frances
Posted by: Frances Woolley | July 06, 2011 at 11:07 PM
Chris: I'd quite happily agree that normality and heteroskedasticity are not the real problems with OLS most of the time; your estimates will be approximately right, and unbiased, as long as the actual relationship is linear... however, I'd say if you're interested in inference about values near zero, no amount of data'll save you from the fact that "my model is predicting negative ten-thousand dollars of wealth for these people, when they're actually at zero."
Frances: What about generalized additive models for non-linearity? They can handle all the same distributions as GLMs, but they allow for non-linear/non-parametric relationships. You lose a little bit compared to OLS or GLM since you can't summarize the model with a table of coefficients. However, if the true relationship's non-linear, the simpler models will just give you an easily interpretable wrong answer.
Posted by: Eric Pedersen | July 06, 2011 at 11:11 PM
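The worry about OLS in levels predicting negative wealth for people who actually hold zero can be seen in a toy example. The data below are entirely made up for illustration (a single hypothetical regressor `x` and a skewed wealth variable with many zeros), and the fit is a plain one-variable least-squares line computed by hand.

```python
# Hypothetical, made-up data: half the sample holds no wealth,
# and the rest is strongly right-skewed.
x = list(range(10))
wealth = [0, 0, 0, 0, 0, 1_000, 2_000, 5_000, 20_000, 100_000]

n = len(x)
x_mean = sum(x) / n
w_mean = sum(wealth) / n

# One-variable OLS: slope = cov(x, wealth) / var(x).
slope = sum((xi - x_mean) * (wi - w_mean) for xi, wi in zip(x, wealth)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
intercept = w_mean - slope * x_mean

fitted = [intercept + slope * xi for xi in x]

# Fitted values below zero for observations whose actual wealth is zero.
negative_fits = [f for f, w in zip(fitted, wealth) if f < 0 and w == 0]

for xi, w, f in zip(x, wealth, fitted):
    print(f"x = {xi}: actual = {w:>7}, fitted = {f:>10.0f}")
```

In this toy sample the straight line is pulled up by the largest observation, so several zero-wealth observations get negative fitted values, and no amount of extra data fixes that if the true relationship is non-linear.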