Stephen Gordon recently posted an excellent analysis of trends in income inequality in Canada and elsewhere. Stephen, like almost all of the other authors cited in his post and the subsequent discussion, measured inequality using the Gini coefficient.

The very next day, I saw a paper by Francesca Greselin arguing that the Gini is inferior to the new Zenga inequality index and should be replaced.

Talk about deja vu all over again. Various limitations of the Gini inequality index have been known for years. Tony Atkinson described some and proposed an alternative back in 1970; other indices for measuring inequality include the Theil index and the Hoover index. Greselin and co-authors set out new arguments, and make a convincing case for replacing the Gini. But I don't expect to see the Zenga index in wide use any time soon.

We keep on using the Gini because that's the way people have always done it. But why did people even start using the Gini in the first place?

In statistics, the standard deviation is usually used to measure how spread out or "unequal" a distribution is. If a "unit free" measure is desired - one that does not depend upon whether income is measured in Euros or dollars - the coefficient of variation, that is, the standard deviation divided by the mean, or some similar measure can be used.
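To see the "unit free" point concretely, here is a minimal sketch in Python. The incomes and the exchange rate are made up for illustration, and `coefficient_of_variation` is just a name chosen for this example.

```python
import statistics

def coefficient_of_variation(incomes):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(incomes) / statistics.fmean(incomes)

# Made-up incomes, first in dollars.
incomes_dollars = [20_000, 35_000, 50_000, 120_000]
# Hypothetical exchange rate, used only to change the unit of measurement.
EXCHANGE_RATE = 0.75
incomes_euros = [x * EXCHANGE_RATE for x in incomes_dollars]

sd_dollars = statistics.pstdev(incomes_dollars)
sd_euros = statistics.pstdev(incomes_euros)
cv_dollars = coefficient_of_variation(incomes_dollars)
cv_euros = coefficient_of_variation(incomes_euros)

print(sd_dollars, sd_euros)  # the standard deviation scales with the currency
print(cv_dollars, cv_euros)  # the coefficient of variation does not (up to rounding)
```

The standard deviation changes by exactly the exchange rate, while the ratio to the mean cancels the unit out.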

So why, when people started measuring income inequality, didn't they just use the standard deviation or the coefficient of variation? Why did people even begin using the Gini?

Max Lorenz can assume part of the blame.

In 1905, as a 28-year-old PhD student, Lorenz published one of the first English-language investigations into income inequality measurement. (A topic which had nothing to do with his PhD thesis, which was on the Economic Theory of Railroad Rates.)

Lorenz begins by considering and rejecting the numerical methods that were being used by American economists at the time to measure inequality. He also appears to reject the frequency distribution approach to graphing income distributions:

*Turning now to the graphic measures, a simple plotting of wealth along one axis and the numbers of the population along another is not satisfactory for the reason that changes in the shape of the curve will not show accurately changes in the relationships of individuals.*

Instead, he advocated representing income distribution through what is now known as a Lorenz curve - like this:

The horizontal axis shows the cumulative percentage of income, the vertical the cumulative percentage of the population. So the diagram shows, for example, that in Prussia in 1901, the poorest 60 percent of the population had 32 percent of the income.
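For readers who want to see how the points on such a curve are computed, here is a rough sketch. It uses made-up incomes for five people, not Lorenz's Prussian data, and `lorenz_points` is a name invented for this example.

```python
def lorenz_points(incomes):
    """Return (cumulative population share, cumulative income share) pairs,
    with people ordered from poorest to richest."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points

# Made-up incomes for five people.
incomes = [10, 20, 30, 40, 100]
for pop_share, income_share in lorenz_points(incomes):
    print(f"poorest {pop_share:.0%} hold {income_share:.0%} of income")
```

With these numbers the poorest 60 percent hold 30 percent of the income. Which share goes on which axis is purely a plotting choice.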

Today Lorenz curves are generally drawn the other way around, like this one (source):

This picture shows how inequality in receipt of health care spending decreases as people get older. At age 65, 80 percent of the population have received almost no Medicare spending; almost all spending is on a small proportion of high need patients. Twenty years later, however, there is much less inequality - the cheapest 80 percent receive, over their lifetimes, about half of Medicare spending.

Lorenz did not advocate any numeric measure of inequality. His point was the value of graphic representations. Indeed, there are good reasons to prefer pictures to numerical measures. If Lorenz curves do not cross, as in the Medicare spending diagram above, just about any measure will show inequality decreasing over time, so in some sense it does not matter which measure you use. But what if the Lorenz curves do cross, as in this picture:

Compare, in this example, UK 1913 with Brazil 1980. The bottom 80 percent of the population had a larger income share in 1913 UK than in 1980 Brazil. But the 8th decile did better in 1980 Brazil than in 1913 UK - in 1913 income was more concentrated among the wealthiest of the wealthy. So which distribution is less unequal? Which distribution is, in a sense, better? It depends upon how one weights the interests of different groups in society. With a sufficiently high weight on the upper middle class, the 1980 Brazil distribution could conceivably be preferred to the 1913 UK distribution.

It was many years before people recognized this fact. Instead, they continued searching for the Holy Grail of income inequality, a single numerical measure that perfectly captures the shape of the income distribution.

For example, Warren Persons, in a 1909 Quarterly Journal of Economics article, drawing inspiration from the latest research in biology, concluded, "The coefficient of variability is recommended as the most satisfactory measure of variability."

Around the time that Lorenz and Persons were writing, Italian statisticians and sociologists such as Corrado Gini were independently developing methods of inequality measurement. (Savour for a moment, if you wish, the irony of the most widely used measure of inequality being named after the author of The Scientific Basis of Fascism.)

Hugh Dalton, in 1920, brought these two strands of inequality measurement together, introducing English readers to Italian research in a highly influential article. He concluded that both the coefficient of variation and "Professor Gini's mean difference" were satisfactory measures of inequality. But, he argued,

*If a single measure is to be used, the relative mean difference [Gini coefficient] is, perhaps, slightly preferable, owing to the graphical convenience of the Lorenz curve.*

What does that mean? It turns out that the Gini coefficient is equal to two times the area between the Lorenz curve and the line of complete equality. Just look at the Lorenz curve diagrams above - you can see the Gini coefficient.
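The link between the picture and the number can be checked directly. The sketch below is only an illustration under two assumptions: both axes run from 0 to 1, and the Lorenz curve is drawn as straight lines between the data points. The function names are invented for this example. It computes the Gini once from the area and once from the mean absolute difference between all pairs of incomes, divided by twice the mean.

```python
def gini_from_lorenz(incomes):
    """Gini as twice the area between the equality line and the Lorenz curve."""
    ordered = sorted(incomes)
    n = len(ordered)
    total = sum(ordered)
    area_under = 0.0
    prev_share = 0.0
    running = 0.0
    for income in ordered:
        running += income
        share = running / total
        # Trapezoid of width 1/n under this segment of the Lorenz curve.
        area_under += (prev_share + share) / 2 / n
        prev_share = share
    # The area under the equality line is 1/2, so 2 * (1/2 - area_under).
    return 1 - 2 * area_under

def gini_from_mean_difference(incomes):
    """Gini as the mean absolute difference over all pairs, divided by 2 * mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_diff / (2 * n * n * mean)

# Made-up incomes for five people.
incomes = [10, 20, 30, 40, 100]
print(gini_from_lorenz(incomes), gini_from_mean_difference(incomes))
```

For these five made-up incomes both routes give about 0.4, and a perfectly equal distribution gives 0.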

Simple. Intuitive. *Because people were used to visualizing inequality with Lorenz curves, and not with frequency distributions.* But the Gini coefficient is seriously defective as a measure of inequality, for two reasons.

Reason 1. Suppose society is composed of two groups, reds and greens. One group is more privileged than the other. A reasonable question to ask is: how much of the inequality in society is attributable to differences *between* these two groups, and how much inequality is attributable to differences *within* these groups?

The Gini coefficient cannot answer this question, because it is not easily decomposed into between-group and within-group inequality. Other measures can be, and that's one reason to use the Theil or another index in preference to the Gini.
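As an illustration of what "decomposable" means, here is a sketch of the textbook decomposition of the Theil T index. The incomes for reds and greens are made up, all incomes are assumed to be positive, and the function names are invented for this example.

```python
import math

def theil_t(incomes):
    """Theil T index: the mean of (x / mu) * ln(x / mu). Needs positive incomes."""
    mu = sum(incomes) / len(incomes)
    return sum((x / mu) * math.log(x / mu) for x in incomes) / len(incomes)

def theil_decomposition(groups):
    """Split Theil T into (between, within) for a dict of group name -> incomes."""
    everyone = [x for incomes in groups.values() for x in incomes]
    total_income = sum(everyone)
    mu = total_income / len(everyone)
    between = 0.0
    within = 0.0
    for incomes in groups.values():
        share = sum(incomes) / total_income   # this group's share of all income
        group_mu = sum(incomes) / len(incomes)
        between += share * math.log(group_mu / mu)
        within += share * theil_t(incomes)
    return between, within

# Made-up incomes for the two groups.
groups = {"reds": [10, 20, 30], "greens": [40, 60, 140]}
between, within = theil_decomposition(groups)
overall = theil_t(groups["reds"] + groups["greens"])
print(between, within, overall)  # between + within adds up to overall
```

The between-group term is what inequality would be if everyone received their group's mean income; the within-group term is an income-share-weighted average of each group's own Theil index.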

Reason 2. Suppose that there is diminishing marginal utility of income - that is, people get more satisfaction from their 100th dollar than their 100,000th. In this case, the best way to get the maximum possible amount of happiness from a given amount of resources is to distribute them as equally as possible. Just like serving a blueberry pie to a group of friends and family. Sure, some might end up with a bit less because they're dieting or a bit more because they're growing. But basically happiness is maximized by sharing the pie, not by letting one or two people hog it all.

In other words, the reason we care about inequality is that it reduces the happiness achievable from a given amount of income. How much depends upon the happiness/income relationship. Does the marginal utility of income fall rapidly? Or is the happiness from the 100,000th dollar almost as great as the happiness from the 100th?

The Atkinson index is a measure of inequality that can be adjusted to take into account society's attitudes towards inequality - placing either more or less emphasis on the extremes of the distribution, depending upon the "inequality aversion parameter" chosen. Because the Atkinson measure makes explicit the welfare judgements underlying inequality measurement, it is preferable to the Gini.
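A small sketch may make the role of the parameter clearer. It uses the usual textbook form of the Atkinson index, one minus the ratio of the "equally distributed equivalent" income to the mean, and assumes all incomes are positive. The incomes and the function name are made up for this example.

```python
import math

def atkinson(incomes, epsilon):
    """Atkinson index for positive incomes and inequality aversion epsilon >= 0."""
    n = len(incomes)
    mu = sum(incomes) / n
    if epsilon == 1:
        # Limiting case: the equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(x) for x in incomes) / n)
    else:
        ede = (sum(x ** (1 - epsilon) for x in incomes) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mu

# Made-up incomes for five people.
incomes = [10, 20, 30, 40, 100]
for eps in (0.0, 0.5, 1.0, 2.0):
    print(eps, round(atkinson(incomes, eps), 3))
```

With the parameter at zero, society does not care about the distribution and the index is zero. As the parameter rises, the same incomes register as more and more unequal, because low incomes get more weight.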

I could go on. The Zenga measure, for example, is preferable to the Gini because it can be estimated with a smaller margin of error.

So why do we use the Gini? I suspect it's the same as the reason why we use QWERTY keyboards. People are familiar with it, it's a lot of effort to change, and it's good enough for most intents and purposes.

Ecologists went through similar debates once upon a time, looking for a single best "diversity index". The starting point is usually data on how many species there are at a site, and how abundant each of them is. How can we summarize the "diversity" represented by those data? We could just count the number of species, but that neglects the fact that a site with one common species and lots of really rare ones doesn't seem very diverse--it's not really all that different from a site with just one species. Which led to lots of arguments about how best to collapse the entire species-abundance distribution into a single number that would summarize its "diversity". Which of course is impossible because you can't capture every feature of most frequency distributions with a single number, so any single-number summary ends up throwing away some information and behaving weirdly in some applications. As in economics, the most popular indices retain their popularity because they have straightforward interpretations, and because they are traditional.

Posted by: Jeremy Fox | September 22, 2011 at 06:02 PM

Jeremy - interesting. Once again, ecology and economy have so much in common.

- what are the popular indices in biology?

- I think there's a difference between *seeming* straightforward and *being* straightforward. E.g. the Gini. It's more sensitive to changes in some parts of the distribution than others, but I always have a really hard time remembering which parts, or working it out from the picture. Put another way, a straightforward graphic interpretation, i.e. the difference between the line of equality and the Lorenz curve, doesn't imply a straightforward real world interpretation.

Posted by: Frances Woolley | September 22, 2011 at 07:38 PM

Should point out that I used the Gini because it was the only statistic readily available for the purposes of my post. Which of course begs Frances' question.

Posted by: Stephen Gordon | September 22, 2011 at 08:21 PM

Stephen - supply creates its own demand?

Posted by: Frances Woolley | September 22, 2011 at 08:29 PM

Yep. Everyone looks at the Gini because everyone looks at the Gini.

Posted by: Stephen Gordon | September 22, 2011 at 08:34 PM

hi Frances,

In biology the fallback ones are Shannon and Simpson.

> wiki, in depth

Posted by: edeast | September 22, 2011 at 09:46 PM

Correct use of 'begging the question' on a blog?! What next? Superluminal neutrinos?

Posted by: Patrick | September 22, 2011 at 10:22 PM

edeast - "Related to diversity indices are many income inequality indices, such as the Gini index and the Theil index. Generally these measure a lack of diversity, but the only difference with the measures mentioned above is a minus sign.

The Theil index in particular is the maximum possible diversity log(N) minus Shannon's diversity index. It is the maximum possible entropy of the data minus the observed entropy. The Theil index is called redundancy in information theory."

So the Theil index of inequality is essentially the same as the Shannon index of diversity - so presumably the Shannon has those nice decomposibility properties of the Theil?

Posted by: Frances Woolley | September 22, 2011 at 10:57 PM
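The identity quoted in the comment above can be checked numerically. This is only a sketch, using natural logs, made-up positive incomes, and function names invented for the example: the Theil T index of a set of incomes should equal log(N) minus the Shannon entropy of the income shares.

```python
import math

def shannon_entropy(shares):
    """Shannon entropy (natural log) of shares that sum to 1."""
    return -sum(s * math.log(s) for s in shares if s > 0)

def theil_t(incomes):
    """Theil T index: the mean of (x / mu) * ln(x / mu). Needs positive incomes."""
    mu = sum(incomes) / len(incomes)
    return sum((x / mu) * math.log(x / mu) for x in incomes) / len(incomes)

# Made-up incomes for five people.
incomes = [10, 20, 30, 40, 100]
shares = [x / sum(incomes) for x in incomes]

theil = theil_t(incomes)
max_entropy_minus_observed = math.log(len(incomes)) - shannon_entropy(shares)
print(theil, max_entropy_minus_observed)  # the two agree up to rounding
```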

Yes think so, except for being the residuals of each other.

The Theil write-up explains what the equivalent measure is in biology.

Posted by: edeast | September 23, 2011 at 07:12 AM

What edeast said--the two most popular diversity indices in ecology are Shannon "information" (called that because the index was originally developed in information theory), and the Simpson index (the greater the diversity, the greater the probability that two randomly-sampled individuals will be different species).
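A rough sketch of both indices, for two made-up sites that each have four species. The abundance counts and function names are invented for this example, and the Simpson index is written in its "probability that two individuals differ" form, assuming sampling with replacement.

```python
import math

def shannon_index(abundances):
    """Shannon index: minus the sum of p * ln(p) over species proportions p."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)

def simpson_index(abundances):
    """Probability that two individuals drawn with replacement differ in species."""
    total = sum(abundances)
    return 1 - sum((a / total) ** 2 for a in abundances)

# Made-up counts of individuals per species at two sites.
even_site = [25, 25, 25, 25]
dominated_site = [97, 1, 1, 1]

for name, site in (("even", even_site), ("dominated", dominated_site)):
    print(name, len(site), shannon_index(site), simpson_index(site))
```

Both sites have the same species count, but both indices rate the site with one common species and three rare ones as far less diverse.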

In ecology, issues of the decomposability of different diversity indices mostly come up not in the context of looking at the diversity of species at a single site (although they do come up there). Instead, they mostly come up when trying to understand the total diversity across a collection of sites. One component of that total is within-site diversity, another component is the among-site diversity (do different sites have the same or different species?) Debate centers on how precisely to measure within- and among-site diversity, so that total diversity decomposes into either the sum or the product of within and among-site diversity (much debate is about whether we should prefer an additive or multiplicative decomposition). It turns out that some popular single-site diversity indices have no obvious, decomposable multi-site extension.

In ecology, all these debates are kind of old-fashioned (as well as totally played out), and these days only really excite a small and fairly closed circle of people, who are viewed with some bemusement by everyone else. Different indices behave differently, so there is no one best one. Period.

Posted by: Jeremy Fox | September 23, 2011 at 02:31 PM

Jeremy - "In ecology, all these debates are kind of old-fashioned (as well as totally played out)"

I suspect that might be in part because ecology ended up at a better equilibrium, i.e. with a diversity measure that is decomposible, as opposed to one that isn't. It might also be that there aren't as many underlying conflicts within ecologists about values, norms, policy implications.

Posted by: Frances Woolley | September 23, 2011 at 02:57 PM

As mentioned by others, I fully agree the Gini is used because it is used. I think it is also because (as Frances points out) for the most part it is "close enough" to providing the answer to the question. If economists are anything, they are efficient with their own mental energy. Without intending to misdirect the thread, I think this is why GDP persists as a measure of economic well-being. It is relatively easy to calculate and understand, has a long history of comprehension by non-economists, and is already embedded in the mental tool-box.

If economic analysis is intended to a) explain the world, b) inform and/or recommend decisions, does the use of other inequality measures change the conclusions?

In other words, would a different index lead to different conclusions / actions when comparing UK 1913 with Brazil 1980?

Posted by: Peter | September 23, 2011 at 07:11 PM

Frances, policy implications.

The second part of the in-depth blog post I linked to was inspired by a paper in the 90s where the authors realized the futility of saving all species, so came up with a measure of keeping some of a similar species. It measures diversity. Ecological economists were trying to pin a value on biodiversity with the odds of finding a cure. Similar species have similar probabilities.

The blog post is by Tom Leinster, who proves some general properties of indices. He later wrote a paper on it, http://arxiv.org/abs/0910.0906, which mentions the Herfindahl–Hirschman Index, which is apparently equivalent to the Simpson index.

And now the guys have been bashing Tsallis vs Renyi entropy together, of which Simpson and Shannon are special cases.

Posted by: edeast | September 25, 2011 at 02:40 AM

Jeremy,

Leinster's aware of the between, among, global diversity differences. It's just interesting to watch mathematicians try to prove the general cases.

Posted by: edeast | September 25, 2011 at 03:01 AM