The Ontario government publishes an annual "Sunshine List". This is a dataset containing detailed salary information on every Ontario government employee earning over $100,000 per year. The list includes the salaries of public servants, as well as those of people who work in Ontario universities, hospitals, and other government agencies.
A couple of conversations I had earlier today inspired me to look through the list and try to figure out which names are the best paid, and which are the worst. For example, do Nicks, on average, earn more than Franceses? How does the average Livio compare to the average Stephen?
It's not an easy question to answer. Some names occur much more frequently in the dataset than others. Here are the twenty most frequently-occurring first names, and the average salary associated with each. This is the 2016 Sunshine List data, which gives 2015 salaries.
Jennifers and Susans earn less than Michaels, Davids and Johns, possibly reflecting gender differences in average salaries. Peters and Williams earn more than Jasons, perhaps because the typical Peter or William is older than the typical Jason (see here), though possibly for other reasons also. However, for every one of these twenty most common names, the average salary earned is within $10,000 of the mean salary for the entire sample - $127,251.60.
Typical names tend to have fairly typical salaries on average. Likewise, unusual names are often associated with unusually high or low salaries. If we look at all names, without placing any restriction on the frequency with which the name occurs in the data set, we find that the names with the highest average pay all (a) appear only once in the data set, and (b) belong to a chief executive officer or medical specialist.
So the average salary of all of the Gadis on the Sunshine List (all one of them) was $578,633 per year, while the average salary of all the Ehrens on the Sunshine List (all one of them) was $410,543.95 per year.
To avoid having the best- and worst-paid names list dominated by one person with an unusual name in an unusually high- or low-paid job, I decided to focus only on the 223 names appearing at least 100 times on the Sunshine List. Of these commonly occurring names, the 20 highest paid are:
Whereas the lowest paid are:
The highest-paid common first name list is dominated by names that were popular for babies born in the 1950s or 1960s, like Alan and Barry, and is almost entirely dominated by boys' names. The lowest-paid name list has more names that were popular in the 1970s or 1980s, like Chad and Kyle, and contains far more girls' names.
By this point you may be wondering what the point of this whole exercise is. An illustration of correlation without causation? An illustration of why it's important to be very careful when interpreting any study that makes claims about names and salaries? Possibly. Perhaps we can learn more from looking at the best- and worst-paid last names in the data.
Again, I chose to focus only on relatively common names, restricting the analysis to last names that appeared 50 times or more in the dataset. Here are the best paid last names:
And here are the commonly-occurring last names with the lowest average salaries:
The highest names list suggests that many well paid jobs in the Ontario public sector still go to people with traditional Scottish, English, Welsh or Irish names. Is this an artifact of age? Perhaps the average Sinclair or Mitchell, like the average Barry, Alan, Charles or George, tends to be older than the average person on the Sunshine List? Or perhaps there is indeed discrimination in the job market? It is also notable that people of British descent no longer have a monopoly on well-paid jobs - the Lius and the Lis are doing o.k.
How I did this:
First, I downloaded a spreadsheet with the salary disclosure information in it here: https://www.ontario.ca/page/public-sector-salary-disclosure-2016-all-sectors-and-seconded-employees.
Then, I used the "text to column" feature in Excel to clean up the first name column, so that each person had just one first name. (I should also have done a search/replace and eliminated "Dr." from the first name column, but I didn't do that).
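For anyone who would rather script this cleanup step than do it by hand, here is a rough Python sketch of the same idea. It is not what was done for this post (that was Excel's "text to column"). The function name clean_first_name and the TITLES tuple are made up for illustration, and the sketch assumes that "one first name" means the first whitespace-separated word once a leading "Dr." is dropped.

```python
# Hypothetical helper: keep one first name per person.
# Assumption: the first word is the name we want, after removing a leading title.
TITLES = ("dr.", "dr")  # only "Dr." is mentioned in the post


def clean_first_name(raw: str) -> str:
    """Drop a leading title such as "Dr." and return the first remaining word."""
    tokens = raw.split()
    # Remove title tokens from the front, e.g. "Dr. Frances" -> ["Frances"]
    while tokens and tokens[0].lower() in TITLES:
        tokens.pop(0)
    # An empty or title-only cell yields an empty string
    return tokens[0] if tokens else ""


# Made-up example values, not rows from the Sunshine List
examples = ["Mary Anne", "Dr. Frances", "  Stephen  ", ""]
cleaned = [clean_first_name(value) for value in examples]
```

Note that this treats "Mary Anne" as "Mary", which is the same simplification the Excel approach makes.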
I then imported the data into Stata. I played around with a couple of different approaches, e.g. using egen to create group averages. In the end I created counts for the first names and last names using commands like this:
bysort first_name: gen first_name_count = _N
Then I saved my cleaned up dataset, dropped everything but the salary paid, the first name, and the frequency counts for the first name, and collapsed the dataset so I had a list of first names, and the average salary paid and frequency of each name.
collapse (mean) SalaryPaid first_name_count, by(first_name)
Then I created a list of the average salaries of the highest paid first names using this command:
tabulate first_name if first_name_count>=100, summarize(SalaryPaid) means
I cut-and-pasted that to Excel, and used Excel and Word to sort the data and create the tables you see here.
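The count, average, and filter steps described above can also be sketched in a few lines of Python. This is an illustrative equivalent using only the standard library, not the Stata code used for the post. The function name average_salary_by_name, the min_count parameter, and the toy rows are all invented for the example.

```python
from collections import defaultdict


def average_salary_by_name(rows, min_count=100):
    """rows: iterable of (name, salary) pairs.

    Returns a list of (name, mean_salary, count) tuples for names that
    appear at least min_count times, sorted from highest to lowest mean.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, salary in rows:
        totals[name] += salary
        counts[name] += 1
    result = [
        (name, totals[name] / counts[name], counts[name])
        for name in counts
        if counts[name] >= min_count  # drop rare names, as in the post
    ]
    result.sort(key=lambda item: item[1], reverse=True)
    return result


# Toy data, invented for illustration only (not from the Sunshine List)
toy_rows = [
    ("A", 100000.0), ("A", 120000.0),
    ("B", 150000.0),
    ("C", 105000.0), ("C", 125000.0),
]
ranked = average_salary_by_name(toy_rows, min_count=2)
```

With a threshold of 2, the single high-paid "B" is excluded, which is the point of the frequency cut-off: one unusual name in one unusual job no longer tops the list. The same function would work for last names with min_count=50.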
test
Posted by: Stephen Gordon | February 05, 2018 at 09:30 AM
"The highest names list suggests that many well paid jobs in the Ontario public sector still go to people with traditional Scottish, English, Welsh or Irish names. Is this an artifact of age?"
The first place I'd look is not *age* but rather *number of years in Canada*. Recent immigrants are far less likely to have had time to work their way up through the ranks, especially in such a seniority-driven system as the public sector.
Posted by: Colin Percival | February 06, 2018 at 10:40 PM
FWIW, this immediately reminded me of "Chicken Pox and Name Statistics", from about February 2, on XKCD: https://xkcd.com/1950/
Maybe great minds think alike... and certainly, correlations often work weirdly.
Posted by: PaulS | February 13, 2018 at 02:02 PM
How'd you get MacDonald on both the highest and lowest paid list? If it's due to case-sensitivity, I can assure you that us Mac's have no control over how our names are entered, so you should set that aside.
Posted by: jj | February 13, 2018 at 10:13 PM
jj - I hadn't noticed that! There is a similar anomaly with pay variations among different variations on Steve/Steven/Stephen:
I wonder if there is a correlation between age and case - so people who were hired in 1980 or 1990, and had their names entered into an old computer system, don't get the capital D, whereas more recent hires' names were entered into computer systems that could accept more spelling variations.
Posted by: Frances Woolley | February 14, 2018 at 01:41 PM
Paul, yes, very much the same kind of thing! I don't remember if I saw that xkcd cartoon before or after writing this post.
Posted by: Frances Woolley | February 14, 2018 at 01:43 PM