Over the past year or so, cities across Canada have been creating open data portals (see, for example, Vancouver, Calgary, Ottawa, Montreal, and Halifax). But Toronto's is special - it has data on cat and dog licences.
The data reveal the most popular dog breeds in Toronto in 2011:
A spreadsheet with the full list of licensed dogs by primary breed is here.
The City of Toronto also provides data on the number of licensed dogs (and cats, if anyone is interested) in each of Toronto's 95 Forward Sortation Areas (FSAs). An FSA is the first three characters of a postal code, e.g. M3S.
Statistics Canada provides detailed profiles of each FSA in Canada through its Census (and now National Household Survey) profiles. How hard could it be to merge the two data sets, and figure out the determinants of dog ownership?
Much harder, it turns out, than I expected. The .csv file I downloaded from the Statistics Canada web site turned out to be useless, because the data was in just three columns, instead of a nice matrix. The alternative download option, B2020, requires custom software that only runs on Windows. That software does produce a file that's useable, but far from clean. For example, missing values are coded as "-" rather than ".", which causes major headaches when importing the data into Stata. Variables aren't properly labelled either: the same label, "unemployment rate", is used for the unemployment rate of males, females, males 15-24, and so on. I created a 2006 FSA Census Profile file for Stata that's downloadable here (.dta). It's useable, but not pretty. A slightly less processed spreadsheet file is here (.csv).
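For readers who want to attempt this kind of cleanup themselves, the two chores described above (pivoting a three-column file into a matrix, and recoding "-" as missing) might look roughly like this in Python. This is a minimal sketch, not the process used to build the files linked above; the column names (fsa, characteristic, value) and the sample numbers are invented for illustration and will not match the real Statistics Canada download.

```python
import csv
import io
from collections import defaultdict

# Invented sample in a three-column "long" layout: area, characteristic, value.
# The header names and numbers are assumptions, not the real Census file.
LONG_CSV = """fsa,characteristic,value
M4E,population,24000
M4E,median_income,31000
M5V,population,18000
M5V,median_income,-
"""


def long_to_wide(text, missing_codes=("-", "")):
    """Pivot long rows into {fsa: {characteristic: float or None}}."""
    wide = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["value"].strip()
        # Treat "-" (and blanks) as missing rather than as a number.
        wide[row["fsa"]][row["characteristic"]] = (
            None if raw in missing_codes else float(raw)
        )
    return dict(wide)


profiles = long_to_wide(LONG_CSV)
```

Each FSA ends up as one row (here, one dictionary) with a column per characteristic, and the "-" entries become explicit missing values instead of text that a statistics package will choke on.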
Using the 2006 Census data and the 2011 dog registration data I generated a rough estimate of the number of licensed dogs per capita in each Toronto FSA, and the distribution of dog ownership. I was surprised that the dog-to-people ratios were so low (and because the population data is out of date, those ratios err on the high side).
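The per capita calculation itself is just a merge on the FSA code followed by a division. A minimal sketch, assuming the two sources have already been reduced to one number per FSA; the dictionaries and every figure in them are hypothetical, not the actual Toronto data.

```python
# Hypothetical inputs keyed by FSA; all numbers are invented for illustration.
licensed_dogs_2011 = {"M4E": 1200, "M5V": 450, "M9Z": 30}
population_2006 = {"M4E": 24000, "M5V": 18000}


def dogs_per_capita(dogs, population):
    """Inner-join the two sources on FSA and return licensed dogs per resident."""
    rates = {}
    for fsa in sorted(dogs.keys() & population.keys()):
        # Skip areas with no recorded residents to avoid dividing by zero.
        if population[fsa] > 0:
            rates[fsa] = dogs[fsa] / population[fsa]
    return rates


rates = dogs_per_capita(licensed_dogs_2011, population_2006)
```

An FSA that appears in only one source (the made-up "M9Z" here) is dropped rather than guessed at, which is the cautious choice when the two files come from different years.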
I spent a little bit of time playing around with the data to discover some correlates of dog ownership (see the table below).
Given that attitudes towards dogs vary from culture to culture, I included "proportion immigrants" in the FSA as an explanatory variable - it was negative and significant.
Some high rises ban dogs; in any event it's harder to let a dog outside to do its business in a multi-story building. The proportion of households living in apartments five stories or higher is also significantly negatively associated with dog ownership.
I expected to find some sort of relationship between dog ownership and employment, but didn't. I suspect this may be because employment has opposing effects on dog ownership: the income and stability associated with employment would tend to encourage dog ownership, but the demands of work discourage it. I also couldn't find a clear link between family composition and dog ownership, but this may simply be because I wasn't using quite the right measure of family composition.
Median income in an FSA is positively correlated with the number of licensed dogs per capita in that FSA. Whether that is due to a positive association between dog ownership and income or to an association between dog licensing and income is something that cannot be assessed with the data available.
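For anyone who wants to check a simple correlation like this one on their own extract, a Pearson correlation coefficient can be computed by hand in a few lines. This is only an illustrative sketch with invented numbers; it is a plain two-variable correlation, not the multi-variable analysis summarized above, and the variable names are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Invented values, one entry per hypothetical FSA.
median_income = [25000, 30000, 40000, 55000]
dog_rates = [0.010, 0.016, 0.018, 0.030]

r = pearson_r(median_income, dog_rates)
```

A positive r here says only that the two series move together; as noted above, it cannot distinguish more dogs from more licensing.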
It's wonderful that Canada's cities are engaging in these open data initiatives. But there's a frustratingly large gap between what the data is and what the data could be. For example, the City of Toronto dog and cat licensing data is, apparently, refreshed annually. I don't want data to be refreshed, I want it to be archived, so I can look at trends and changes over time! The various cities differ, too, in the amount and types of data posted on-line. This is a particular issue in the Lower Mainland, as the City of Vancouver is only one of a number of municipalities in the Greater Vancouver area.
There is an even larger gap between what could be done with the data and what I, personally, am able to achieve. The Everyday Analytics blog, for example, did quite a pretty analysis of the Toronto cats and dogs data. What would be really neat would be to map the FSAs, load in data on the location of Toronto parks, calculate the distance between an FSA and the nearest off-leash (or large, dog-friendly) park, and then use distance from the nearest park to predict dog ownership rates in an FSA.
Unfortunately I have no idea how to go about doing this. The Everyday Analytics blog uses Tableau mapping software. Is this the best? What are the alternatives? So I'll end this blog post with a bleg - is it worth investing the time and effort that it would take to learn how to use mapping software? What's the best software to use? Any thoughts would be greatly appreciated.