Worth noting that if Toronto is anything like the DC metro area, a substantial share of pet owners, perhaps even a majority, neglect to license their pets. Whether that biases these results is unclear.

I guess I would be curious if controlling for the percentage of dogs that were "large" in each FSA might have an impact on dogs per capita. Large dogs eat more and would be more costly to support.

Squarely rooted - I'm sure lack of licensing biases all of this. People who don't license their dogs will differ in income or risk tolerance from those who do. Non-licensing people, therefore, probably buy different dog breeds, and that would affect the breed list. The relationship between income and dog ownership is also probably generated by non-licensing. Policing of dog licensing may be more enthusiastic in some areas than in others - e.g. in my area, by-law officials tend to swoop down on dog owners when one of the residents complains about bad dog behaviour. No or little enforcement = no or little need to license.

Livio, unfortunately the data doesn't break down the breeds at the FSA level. Even if confidentiality prevents a breakdown of breeds at the FSA level, it would still be useful, as you suggest, to have small/medium/large at the FSA level, because the predictors of each are probably different.

"I don't want data to be refreshed, I want it to be archived, so I can look at trends and changes over time!"
Right on!!!!

Dave - yup.

One issue, I think, is that a lot of this data comes from spreadsheets generated by administrative personnel whose training is in making tables look pretty, as opposed to making tables usable for statistical analysis. Hence the use of characters such as - for missing data, the use of headings and subheadings instead of unique variable names, and the insertion of random empty rows. (I remember once having a long discussion with an administrative person at Carleton on this, trying to convince her that random empty rows are evil - but to no avail. I'm sure she's still producing exactly the same table.)
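For anyone stuck cleaning tables like this, here is a minimal sketch of the usual workaround, using only Python's standard library. The sample table, its numbers, and the `clean_rows` helper are all made up for illustration; the only assumptions are that the real header row can be recognised by its first cell and that "-" or an empty cell means missing.

```python
import csv
import io

# Hypothetical export (invented numbers): a title row, blank spacer rows,
# and "-" standing in for missing values.
RAW = """Licensed dogs by area,,
,,
FSA,Dogs,Population
M4E,1200,25000
,,
M5V,-,40000
M6G,800,-
"""

MISSING = {"", "-"}  # assumption: these cell values mean "no data"


def clean_rows(text, header_first_cell="FSA"):
    """Return a list of dicts, dropping decoration rows and mapping '-' to None."""
    rows = list(csv.reader(io.StringIO(text)))
    # Drop rows in which every cell is empty (the "random empty rows").
    rows = [r for r in rows if any(cell.strip() for cell in r)]
    # Skip anything above the real header row, such as a table title.
    start = next(i for i, r in enumerate(rows) if r[0].strip() == header_first_cell)
    header = [h.strip() for h in rows[start]]
    cleaned = []
    for r in rows[start + 1:]:
        record = {}
        for name, cell in zip(header, r):
            cell = cell.strip()
            if cell in MISSING:
                record[name] = None
            elif cell.isdigit():
                record[name] = int(cell)
            else:
                record[name] = cell
        cleaned.append(record)
    return cleaned


records = clean_rows(RAW)
for rec in records:
    print(rec)
```

The result is one record per area, with `None` wherever the original table had a dash, which most statistical packages can then read as missing.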

I don't know if there's any way of breaking down these silos and communicating to admin folks the needs of data users.

B.t.w., do you have any knowledge of those various mapping programs? Are there any that are compatible with Stata? What about this GeoGratis program that NRC has put out?

Frances: Maybe you've seen this already, but this blog seems to be tracking municipal open data projects in Canada:


The prevalence of this kind of information, and some of the quality issues you raise, are typical of the cusp we are on right now in terms of large-scale analytics with unconventional sources (that is, not specifically collected by academics or statistical agencies). There is a ton of information out there that is not otherwise available, or even retrievable through academic initiatives, but its usefulness is a big unknown. A lot of the "big data" talk out there has a hype aspect to it--a common theme is that the sheer size of these databases (online purchases, public transit use, etc.) overcomes any potential bias or other methodological concerns, or at least softens them. However, as the recent debate about the Oregon Health Study is reminding us, large numbers are a necessary but not sufficient condition for usability.

I'm reminded of a presentation by a fellow who measured, amongst other things, the state of the California economy by the number of ladders found by the side of the highway (the intuition being that the number of yahoos who are too dumb to properly tie a ladder to the roof of their truck and who fancy themselves to be contractors is directly proportional to the state of the housing market. Also, if times were tough, they might be more inclined to come back and look for it). I can't remember much else about the presentation (it was done over the course of a dinner where many very fine bottles of exquisite wine were served), but I remember thinking that that was seriously cool.

Frances, those people should be using Crystal Reports or an equivalent to turn properly formatted data into reports with the white space and headings/subheadings they desire (to be human readable).

Andrew F - one hopes.

Bob, I like that.

