Tories scrap mandatory long-form census: For the first time in 35 years, the census will not feature a detailed, long form that Canadians are obliged to send back to the government.
Instead, a mandatory short form will go out to everyone for next year's census, with basic questions about how many people live in the household and their ages and genders.
The voluntary “national household survey,” with detailed questions about ethnicity, income and education, will be sent to one-third of homes. That's an increase from the 20 per cent of homes that used to get the mandatory long-form...
It's not often that sample selection bias becomes an issue of national importance, but then again, it's not often that census sampling design is outsourced to drunken monkeys. The people who thought that increasing the number of long forms to be sent out will somehow compensate for making the long-form responses voluntary simply do not understand the gravity of the error they are making.
Suppose that we want to estimate the average height of Canadians, and the following sampling design is proposed:
- Find a group of people who are more than 6 feet in height.
- Ask them how tall they are.
- Take the average of the answers you get.
Those of us who are sentient will see that no matter how big you make the sample, this procedure is fundamentally flawed and will produce worse-than-useless results. If you don't see this, worry not: you may yet have a promising career as Minister of Industry.
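For anyone who would rather see it than take my word for it, here is a minimal simulation sketch of the procedure above. The population is invented for illustration (heights drawn from a normal distribution with a mean of 170 cm and a standard deviation of 10 cm, not real Canadian data), and the function name `tall_only_estimate` and the 183 cm cutoff (roughly 6 feet) are my own assumptions.

```python
import random
import statistics

def tall_only_estimate(population, sample_size, rng, cutoff_cm=183.0):
    """Average height of a random sample drawn only from people above the cutoff."""
    tall = [h for h in population if h > cutoff_cm]
    return statistics.mean(rng.sample(tall, min(sample_size, len(tall))))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: heights ~ Normal(170 cm, 10 cm). Illustrative only.
population = [rng.gauss(170.0, 10.0) for _ in range(200_000)]
true_mean = statistics.mean(population)

# Try the flawed design with ever larger samples.
estimates = {n: tall_only_estimate(population, n, rng) for n in (100, 1_000, 10_000)}

print(f"true mean: {true_mean:.1f} cm")
for n, est in estimates.items():
    print(f"sample size {n:>6}: estimate {est:.1f} cm")
```

Every estimate sits above the cutoff by construction, so a larger sample only gives a more precise estimate of the wrong number.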
There is no reason to believe that response rates will be unrelated to the features we are trying to measure: some groups will be more likely to answer, others less likely. These variations in response rates will distort the data, and mailing out more forms does nothing to correct them. In fact, the problem is worse than in the simple example described above: there, at least, we know who was left out, whereas here we don't know which groups are systematically excluding themselves from the census.
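A rough sketch of how this plays out, under loudly stated assumptions: the incomes below are simulated from a lognormal distribution, and the response rule (households above the median answer 80 per cent of the time, those below it 40 per cent) is entirely hypothetical. Nobody knows the real pattern, which is precisely the problem. The sketch compares a mandatory form sent to 20 per cent of households with a voluntary one sent to one-third.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical household incomes; not real data.
incomes = [rng.lognormvariate(10.5, 0.6) for _ in range(100_000)]
true_mean = statistics.mean(incomes)
median = statistics.median(incomes)

def response_prob(income):
    """Assumed response rule: higher-income households answer more often."""
    return 0.8 if income > median else 0.4

# Mandatory long form: 20% of households, everyone answers.
mandatory = rng.sample(incomes, len(incomes) // 5)

# Voluntary survey: mailed to one-third, but only some households answer.
mailed = rng.sample(incomes, len(incomes) // 3)
voluntary = [x for x in mailed if rng.random() < response_prob(x)]

mandatory_error = statistics.mean(mandatory) - true_mean
voluntary_error = statistics.mean(voluntary) - true_mean

print(f"true mean income:        {true_mean:,.0f}")
print(f"mandatory 20% error:     {mandatory_error:+,.0f}")
print(f"voluntary one-third error: {voluntary_error:+,.0f}")
```

Under these assumptions the voluntary survey overstates average income even though it collects a comparable number of responses, because the error comes from who answers, not from how many are asked. Flip the response rule and the bias flips sign; without knowing the rule, there is no way to correct for it.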
One of the most pressing policy questions facing Canada is how best to develop our skills and improve educational attainment levels. We need the best possible data to develop policy, and this measure would provide us with data that are at best badly flawed, and likely to be completely useless.
The census is an exercise that consumes an enormous amount of time and resources. They shouldn't be wasted in this way.
Update: Here's a more temperately worded explanation of why this is a bad idea.