From my inbox:
[O]ne of my colleagues insists that he does not know of any way to correct for biases in mandatory surveys from outright lying or refusal to answer (in spite of any penalties for doing so), and I had to admit I didn't really know either. Can one easily detect fabricated or implausible information in a mandatory census and do long-form census takers do so?
The idea that respondents are less likely to lie (for want of a better word) in a voluntary survey than in a mandatory census is plausible, and it cannot be dismissed out of hand. But then again, it can't be used to defend the decision to make the long form voluntary. At least, not yet.
Statistics Canada is under no illusion that its census database is error-free. Willful misrepresentation is only one reason for incorrect answers: respondents may forget things or misunderstand questions, or correct answers may be miscoded into the database by StatsCan employees.
The following passage is taken from a page on Statistics Canada's website discussing how the 2006 census was conducted. Presumably, it was written before the recent proposal was made (and it will hopefully remain up after I've brought attention to it - these are crazy times):
After the capture, completeness and coverage editing and corrections, and coding operations are complete, the data are processed through the final edit and imputation activity which is mostly fully automated. The editing process detects the errors, and the imputation process corrects them.
A series of detailed edit rules that identify any missing or inconsistent responses are applied. These missing or inconsistent responses are corrected most of the time by changing the values of as few variables as possible through imputation. Imputation invokes "deterministic" and/or "minimum change hot deck" methods. For deterministic imputation, errors are corrected by inferring the appropriate response value from responses to other questions. For minimum change hot deck imputation, a record with a number of characteristics in common with the record in error is selected. Data from this "donor" record are borrowed and used to change the minimum number of variables necessary to resolve all the edit failures.
The Nearest-Neighbour Imputation Methodology (NIM), developed for the 1996 Census to perform the edit and imputation for basic demographic characteristics such as age, sex, marital status, common law and relationship to Person 1, was expanded and implemented into a new generic system for the 2001 Census called CANadian Census Edit and Imputation System (CANCEIS), which is a PC-server-oriented system as opposed to the main frame, to process demographic (age, sex, marital status, common law and relationship to Person 1), mobility, labour, place of work, and mode of transportation variables.
The 2006 Census saw a further extension of CANCEIS to all of the remaining variables. This continued to allow more extensive and exact edits to be applied to the response data while preserving responses through the minimum change hot deck imputation.
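To make the "minimum change hot deck" idea concrete, here is a toy sketch in Python. It is not CANCEIS or the NIM as Statistics Canada implements them. The field names, the three edit rules, the donor records and the crude count-of-differing-fields distance are all assumptions made up for illustration. The sketch only shows the general shape of the technique described in the passage: find a clean donor record that resembles the failing record, then borrow as few values as possible to make every edit pass.

```python
from itertools import combinations

# Hypothetical fields and edit rules, invented for this illustration.
FIELDS = ("age", "marital_status", "employed")


def passes_edits(rec):
    """Return True if the record satisfies every (made-up) edit rule."""
    # Edit 1: no missing responses.
    if any(rec[f] is None for f in FIELDS):
        return False
    # Edit 2: a child under 15 must be recorded as single.
    if rec["age"] < 15 and rec["marital_status"] != "single":
        return False
    # Edit 3: a child under 15 cannot be recorded as employed.
    if rec["age"] < 15 and rec["employed"]:
        return False
    return True


def distance(rec, donor):
    """Crude similarity measure: the number of fields on which the records differ."""
    return sum(rec[f] != donor[f] for f in FIELDS)


def impute(rec, donors):
    """Borrow the fewest possible fields from the most similar clean donor.

    Returns the corrected record and the tuple of fields that were changed.
    """
    if passes_edits(rec):
        return dict(rec), ()
    # Only records that pass every edit may serve as donors; nearest first.
    ranked = sorted((d for d in donors if passes_edits(d)),
                    key=lambda d: distance(rec, d))
    # Try one-field changes first, then two-field changes, and so on.
    for k in range(1, len(FIELDS) + 1):
        for donor in ranked:
            for subset in combinations(FIELDS, k):
                candidate = dict(rec)
                for f in subset:
                    candidate[f] = donor[f]
                if passes_edits(candidate):
                    return candidate, subset
    raise ValueError("no donor can resolve the edit failures")


donors = [
    {"age": 9, "marital_status": "single", "employed": False},
    {"age": 34, "marital_status": "married", "employed": True},
    {"age": 61, "marital_status": "widowed", "employed": False},
]

# A nine-year-old recorded as married fails Edit 2.
inconsistent = {"age": 9, "marital_status": "married", "employed": False}
fixed_inconsistent, changed_inconsistent = impute(inconsistent, donors)

# A missing marital status fails Edit 1.
missing = {"age": 34, "marital_status": None, "employed": True}
fixed_missing, changed_missing = impute(missing, donors)

print(fixed_inconsistent, changed_inconsistent)
print(fixed_missing, changed_missing)
```

In both toy cases only one value is changed and the rest of the response is preserved. Note that the first record could also be "fixed" by changing the age instead. The sketch settles this by preferring the nearest donor, which is one simple tie-breaking choice among several that a real system could make.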
In other words, Statistics Canada does its best to identify and correct obvious outliers. But in its view, these sorts of errors are less serious than problems associated with coverage - that is, people who either didn't receive a questionnaire or who didn't respond:
While several factors such as those noted under the heading Error Detection can influence census data accuracy, the accuracy of census counts and data is first affected by the degree to which the total population is missed in the census (undercoverage). The census count of 31,612,897 persons in 12,506,814 dwellings does not include persons living in dwellings missed during census enumeration, or persons mistakenly omitted from the questionnaires of responding dwellings. (The count of dwellings includes occupied private and collective dwellings and responses received from outside Canada. It excludes unoccupied dwellings and dwellings occupied only by foreign and temporary residents.) Nationally, the final estimate of the 2006 Census net undercoverage rate is 2.8%, compared with 3.1% for the 2001 Census. Net census undercoverage varies from one province and territory to another and from one age group to another.
How does Statistics Canada check the accuracy of its census data? It compares them to estimates from other sources:
An assessment of the quality, comparability and limitations of the 2006 Census data is carried out as an integral part of release and dissemination activities. All variables are certified before release, by way of a set of brief studies designed to judge the consistency of the data with that of previous censuses and that of alternate data sources. This process is augmented by measures of data quality provided by evaluation studies. The studies provide indications of the quality of the census data as affected by potential sources of error--e.g., coverage, response, non-response, processing and sampling--and of the impact on individual variables.
Before anyone says "Aha! So there are alternatives to the census!", remember that the census is the only database where so many variables are observed simultaneously. Income can be checked against tax records, education can be checked against school records and housing can be checked against CMHC data, but no other source puts them all together.
It is possible that individual files will have fewer errors if the census is voluntary, but these gains look to be small and - in the absence of empirical evidence - mainly hypothetical. In contrast, the losses associated with self-selection bias are large and well-documented. If the government wishes to pursue this idea, then it should be field-tested before making it a basis for policy.
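A back-of-the-envelope calculation shows why self-selection matters so much. Every number in the sketch below is invented for illustration: the three income groups, their population shares, their mean incomes and their voluntary response rates are assumptions, not estimates from any survey. The only point is the mechanism: when the willingness to respond varies with the thing being measured, the respondents' average drifts away from the population average.

```python
# Hypothetical population, for illustration only:
# (group, share of population, mean income, response rate if voluntary)
groups = [
    ("low income", 0.30, 20_000, 0.50),
    ("middle income", 0.50, 50_000, 0.70),
    ("high income", 0.20, 120_000, 0.90),
]


def true_mean(groups):
    """Mean income when everyone responds."""
    return sum(share * income for _, share, income, _ in groups)


def respondent_mean(groups):
    """Expected mean income among those who choose to respond."""
    responding = sum(share * rate for _, share, _, rate in groups)
    weighted = sum(share * rate * income for _, share, income, rate in groups)
    return weighted / responding


population_average = true_mean(groups)
voluntary_average = respondent_mean(groups)

print(f"population mean income:  {population_average:,.0f}")
print(f"respondents' mean income: {voluntary_average:,.0f}")
print(f"bias from self-selection: {voluntary_average - population_average:,.0f}")
```

With these made-up figures, the voluntary survey overstates mean income by roughly 6,900 on a true mean of 55,000. The calculation uses expected values rather than a random sample, so the gap comes entirely from who responds, and mailing out more questionnaires would not shrink it.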
But the main reason why I am skeptical of claims that a voluntary survey will yield more 'honest' results is the way the government has handled the file. After a summer of mockery and dismissiveness, the government and its supporters have created a significant constituency that now believes that the census is a tool of its political opponents. We're going to get the worst of both worlds: a census with a biased sample and a higher rate of inaccurate responses.
"After a summer of mockery and dismissiveness, the government and its supporters have created a significant constituency that now believes that the census is a tool of its political opponents."
This is the greatest harm caused by the whole census debacle - the erosion of trust and social capital.
Posted by: Frances Woolley | September 13, 2010 at 10:04 AM
I agree, Frances. If they burned up Statcan's hard-earned reputation for some kind of positive political return, at least I could understand why they did it. Not like it, but understand it. But, and perhaps this is why I'm a pointy-headed academic instead of a political strategist, I can't even see where they gained politically on this. What a waste.
And this is what makes me drift to pondering Stephen Taylor-esque theories that the attack on SC's reputation was a *goal* and not a *tool* in the pursuit of some greater electoral goal.
Posted by: Kevin Milligan | September 13, 2010 at 01:27 PM
Stephen, can you do an update on this post? http://worthwhile.typepad.com/worthwhile_canadian_initi/2006/06/factor_shares_c.html
Posted by: edeast | September 13, 2010 at 07:49 PM
I was a census taker this year (in the US). I was expecting lots of refusals, and lots of blatantly false answers. Surprisingly, though I did have a few refusals, I had virtually no responses that smelled of mendacity. I'm confident that everybody who talked to me gave me correct information. So the premise of this question seems to me to be unsound.
On an unrelated note, I was also surprised that people's political views had little influence on their attitude toward the census. I expected that liberals with Obama stickers and Gay Freedom flags would be happy to talk to me, and conservatives with NRA stickers and Don't Tread on Me flags hostile. That turned out to be totally off-base. There was absolutely no correlation between political views and attitude toward the census.
Posted by: Will | September 21, 2010 at 04:29 PM