On May 8th, the results of the 2011 National Household Survey (NHS) will be released.
The voluntary nature of the NHS will compromise the quality of the data collected. For example, the National Household Survey asks people about their religious beliefs. Yet religion has a strong influence on volunteering and civic engagement. The National Survey of Giving, Volunteering and Participating found that people who are religious volunteer more. Other studies argue that some religions discourage participation in civic life - but reinforce the message that religion matters.
There is, therefore, likely to be a correlation between a person's willingness to fill out the National Household Survey and the strength of their religious beliefs. How, then, can we possibly know whether the NHS information on the religious beliefs of the Canadian population is accurate?
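To make the worry concrete, here is a minimal sketch of the arithmetic. Every number in it is hypothetical (nothing here comes from the NHS), and `observed_share` is just an illustrative helper. It shows how a gap in response rates shifts the measured share of a group, and why the measured share alone cannot tell us what the true share is.

```python
def observed_share(true_share, rate_group, rate_others):
    """Share of survey respondents who belong to a group, given the group's
    true population share and the response rates inside and outside it."""
    responders_in_group = true_share * rate_group
    responders_outside = (1 - true_share) * rate_others
    return responders_in_group / (responders_in_group + responders_outside)

# Hypothetical population: 60% religious.
true_religious = 0.60

# Case 1: everyone responds at the same rate, so the survey is unbiased.
unbiased = observed_share(true_religious, 0.70, 0.70)

# Case 2: religious people are more willing to fill out the form.
biased = observed_share(true_religious, 0.75, 0.60)

# Case 3: a different "truth" with equal response rates produces exactly
# the same survey result as Case 2.
other_truth = observed_share(biased, 0.70, 0.70)

print(f"equal response rates:   {unbiased:.3f}")
print(f"unequal response rates: {biased:.3f}")
print(f"different true share:   {other_truth:.3f}")
```

Cases 2 and 3 give the same answer, which is the problem in a nutshell: without outside information on response rates by religiosity, the survey cannot distinguish between them.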
To take another example: from the previous census, we know the number of immigrants who were in Canada in 2006. From administrative records, we know how many immigrants have been admitted to Canada since then. Hence we can work out how many new immigrants there should be in Canada. But what if the NHS finds a lower-than-expected number of new immigrants in Canada? Is this evidence of a growing class of non-resident permanent residents, astronauts who live and work elsewhere? Is it evidence that some new Canadians have found that life in the great white north does not meet expectations, and have returned home? Or does it simply reflect a lower survey response rate, that new immigrants are here but not filling out the NHS form? It's impossible to know.
In the coming weeks, professional economists will find themselves on the horns of a dilemma. To discuss the NHS data release dignifies it and gives it legitimacy, suggesting that economists might be better off boycotting the whole affair. Yet the media abhors a vacuum. If sensible economists refuse to speak to the NHS data, journalists will search until they find someone who is willing to give them a soundbite. Perhaps speaking to the data is the lesser of the two evils.
Moreover, it's not obvious that the voluntary nature of the NHS renders the data completely useless. Information from the 2006 census and other sources, for example, can be used to adjust for non-response. The more serious issue is the on-going deterioration in data quality. With each passing year, it will become more and more difficult to adjust for non-response bias, because without the long-form census, we simply won't know enough about Canada's demographic composition.
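One common way to adjust for non-response is post-stratification: reweight respondents so that each group counts in proportion to its known population share. The sketch below is only an illustration under stated assumptions. The group names, shares, counts and incomes are all made up, and it is not a description of what Statistics Canada actually does.

```python
def raw_mean(group_means, respondent_counts):
    """Unadjusted survey mean: each respondent counts equally."""
    n = sum(respondent_counts.values())
    return sum(group_means[g] * respondent_counts[g] for g in respondent_counts) / n

def poststratified_mean(group_means, population_shares):
    """Adjusted mean: each group counts according to its population share,
    taken from an outside benchmark such as an earlier census."""
    total = sum(population_shares.values())
    return sum(group_means[g] * population_shares[g] for g in population_shares) / total

# Hypothetical benchmark: new immigrants are 10% of the population...
population_shares = {"new_immigrant": 0.10, "everyone_else": 0.90}
# ...but only 5% of the people who returned the form.
respondent_counts = {"new_immigrant": 50, "everyone_else": 950}
# Hypothetical average incomes among respondents in each group.
group_means = {"new_immigrant": 30000.0, "everyone_else": 50000.0}

unadjusted = raw_mean(group_means, respondent_counts)
adjusted = poststratified_mean(group_means, population_shares)

print(f"unadjusted mean income: {unadjusted:,.0f}")
print(f"adjusted mean income:   {adjusted:,.0f}")
```

The adjustment is only as good as `population_shares`. If the benchmark is a 2006 census that is never refreshed, those shares drift further from reality every year. The reweighting also does nothing about differences in response within a group, for example if lower-income new immigrants are less likely to respond than higher-income ones.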
So this is the question: how should economists respond to the National Household Survey release?
These are great points, Frances, and an excellent question. I think the answer, though, is simple. Economists should respond to the NHS release as they would any dataset. We regularly look under the lamp-post for our lost keys, even if we dropped them elsewhere (as the joke goes). It is no different here. We can comment on the patterns we find, and carefully evaluate the reliability of the data in each particular context. We should discuss shortcomings in the data, where they matter, and what kind of bias they might introduce. We have lots of data at our disposal, none of which are perfect. To boycott the NHS data is to discard information. Perhaps the information is less valuable than the previous census releases, but it is valuable nonetheless.
Posted by: Trevor | April 27, 2013 at 10:46 PM
Francis,
How about simultaneously boycotting and filling the vacuum? By this I mean, economists with integrity should put their effort into demonstrating ways in which the data has become contaminated by making the survey voluntary, and making sure that there is a steady flow of press releases on that topic, drowning out any releases from those with less integrity claiming to have learnt something real.
Posted by: Seamus Hogan | April 28, 2013 at 06:38 AM
Oops, didn't proofread. Sorry about misspelling your name, Frances.
Posted by: Seamus Hogan | April 28, 2013 at 06:40 AM
As well as using the data responsibly, as we should any dataset, we should also carefully critique any attempts by others to treat the NHS with the same respect and tools that we used for the long-form census. For our own part, we should make appropriate use of any confidence interval information we are able to obtain or create, and share the CI techniques if needed. We should also make sure that invalid or inappropriate comparisons of data sets are evaluated and flagged. And IMHO, we should continue to critique the loss of a good tool for political purposes.
TABE is hosting a session in Toronto (see www.cabe.ca May 16th) specifically to start our process of understanding the issues with the NHS.
TABE May 16th meeting
Posted by: Jciconsult | April 28, 2013 at 07:50 AM
I've been wondering about this as well. IIRC, StatsCan made promises to the effect that they'd only release data in which they had confidence. We know that there will be far less useful information in the NHS than there was in the census, and a smaller data set may be the way that loss is reflected.
On the other hand, if StatsCan is forced to push dodgy data out the door in order to make the Conservative govt look good ("See! We told you the NHS would be just as good as a census! Look at all this data!"), then we're going to have to take a very close look at the documentation about response rates, strategies to correct for response bias and diagnostics about data quality.
And if Statscan pushes it all out the door without any meaningful documentation, well, then we have a serious problem.
Posted by: Stephen Gordon | April 28, 2013 at 08:15 AM
Seamus: we (including Canadians-in-exile) need talking points. What should they be? That the data quality is so compromised that the results can't be believed? That the people in StatsCan are doing a good job? That we don't know how good the data is until we see the results?
As soon as we admit that it's even possible to correct for response bias and generate meaningful results, then the anti-mandatory-census side has won the debate.
Stephen - without access to the raw microdata, I don't really see how we're going to know just how good the data quality is. Sure, there's documentation, but there are just some things that are unknowable, e.g. the difference in response rates between religious and non-religious people (because religiosity is something that we can't get data on in any other way e.g. admin records.) At least with the raw data it will be possible to compare the 2011 demographics to the 2006 demographics and see if anything looks weird.
Unfortunately, only a few people will ever get their hands on the raw data - most of us will use the public use microfiles, which will be massaged to look good.
Do you mean "fewer results" rather than "a smaller data set"?
JCI (Paul) - So the talking point is: "A good policy tool has been sacrificed for purely political purposes." I like that.
On the other suggestions - again, do you think there's much that we can do at this point in terms of critiquing CIs etc. before any of the microfiles have been released? I think we can be pretty sure that the folks at StatsCan will do the best that they can, and they're going through enough already with layoffs etc. right now - I'm not sure we really want to make their lives worse.
Posted by: Frances Woolley | April 28, 2013 at 08:49 AM
I was thinking fewer variables in the dataset. And you're right about the raw data-public use datafile distinction.
Posted by: Stephen Gordon | April 28, 2013 at 09:17 AM
Stephen, they're releasing immigration, place of birth, citizenship, ethnic origin, visible minorities, language and religion on May 8th. These are ones that are relatively hard to correct for, because we know there are fairly big differences in response rates across these categories, and we don't have a lot of other good data on, say, religion.
Other info e.g. education, income, occupation is easier to adjust for non-response bias, since we can use data from the Labour Force Survey (which IIRC is still mandatory) and from income tax records.
My sense is that if they're releasing the ethnicity and religion data, they're going to release everything.
Posted by: Frances Woolley | April 28, 2013 at 09:39 AM
Yikes. I didn't realise that. I was thinking that they might limit themselves to releasing education, income and the other variables for which there are other sources to check against.
This is going to be a problem. I wonder if it'd be a good idea for the CEA (and other interested bodies) to write letters to research journals advising them of the problems with the NHS data?
Posted by: Stephen Gordon | April 28, 2013 at 09:47 AM
George Dionne is currently CEA President, Thomas Lemieux will become president at the Annual General Meeting at the end of the CEA meetings. If we think this is an issue that the CEA should be speaking to, we can suggest a motion for the CEA Exec meetings or bring it up at the annual general meeting (which unfortunately would require us to hang around until Sun and go to the meeting, which I hadn't been planning to do).
Posted by: Frances Woolley | April 28, 2013 at 10:26 AM
Part of the reason that I wanted StatCan in Toronto is to get as much of a chance as possible to ask questions. A PUMF would solve some of the issues. They have released Census PUMFs before. The release schedule is up on the StatCan web site. Some of you may remember the problems we once had with questions on income from one of the census cycles. Just think what we are going to have now. Income is August 14th. I may try to get another session with them scheduled at that point. The real problem is the talking heads who will use the data like it was a census, just like they misuse the LFS etc.
Posted by: Jciconsult | April 28, 2013 at 10:31 AM
Frances - I'm presenting in the last session on Sunday morning, so I'll be there anyway. And I'll be driving, so I don't have to rush away. Maybe I'll do just that.
Posted by: Stephen Gordon | April 28, 2013 at 10:35 AM
It seems to me that you treat the NHS the same as any other large-sample voluntary survey. You can correct for sampling bias by using whatever information is available in the short form census, but the NHS itself can't be used to correct for sampling bias in other surveys.
Posted by: Neil | April 28, 2013 at 05:54 PM
3 Worthwhile articles regarding this release:
No personal comments about us, Statistics Canada warns employees (May 6th 2013)
http://o.canada.com/2013/05/06/0507-statscan-code/
New code of conduct stresses "loyalty to the government"
http://news.nationalpost.com/2013/03/15/library-and-archives-canada/
PCO micromanages everything, even ads. (Globe and mail Apr. 27 2013)
theglobeandmail.com/news/politics/little-known-war-of-1812-a-big-deal-for-ottawa/article11588326
Posted by: ABC ABC | May 06, 2013 at 11:09 PM
I am appalled that the survey does not look at the one third of the GDP that is unpaid labor. Women should notice how sneakily this question has been taken off the survey when it moved from the long form census. I realize Dr. Woolley may not agree with me that care roles in the home are useful work in the economy and critical to the nation's health care system but maybe we can agree that facts are worth knowing. The survey simply ignores data that would have helped businesses, legislators and economists quantify why some people are not always able to be at their paid work. I have always felt that the third wave of women's rights is to notice what we have just taken for granted, the anchor roles in the home. How cute that the survey intentionally does not notice.
Posted by: Bevrley Smith | May 07, 2013 at 04:08 AM
Bev - thanks for writing. I've written on this issue before here: http://worthwhile.typepad.com/worthwhile_canadian_initi/2010/08/a-shortlived-feminist-victory.html. The question was taken off the census in part because no one was using the data - the quality wasn't good enough, and many of those who were interested in unpaid work had neither the skills nor the means to utilize the data. Most people preferred to look at the very detailed information on time use that has been collected through the General Social Survey.
I have a close friend who is a homemaker, an advocate, and very good with numbers. Her major frustration with StatsCan data is that she is unable to get access to the data that she would need to do her advocacy work, e.g. detailed breakdowns on the number of people with disabilities by age at the community level. Sure it's available at a price - but community groups aren't able to afford that price. This is an issue that's worth taking on.
Posted by: Frances Woolley | May 07, 2013 at 06:10 AM