Comments

A bit OT: any opinion on folks who are suggesting the American experiment with a voluntary ACS is evidence for the viability of Clement's suggestion? Seems like comparing apples to oranges to me.

bigcitylib: One thing I'll say is that the experience was nowhere near as positive as Veldhuis made it out to be. From the report:

The use of voluntary methods had a negative impact on traditionally low response areas, that will compromise our ability to produce reliable data for these areas and for small population groups such as Blacks, Hispanics, Asians, American Indians, and Alaska Natives (see page 13). The current ACS design, based on mandatory methods, results in reduced levels of reliability for some areas and population groups due to low levels of mail response. This is a direct consequence of the additional number of cases that are subject to subsampling. In a voluntary setting, the problem is magnified. The proportion of completed interviews after mail and telephone attempts decreased to less than 50 percent for Black and Hispanic households. This has a significant impact on the total number of interviews and thus, on the reliability of estimates for these population groups.

The response rate on the Survey of Household Spending in 2008 was 63%. It has fallen by more than 10pp over the last decade. It is a voluntary survey.

The SHS is used for everything from forming the CPI to measuring LICO to analyzing the HST. The census is used both in designing the sampling frame and in constructing the weights. Without proper stratification and weighting, the information from the SHS becomes less accurate.

If you a) spent a lot of money on advertising and b) spent a lot of money having census workers chase you down, then you might get response rates higher than 63% for the proposed NHS.

Or you could make it mandatory (admittedly at some non-zero cost to liberty) and save a lot of money and get really reliable data that is tremendously socially useful.

Here are SHS response rates.

Like all voluntary surveys, the data become usable through weights and survey design that use a population benchmark--such as the census.

Mike, I know what you mean by WASPy but I have issues with the term. First, lots of WASPy people aren't Anglo-Saxon; they are Normans or Celts or Franks. Second, why do we call them WHITE Anglo-Saxons? Are there some Black Anglo-Saxons? Third, it is a pejorative, and I think it best to avoid pejoratives, even in the lighthearted way you used the term WASP. Am I the only person who is bugged by the term WASP? I think I am particularly sensitive about pejoratives because I have a son with intellectual disabilities.

I have been reading the census posts with enthusiasm and have learned lots. Thanks everyone. I was interested by the person who said that they had been chosen to do the follow up survey about diabetes and had found the questions well thought out. I had the opposite experience. When my disabled son was 12, he was also chosen for one of these long form follow up surveys. To start with the questioner kept calling my son a girl. He has a gender neutral name so I could see making a mistake once, but she made it over and over again. But worse the questions were poorly thought out. I was asked if my son's disability interfered with his paid employment. I would answer, "he is 12 years old," only to hear "is that a yes or a no?" Next I would be asked if my son's disability interfered with his ability to drive!! It went on for ages while my disabled son and my baby fussed and pulled on me.

Rachel

Right, Mike Moffat, but I'm interested in the contention that those issues could have been overcome if the Feds were willing to throw money at the problem (ie spend 59 mill or so per annum to ensure an adequate sample size).

Just to be clearer: the paper describing the voluntary ACS suggests that the U.S. would need to spend 59 mill extra (per annum?) to "maintain the quality of the results", which is, I believe, the term they use. I'm trying to figure out what they mean by that phrase.

http://www.theglobeandmail.com/news/politics/flaherty-defends-tory-census-plan/article1651237/

Flaherty's response in support of the voluntary plan is ... that Canadians will do it voluntarily for the good of the country.

Huh?

A few quick thoughts on that:
-Flaherty may be right in only one circumstance: if the voluntary response rate is so high as to be 100%, or identical to that under the mandatory form (which is presumably itself below 100%). And identical not merely in total response rate (absolute numbers), but in its distribution.

An argument that is absurd, since - if everyone is so patriotic that they respond - then by definition the supposed basis for objection to the mandatory long-form ('forcing' people to do this) cannot be an issue, since clearly the same people feel so strongly about doing things for the good of the country, they don't need to be compelled.

Okay, I could posit the existence of a sub-group that feels so strongly about being compelled to do anything, that are also quite certain the punishment is unlikely but object philosophically, that they will hide from the Government Inspectors if they believe it is mandatory, but feel so grateful that they are not being compelled that they will submit to the same Government Inspectors if it is voluntary. Heck, in this fantasy, the census could get MORE accurate.

This is First Order Idiocy. They're not even trying to pretend any more.

I predict: within a few short news cycles, they will start accusing those who object to their plan of doubting the Good Spiritedness and Civicmindedness of True Canadians, and asking 'Why do they hate Canada so much?'

bigcitylib,

I only took a quick look at the report http://www.census.gov/acs/www/Downloads/Report03.pdf, particularly Section 4.2 and Appendix 4, and from what I can gather, the 59 million dollar increase does not specifically address the question of non-response bias, which is what we are talking about in the Canadian census.

In Appendix 4, the report seems to talk only about the variance. That is, the increase in sample size is intended to decrease the variance of estimates. They do not seem to discuss the mean-squared error, which incorporates the bias into the measure of quality. So, it seems to me that when the report uses the term "quality", it is referring to variance only.

Since the report mentions the subpopulations that were underrepresented due to making the ACS voluntary, perhaps they would have oversampled by subgroups (in fact, I'm sure they would have since that is par for the course in other U.S. surveys). But of course, they cannot oversample for subgroups that they do not know in advance will be underrepresented and neither can we.
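
The variance-versus-MSE distinction drawn in this comment can be written out explicitly. This is the standard textbook decomposition, not a formula taken from the ACS report; the symbols (an estimator \hat{\theta} of a population quantity \theta, sample size n, population variance \sigma^2) are assumptions introduced here for illustration.

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta}-\theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big(\mathrm{Bias}(\hat{\theta})\big)^2,
\qquad
\mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta .
```

For a sample mean under simple random sampling, and ignoring the finite-population correction, \mathrm{Var}(\hat{\theta}) is roughly \sigma^2/n, so a larger sample shrinks the first term. The bias term does not depend on n, which is why a quality measure based on variance alone would say nothing about non-response bias.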

(my apologies here for a long personal aside)

Rachel - on the WASPy thing - yes, I am starting to get really bugged by what seems like the 90+% of American movies that use the handy signaling device "the bad guy is the one with the English accent." People can even seriously suggest that BP made a PR mistake putting someone with a British accent in front of the cameras because of the way that British accents are perceived. Plus, like many people born "white anglo-saxon protestant," I am finding that my family is increasingly non-white and non-protestant, and as you say, it's quite possible they were never anglo-saxon in the first place.

Thank you for talking about how the census is vital for the weighting of ALL other surveys; this is the issue that I feel is missing from all media reports.

How about getting rid of the entire census? Seems common in Europe.
http://www.economist.com/node/16590962?story_id=16590962

Less info for less money is at least debatable.
Less for more money is not.

@Peter

The Scandinavian countries can do this because they have central population registries where nearly all personal data can be found. The equivalent would be to merge health insurance databases, electoral rolls, drivers' registrations, social insurance databases, real estate records, and I'm sure plenty of other databases that I've forgotten. Not only does it go completely against the government's own privacy rationale, the mishmash of jurisdictional conflicts between the feds, provinces, and municipalities would be an operational nightmare.

@Peter:

Further to a-non's comment;

http://www.norway.no/temaside/tema.asp?stikkord=94303

Which gives them:

http://www.ssb.no/en/

The only reservation I have about the long form is the section regarding race. Why "white"? What is "white"? Only a few generations ago, anyone non-"English" was not considered "white". Irish were not, Italians were not, Germans weren't and certainly not Latinos. The same observation can be stated about "black". How "black" must you be to be considered "black"? In the U.S. one sixteenth "black" ancestry made one a "black" person no matter what the appearance.

I know there's a need to ask some questions regarding this if only to track how integrated a society is but I do wish there was a better way.

Evagrius, there is much truth in what you say. Take, for example, a Muslim woman of European origin - someone from somewhere in the Baltics, say. Is she white? I don't know, but I do know that the moment she steps out of her house wearing a hijab she's a member of a visible minority, and some people will make all sorts of assumptions about her.

I remember having an interesting discussion with my daughter when she was little about race (the kinds of conversations policy wonks have with their kids) and she said to me "Chinese people are white." That was her concept of race. Who was I to argue?

Busy morning.. will respond to comments this afternoon. One I wanted to get to ASAP:

Rachel: I wasn't entirely happy with the 'WASPy' label due to accuracy, but it was the most accurate thing I could think of. It hadn't occurred to me that it might cause offense. If someone can come up with an alternative, I'll change it immediately.

Identity is a funny thing. I remember as a kid that on the playground if your parents were immigrants from Italy, you'd be considered: a) Italian, b) Italian and Canadian, c) Canadian and Italian or d) Italian-Canadian. But for kids like me, we were just 'Canadian'. We weren't Scottish-Canadian or anything like that - that would describe a family that has recently immigrated from Scotland. So for the context of this post, the next obvious alternative to my childhood situation is to call our group 'non-hyphenated Canadians', but I could see that causing all kinds of offense! (for good reason)

bigcitylib: "Right, Mike Moffat, but I'm interested in the contention that those issues could have been overcome if the Feds were willing to throw money at the problem (ie spend 59 mill or so per annum to ensure an adequate sample size)."

It's possible that if you spent enough money you could get the response rate to a voluntary survey to be the same as to a mandatory one. But at a minimum I think you'd have to do a huge "Don't Mess With Texas" style marketing campaign about the importance of the census as well as hire an army of people to annoy/harass/guilt people into filling out the voluntary census. Even then it still probably wouldn't work, and if you're concerned about privacy do you really want Statscan contacting you dozens of times to guilt you into filling out your census?


Obviously a voluntary survey is self-selected and therefore a non-probability sample -- one from which estimates of population percentages and other population parameters cannot be derived. However, some of the information on the long form is intrusive, irrelevant, or useless (I got it three censuses in a row, getting angrier each time). The ethnicity information is useless. Many people (including those of us who were relieved when this question was removed from the census decades ago) simply answer "Canadian," and experience in surveying people in Toronto suggests that people of all ethnic groups and countries of origin do this (people come here to be Canadian, eh?). It also seems to be inaccurately reported by people who take it seriously. For example, census data are used to justify the conclusion that the Metis population is growing by leaps and bounds; however, the leaps and bounds are so huge that I suspect that it's due to growing awareness of Metis coupled with an ignorance of their history. Specifically, I suspect many people are claiming Metis ethnicity because they have, or believe they have, First Nations forebears, but that alone does not qualify you as Metis. The religion question is not only intrusive but dangerous; I suspect many people in some religious groups do not report themselves simply because they fear the data may be misused. And a lot of the data collected are highly correlated with income, which the government already has better data on.

A better solution would be to throw out the irrelevant and unreliable items and administer only an intermediate-length questionnaire. And why do we need a 100% sample? Sampling theory seems to me to suggest that a smaller, regionally stratified sample would be as effective, and the savings could be used for further inquiry.

As for your soccer field/baseball diamond example, you should decide what to build by asking the kids what they want to play, not by consulting data about an unrelated variable that could have been collected as long ago as four or five years earlier.

I'm also from EOA, by the way. The census never did nothing for me. I think when I was a kid back in the 50s and 60s London simply consulted the income distributions and put the swimming pools and arenas where the rich people were.

"you should decide what to build by asking the kids what they want to play"

That's essentially what we're doing (or more specifically, asking their parents). We just want to make sure we have an accurate cross-section of kids (that we're not over-asking or under-asking some group).

Evagrius: there are no questions using the word 'race.' There are questions on ethnicity and questions on place of birth. One may freely have doubts about the value of self-assessed ethnicity. One may not worry about questions that simply are not there, e.g. bathrooms, race.

Kevin, true, there are no questions using the word 'race.' However, question 19 of the National Household Survey does ask whether this person is White, South Asian, Chinese, Black, etc. Just going out on a limb here - some people might interpret this as a question about race.

"Obviously a voluntary survey is self-selected and therefore a non-probability sample -- one from which estimates of population percentages and other population parameters cannot be derived."

That's not true. It's no less a probability sample, and one can still estimate population parameters, but non-response bias affects the results' accuracy to an unknown but significant degree. Depending on the structure of this non-response (random? systematic?), there are ways of accounting for it... unfortunately this generally requires accurate survey weights which themselves are derived from things like the long and short-form census.

Clement is right that one can improve the precision of a survey by increasing the sample size, though this is of limited-to-no-significance in a very large sample (like a census...). In fact, any improvement in precision is asymptotically negligible. Non-response bias, of course, diminishes the accuracy of the results, something sample size cannot affect.
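
The point that sample size cannot repair non-response bias can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: the two groups, their population shares, and their response rates are hypothetical and are not drawn from any census or ACS figures, and `voluntary_estimate` is a name invented for this example.

```python
import random


def voluntary_estimate(sample_size, seed=0):
    """Estimate group A's population share from voluntary respondents only.

    Hypothetical setup: group A is 30% of the population and responds 40%
    of the time; group B is 70% and responds 80% of the time.
    """
    rng = random.Random(seed)
    responses = []
    for _ in range(sample_size):
        in_group_a = rng.random() < 0.30          # draw a person at random
        p_respond = 0.40 if in_group_a else 0.80  # response depends on group
        if rng.random() < p_respond:              # only volunteers are observed
            responses.append(1 if in_group_a else 0)
    return sum(responses) / len(responses)


TRUE_SHARE = 0.30
for n in (2_000, 20_000, 200_000):
    est = voluntary_estimate(n)
    # The random noise shrinks as n grows, but the gap to TRUE_SHARE does not.
    print(n, round(est, 3), "bias:", round(est - TRUE_SHARE, 3))
```

Under these assumed rates the respondents' share of group A converges to 0.12 / 0.68, about 0.18, rather than the true 0.30. Sending out more forms only tightens the estimate around the wrong number.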

A self-selected sample is by definition a non-probability sample. Saying that it's a probability sample but "non-response bias affects the results' accuracy to an unknown but significant degree" is simply an admission that reliable population estimates cannot be derived from it -- that is, that it is a non-probability sample. For such a sample there are no such things as "accurate survey weights" because all subsamples are non-representative as well -- weighting unreliable data produces unreliable estimates.

Thanks for your response, Mike. Since you were referring to yourself as a WASP, I did not think you were being derogatory. I just wanted to point out some of the issues I had with the term to get you thinking. As I said I tend to be overly sensitive.
Rachel

I'm extremely disappointed with the way the CPC is apparently making their decisions lately. It's ridiculously paranoid about government and frustratingly anti-science. This relates not only to the census, but to the recent decision not to sign the Vienna Declaration, based again on ideology and contrary to scientific consensus. I wish this got as much attention as the census. It really reeks of American-Republican-style paranoia and contempt for science, which is something I seriously hope never takes root here in Canada.

Rachel: I agree completely with your observations about "WASP." The P is often invalid, too, since a lot of people considered WASPs are actually Roman Catholics. Like you, I don't find it offensive but wrong and misleading. Perhaps people who use it would profit from reviewing the relevant census and survey data.

No, you're still wrong. A random sample is a probability sample by definition; that the population estimates derive from a biased sample does not make it any less random, but simply biased. We can still achieve a desired level of precision for such a sample to control random error, but we will miss the generally unquantifiable component of error contributed by non-response bias. I realize I'm quibbling over technical terms, but then (we) statisticians tend to be very particular about such things. In any case, such bias is only a significant problem if there is a systematic, i.e. non-random pattern to the non-response. Unfortunately, that would absolutely be the case with a voluntary "census".

Josh: A self-selected sample is not a random sample. A random sample is one in which every member of the population has an equal probability of appearing, and a self-selected sample is not. As you noted, the error in a self-selected sample is unknown, so we cannot possibly get accurate estimates from it -- if we don't know how big the error is, we can't possibly know whether we've reduced it.

I fear, however, we may be arguing at cross-purposes. A sample in a voluntary survey is indeed a random sample of the people in the population of interest who take part in surveys. We can estimate the error, then, and the accuracy of the estimates for that percentage of the population (sometimes substantial) that volunteers for surveys. We cannot, however, assume that those estimates are accurate within a desired range for the entire population because we don't have a representative (that is, random) sample of the entire population, and sampling formulas assume a representative sample.

Or maybe we're just using the same terms with different meanings. Anyway, we agree that a voluntary census survey would be rubbish, so one of us must be right, eh, which is the point.

Josh: Aha! I see what we're quibbling over. It's the use of the word "can." You're saying in effect that if non-response bias is zero then accurate estimates can be made, in the sense that in that case the estimates will be accurate within a known range. However, I'm arguing that since we have no idea of what the non-response bias is, we cannot make estimates that we know are accurate within a known range.

It's a bit more complex difference than that, of course, but I think if we thrashed it out we'd eventually come to a point where we were agreeing without changing either of our positions. The difference is linguistic rather than statistical.

No, that's not what I'm saying, but we are arguing at cross-purposes. Specifically,

"A random sample is one in which every member of the population has an equal probability of appearing, and a self-selected sample is not."

What you are defining is a simple random sample, but this is not true of many or most surveys that aim to represent a heterogeneous target population. For example, national opinion polling requires stratified random sampling, but to achieve representativeness of, say, provincial populations, such strata are often oversampled. The sampling weights of each stratum are subsequently applied to arrive at population estimates.

Now, if non-response were uniform across the whole population, the bias could be estimated, but since non-response varies depending on demographics, weighting by strata or subpopulations would be required to adjust for it. Of course, since such weights derive from the census - and the long-form census in particular - we are left with a voluntary survey of great precision with considerable reason to doubt its accuracy.
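
The stratum weighting described above can be sketched as a simple post-stratification. This is an illustration under stated assumptions, not Statistics Canada's actual weighting procedure: the strata labels, the spending figures, and the census shares are invented, and `post_stratified_mean` is a hypothetical helper.

```python
def post_stratified_mean(respondents, census_shares):
    """Weight each stratum's respondent mean by its census population share.

    respondents: iterable of (stratum, value) pairs from the voluntary survey.
    census_shares: dict mapping stratum -> share of the population (sums to 1).
    Raises KeyError if a stratum in census_shares has no respondents at all.
    """
    totals, counts = {}, {}
    for stratum, value in respondents:
        totals[stratum] = totals.get(stratum, 0.0) + value
        counts[stratum] = counts.get(stratum, 0) + 1
    return sum(share * totals[s] / counts[s] for s, share in census_shares.items())


# Hypothetical survey: stratum A answers far less often than stratum B.
respondents = [("A", 30.0), ("A", 34.0)] + [("B", 60.0)] * 6
census_shares = {"A": 0.5, "B": 0.5}  # assumed benchmark from a mandatory census

raw = sum(value for _, value in respondents) / len(respondents)
weighted = post_stratified_mean(respondents, census_shares)
print("unweighted:", raw, "post-stratified:", weighted)
```

With these made-up numbers the unweighted mean is 53.0 and the post-stratified mean is 46.0. The adjustment depends entirely on having trustworthy `census_shares`, and it corrects only for differences in response between strata; it assumes respondents and non-respondents within a stratum look alike.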

Josh: Okay, I should have included stratified samples in my definition, but the principle is the same -- within each stratum each member of the population has an equal chance of appearing in the stratum. And you cannot reduce systematic bias with weights derived from the census, because the reason for bias is probably confounded with the subsamples on which your weights would be based. And anyway the systematic error is unknown -- you can't adjust a quantity you can't estimate.

But yes, it's clear we're arguing at cross-purposes, and I think the difference is linguistic -- in the interest of ending this futile dispute between two people who probably have the same ideas I won't get into the snit I was going to go into about your idea that precision and accuracy are different things. Obviously we use these terms with different meanings.

I still think the idea of using a stratified sample for the census should be investigated. That's a sure way to reduce non-sampling error, for a start.

A census by definition is not a sample, but a complete inventory of sampling units in a population. The long-form census is thus something of a misnomer, apart from being mandatory. I've little doubt that the selection of households (the primary sampling unit) is accomplished through a complex multi-stage protocol.

Just as a point of interest: Statistics Norway has an introductory brochure at http://www.ssb.no/english/about_ssb/this_is_ssb/this_is_ssb.pdf in which their statistical sources are briefly described on pp. 10-11:

How does Statistics Norway produce statistics? Statistics Norway’s statistics derive from two main sources: administrative registers and survey questionnaires. In addition, an increasing amount of data is collected from businesses and municipalities’ own computer systems. Access to administrative registers: The Statistics Act gives Statistics Norway access to a number of administrative registers, so that statistics can be produced on the basis of figures that already exist. At present, Statistics Norway uses around 60 such registers. The three core registers are the Population Register, maintained by the Directorate of Taxes, the Brønnøysund Register Centre’s central Coordinating Register for Legal Entities, and the Norwegian Mapping authority’s Ground Property, Address and Building Register (GAB). The Norwegian Labour and Welfare Administration’s register of employers/employees is also a key register for Norwegian official statistics.

Survey questionnaires and interviews: If the data are not available in an administrative register, the information can be collected through questionnaires, sent by Statistics Norway to companies or individuals. Statistics Norway receives close to 400,000 questionnaires annually, preferably in electronic form. In addition, some 200,000 interviews are conducted, either by phone or door-to-door. In collecting data, Statistics Norway aims to burden companies and individuals as little as possible. The most important measure for reducing the response burden is to help facilitate the use of Norwegian administrative data in statistics production, so that it is not necessary to submit the same information more than once. In addition, more and more data is collected direct from companies’ and institutions’ own data systems, and increasingly also from a third party, for example, by having a head office report data for all the shops in its chain.

I don't see Canada ever setting up as comprehensive a national scheme, since even if the privacy hurdles can be overcome, the whole thing would crash into our provincial vs. federal squabbles ("Let 'em freeze in the dark without our data!!!").

Sorry, I respect your economics knowledge and your intelligence, but I cannot agree with you on this. Nothing can justify this mandatory census, and nothing justifies putting someone in jail for not answering a survey! And you completely forgot the fact that individual responses are more flawed in a mandatory census than in a voluntary one (respondents tend to give more flawed answers in mandatory surveys than in voluntary ones).

"But with the survey, we could re-weight the results against the mandatory census. But in the case of the voluntary census, we have nothing to re-weigh the results against!"

Not exactly; you seem to underestimate the amount of non-census information held by the government on each citizen. We can re-weight with this information. And you can use the last mandatory census as a base (not as efficient as a mandatory census theoretically, though).

Of course, it will be more difficult to do that than with the actual mandatory census, but with work and intelligence, we can do something to fix it.

"Note that this problem does not go away if you increase the number of people you voluntarily survey. If Polish immigrants are less likely to respond to the survey relative to guys named Moffatt, then you'll get the same results if you send the survey to 20% of the population or 30% of the population."

That would be the case if the sample is "randomly non-stratified", but you can also stratify to include more members of these underrepresented categories in the "randomly stratified" sample.

"Unless the government is planning on putting questions such as race, ethnicity, income and religion into the mandatory short-form census, we're going to have a really difficult time doing any kind of planning in heterogeneous neighbourhoods, as shown by this example."

The federal government has all the information about race, ethnicity (unless you're an illegal immigrant! ;) ) and income (not religion, though), even without the census! And they can ask provincial governments to provide other information.

Furthermore, other bodies, governments, companies and local communities could themselves take on the task of surveying people voluntarily (with the collaboration of Statistics Canada, why not?) if they want a more precise result.

Economists seem to have a problem with statisticians, I wonder why...

Huh? Economists and statisticians get along very well; we speak very similar languages. I'm not aware of any statistician who has pointed out that economists have somehow missed something in this file.

And you're making the same tedious mistakes that the govt talking points make. Just how are we supposed to know who is underrepresented without an unbiased census to provide guidance? Just how are we supposed to reweight for education and income unless they're part of the mandatory census?

And if you're so fired up about privacy, why on earth are you proposing the creation of Nordic-style Big Brother administrative databases? The whole point of having a census is to avoid that road. You can argue one point or the other, but not both.

But hey, thanks for dropping by.

"Huh? Economists and statisticians get along very well; we speak very similar languages."


Okay, maybe I should be more precise: Econometricians have problems with statisticians, not because of the language though but because of fear...

"I'm not aware of any statistician who has pointed out that economists have somehow missed something in this file."

I agree, but I think the challenge of dealing with this decision can be met, with work and intelligence. But all economists (and statisticians, unfortunately) have just this to say: "it's impossible to do anything other than put people in jail if they don't answer the mandatory census". Nothing can justify forcing people to answer a mandatory census, and scientists can solve (not perfectly, though) this problem.

"Just how are we supposed to know who is underrepresented without an unbiased census to provide guidance? Just how are we supposed to reweight for education and income unless they're part of the mandatory census?"

A large amount (not all, though) of this information (like education and income) is known by the government right now, without the census! Again, you're underestimating the amount of information held by the government on each citizen.

"And if you're so fired up about privacy, why on earth are you proposing the creation of Nordic-style Big Brother administrative databases? The whole point of having a census is to avoid that road. You can argue one point or the other, but not both."

Maybe the best argument that I have read in favor of the mandatory census so far. Yeah, I have a problem with Big Brother style databases too and I'm against all forms of mandatory databases. I don't propose that style of database, personally. This argument is interesting, and I share your worry on this. Big Government Conservatives act hypocritically on this issue, I know it!

The comments to this entry are closed.
