The Canadian Community Health Survey asks respondents "In the past 12 months, have you had sexual intercourse?". The overwhelming majority of 18 to 19 year olds, when asked that question, answered....
...Yes.
(Graph created by using the Stata command histogram SXB_3 if DHHGAGE==3 and then editing.)
There are two possible conclusions to draw from this analysis. One is that some combination of socialized medicine, widespread availability of contraception, and hockey has led Canadian teens to embark on a life of sexual adventure. The other is that there's something wrong with my analysis.
The latter conclusion is the correct one to draw. Before survey respondents were asked anything about their recent sexual activity, they were first asked "Have you ever had sexual intercourse?" Anyone who answered "no" to the "ever had intercourse" question was not asked about their recent sexual activity. (This type of question structure is a commonly used, and eminently sensible, way of reducing respondent burden.)
The question, then, is how to combine the information from these two different questions to find out the percentage of 18 to 19 year olds who are currently sexually active.
Here's how I would do it using Stata.
First, before I start messing around with the "last 12 months" question, I copy it into another variable - that way if something goes wrong I still have my original data.
gen SexRecoded = SXB_3
The "ever had sex" question, variable name SXB_1, is coded as 1=yes, 2=no. Right now people who answered "no" to the "ever had sex" question are coded as "not applicable" or "missing" on the "last 12 months" question. I want to recode them as "no", which, in this example, is coded as 2. This is how I do it:
recode SexRecoded (missing=2) if SXB_1==2
Basically this takes the missing values in the last 12 months question and converts them to nos for everyone who answered "no" to the ever had sex question.
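One way to check that the recode did what was intended is to cross-tabulate the screening question against the new variable. This is only a sketch: it assumes the "not applicable" responses are stored as Stata missing values, which is worth confirming against the codebook, since a public use file may store them as a numeric code instead.

```stata
* Cross-tabulate "ever had sex" against the recoded variable,
* showing missing values as their own category.
tabulate SXB_1 SexRecoded, missing

* Count respondents who answered "no" to SXB_1 but are still not coded 2.
* If the recode worked, this count should be zero.
count if SXB_1==2 & SexRecoded!=2
```

If the count is not zero, a plain replace SexRecoded = 2 if SXB_1==2 is an alternative that does not depend on how the skipped responses were coded.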
Corrected, the data looks something like this:
A very different picture emerges - yes, the majority of 18 and 19 year olds are sexually active, but a substantial proportion are not.
I should caution that the Canadian Community Health Survey oversamples Canada's smaller provinces, and these provinces have their own unique demographics - they tend to have fewer immigrants, for example. The data presented here are not weighted to correct for any bias introduced by this sampling strategy, so they should not be treated as a definitive estimate of teen sexual activity.
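For anyone who wants to see what difference weighting makes, here is a minimal sketch of a weighted tabulation. The weight variable name WTS_M is an assumption and is not used anywhere above; check the codebook for the actual name of the sampling weight in your file.

```stata
* Unweighted shares for the age group used in the graphs (DHHGAGE==3).
tabulate SexRecoded if DHHGAGE==3

* The same table using the survey weight.
* WTS_M is an assumed variable name; confirm it in the codebook.
tabulate SexRecoded if DHHGAGE==3 [aweight=WTS_M]
```

This gives weighted percentages for a quick look only. Proper standard errors would need Stata's survey commands and whatever variance information the file provides.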
The point is simply that reading the codebook and using information wisely is a critical part of good econometric analysis.
Is the second figure correct? When I look at it you have about 80% saying yes and just under 40% saying no.
Posted by: Ross Hickey | March 18, 2012 at 01:36 PM
Ross: it looks correct to me. just over 60% yes and just under 40% no.
I wonder what this post's title will do for our hits, and spam?
Posted by: Nick Rowe | March 18, 2012 at 01:47 PM
Ross - what Nick said. Since these graphs are all computer generated, the yes and nos have to add up to 100.
Nick - fair cop, it was a deliberately attention-grabbing headline.
I don't know how much worse the spam situation could get but yes, I should check the spam filter more than usual for comments on this post.
Posted by: Frances Woolley | March 18, 2012 at 01:56 PM
Yes, my mistake, viewing it on a bigger screen is easier to read ;)
Posted by: Ross Hickey | March 18, 2012 at 05:37 PM
Can I ask ... what CANSIM table has this information? I've looked through many of the CCHS 3.1 items, and this data doesn't appear anywhere I looked.
(References are good in research?)
Posted by: Chris S | March 19, 2012 at 03:05 PM
Chris S - the data was not obtained from Cansim but by cranking through the numbers in the public use micro data file (PUMF). The Canadian Community Health Survey files are available to university students and faculty through Statistics Canada's Data Liberation Initiative - generally through the university libraries.
With the new open access policy, more public use files are being made available, e.g. it's now possible to download the Census PUMFs but as far as I know the general public is still not able to access the CCHS.
If you do have access to the CCHS and you're still not able to find the info by searching for the variable names above (e.g. SXB_1), check that you do indeed have the entire 2007-8 file. These questions weren't asked of all respondents - with the CCHS there's a base of common questions, and then a whole lot of different modules added on that vary from year to year.
Posted by: Frances Woolley | March 19, 2012 at 04:22 PM