Statistics Canada has just released a dataset with detailed anonymized information on all confirmed COVID-19 cases in Canada, available for download here.
Unlike many of the available trackers, the Statistics Canada data reports cases by date of onset, defined as "earliest date available from the following series: Symptom Onset Date, Specimen Collection Date, Laboratory Testing Date, Date reported to the province/territory or Date reported to Public Health Agency of Canada." Defined that way, the case numbers start to ramp up in the second week of March. The decline in late March almost certainly reflects the time lag between symptom onset and confirmed COVID-19 diagnosis, rather than any peaking of the curve.
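To make the "earliest date available" rule concrete, here is a minimal Python sketch of that definition. It is an illustration only: the public file contains just the resulting episode month and day, so the five input dates and the function name `episode_date` are hypothetical.

```python
from datetime import date
from typing import Optional


def episode_date(*candidates: Optional[date]) -> Optional[date]:
    """Return the earliest non-missing date, or None if every date is missing."""
    known = [d for d in candidates if d is not None]
    return min(known) if known else None


# Made-up case: no symptom onset date was recorded, so the specimen
# collection date becomes the episode date.
example = episode_date(
    None,               # symptom onset date (missing)
    date(2020, 3, 14),  # specimen collection date
    date(2020, 3, 16),  # laboratory testing date
    date(2020, 3, 17),  # date reported to the province/territory
    date(2020, 3, 18),  # date reported to PHAC
)
print(example)  # 2020-03-14
```

For cases with no recorded symptom onset, the episode date falls back to a later event, so the chart mixes onset dates with testing and reporting dates.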
Using the data, it is also possible to estimate hospitalization probabilities, that is, the predicted likelihood that someone with a confirmed COVID-19 diagnosis will end up hospitalized. This is no virus for old men - roughly 70 percent of men over the age of 80 diagnosed with COVID-19 end up hospitalized [corrected]. This diagram does, of course, overstate the true hospitalization probabilities, as people with mild cases of COVID-19 are unlikely to be tested - or even eligible for testing - and also unlikely to be hospitalized.
Here's a final graph, and one that I found quite surprising: conditional upon being hospitalized with a confirmed COVID-19 diagnosis, age and gender seem to have little impact on the probability of receiving ICU treatment. The sample that this is based on is small, however - 152 individuals.
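A rough way to cross-check charts like these: the logit model behind them uses only the age-by-gender cells as regressors, and in that case the predicted probability for each cell should equal the raw share of cases in that cell with the outcome. The Python sketch below computes those shares directly. The records are made up for illustration, and the field names (`age`, `gender`, `hospitalized`, `icu`) and the helper `cell_rates` are assumptions, not part of the Statistics Canada file.

```python
from collections import defaultdict


def cell_rates(records, outcome):
    """Share of records with outcome == 1 in each (age, gender) cell.

    Records whose outcome is None (unknown / not stated) are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [events, total]
    for r in records:
        if r[outcome] is None:
            continue
        cell = (r["age"], r["gender"])
        counts[cell][0] += r[outcome]
        counts[cell][1] += 1
    return {cell: events / total for cell, (events, total) in counts.items()}


# Made-up records; age and gender use the dataset's numeric codes.
cases = [
    {"age": 7, "gender": 1, "hospitalized": 1, "icu": 1},
    {"age": 7, "gender": 1, "hospitalized": 1, "icu": 0},
    {"age": 7, "gender": 1, "hospitalized": 0, "icu": None},
    {"age": 2, "gender": 2, "hospitalized": 0, "icu": None},
    {"age": 2, "gender": 2, "hospitalized": None, "icu": None},
]

# Probability of hospitalization among cases with known hospitalization status.
hosp_rates = cell_rates(cases, "hospitalized")

# Probability of ICU care, conditional on being hospitalized.
icu_rates = cell_rates([c for c in cases if c["hospitalized"] == 1], "icu")
```

Cells with very few cases give noisy shares, which is one reason to be cautious about the ICU chart.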
Here is how I created these charts:
- First, I downloaded the data from here as a .csv file: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1310076701
- Next, I opened the .csv file, removed the introductory text in the first few lines of the file, and the footnotes at the end of the file, and then saved it.
- I imported the .csv into Stata, and then added variable names/cleaned the data using this code:
***merge separate month, day and year variables into a single date variable
***mdy() returns missing when month or day is coded 99 ("Not stated")
gen episode_date=mdy(episodedatemonth6, episodedateday6, year)
format episode_date %dM_d
***install a nice graphics scheme, to replace the crummy Stata default
ssc install blindschemes, replace all
set scheme plottig
**create histogram graph
hist episode_date, freq title("Earliest of Symptom Onset Date, Test Date or Reporting Date") subtitle("Number of Cases, Canada, March 29, 2020") note("Source: Statistics Canada. Table 13-10-0767-01 Detailed confirmed cases of COVID-19 (Preliminary)")
**rename gender and age group variables; add labels to them.
recode gender7 (1=1 "male") (2=2 "female") (3=3 "other") (7=7 "unknown") (9=9 "not stated"), gen(gender)
recode agegroup8 (1=1 "0 to 19") (2=2 "20 to 39") (3=3 "40 to 49") (4=4 "50 to 59") (5=5 "60 to 69") (6=6 "70 to 79") (7=7 "80 or older") (9=9 "Not stated"), gen(age)
***create hospitalization variable
recode hospitilazation10 (1=1 "hospitalized") (2=0 "not hospitalized") (7=.a "unknown") (9=.b "not stated"), gen(hospitalized)
***run logit analysis; generate and plot marginal effects
logit hospitalized i.age#i.gender if age<8
margins i.age#i.gender
marginsplot, title("Probability of hospitalization by age and gender") subtitle("Canada, March 29, 2020") note("Source: Statistics Canada. Table 13-10-0767-01. Logit analysis of 1,583 cases")
***create 0/1 ICU variable
recode intensivecareunit11 (1=1 "in ICU") (2=0 "not in ICU") (7=.a "unknown") (9=.b "not stated"), gen(ICU)
***run logit analysis; generate and plot marginal effects
logit ICU i.age#i.gender if age<8 & hospitalized==1
margins i.age#i.gender
marginsplot, title("Probability of needing ICU care by age and gender") subtitle("Canada, March 29, 2020") note("Source: Statistics Canada. Table 13-10-0767-01. Logit analysis of 1,017 cases")
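The manual clean-up step described above (deleting the introductory text and the footnotes before importing) could also be scripted. The Python sketch below is one hedged way to do it, not the method used for this post. It assumes, without having been checked against the real download, that the header row is the first line containing a known column label and that the data block ends at the first blank line after it. The column label "Case identifier number", the sample text, and the helper `extract_table` are all illustrative.

```python
import csv
import io


def extract_table(raw_text, header_marker):
    """Keep the header row and the data rows directly beneath it.

    Assumption: the header is the first line containing header_marker, and
    the data block ends at the first blank line after it (footnotes follow).
    """
    lines = raw_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if header_marker in line:
            start = i
            break
    if start is None:
        raise ValueError("header row not found")

    kept = []
    for line in lines[start:]:
        if not line.strip():  # blank line: end of the data block
            break
        kept.append(line)
    return list(csv.reader(io.StringIO("\n".join(kept))))


# Illustrative layout only; the real file's preamble and footnotes differ.
sample = '''"Detailed confirmed cases (illustrative layout only)"
"Table: 13-10-0767-01"

"Case identifier number","Gender","Age group","Hospitalization"
"1","1","7","1"
"2","2","2","9"

Footnotes:
1,"Source: Public Health Agency of Canada"
'''

rows = extract_table(sample, "Case identifier number")
```

The resulting `rows` can be written back out with `csv.writer` and imported into Stata in place of the hand-edited file.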
Your sentence says this: "This is no virus for old men - roughly 70 percent of men diagnosed with COVID-19 end up hospitalized." What I read from the graph is "This is no virus for old men - roughly 70 percent of men *over 80 years of age* diagnosed with COVID-19 end up hospitalized."
Posted by: Greg | March 30, 2020 at 03:59 PM
Interesting that the StatsCan data doesn't include any geographical details -- not even province.
Posted by: Phil | March 30, 2020 at 04:01 PM
People have various risk factors for more severe symptoms, so it may be that once someone has progressed to a certain point, age and gender matter less. Those who need hospitalization may include smokers/vapers, HIV, TB, etc. Lots of younger people in those groups. There is some theorizing that the dose of viral exposure also correlates with morbidity.
The age distribution on the PHAC epidemiological summary looks (on an eyeball basis) a lot like the age distribution of the general population. But likely some people will be misled by that and interpret it as probability of illness.
https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/health-professionals/epidemiological-summary-covid-19-cases.html
Posted by: Shangwen | March 30, 2020 at 04:18 PM
Greg - thanks so much for pointing this out.
Phil - yes. I imagine that's due to confidentiality - I imagine that once you get down to the level of a hospitalized 20 to 29 year old in Saskatchewan, there's a real risk of compromising people's privacy.
Posted by: Frances Woolley | March 30, 2020 at 04:29 PM
Phil - the web page that Shangwen links to https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/health-professionals/epidemiological-summary-covid-19-cases.html has links to every province's page if you scroll down a bit.
Shangwen, thanks for that link. Yes, you're right - the only demographic for which there's a big difference is 0 to 19 year olds, who make up roughly the same proportion of the population as the 65+ group, but make up a much smaller percentage of diagnosed COVID cases.
Good points about the risk factors, dose of viral exposure. The dose factor is probably important when looking at the male/female differences, since the health care workforce is majority female, and it's the health care workers who are getting those big doses.
Posted by: Frances Woolley | March 30, 2020 at 04:43 PM
Signing in here as a lurker. Thanks Frances for the analyses and others for the comments.
Cheers, Wendy Watkins
Posted by: Wendy Watkins | March 30, 2020 at 05:49 PM
Wendy - thanks for dropping by and for the kind words - great to hear from you, hope you're keeping well.
Posted by: Frances Woolley | March 30, 2020 at 07:56 PM
Seems to be the same pattern in most/all countries.
I hear that males have weaker immune systems, and that paradoxically, this might actually make sense from an evolutionary point of view of maximising gene survival. Because there are always trade-offs, and given different gamete size (eggs expensive/sperm cheap), Jensen's inequality says men should be more risk-loving. But I don't understand it properly.
I don't know why the 1918 Spanish flu was worse for younger adults. You would think they would be the strongest?
As an older male, I console myself with the thought that at least we're not getting the barrage of flak about the patriarchy if it were the other way around! (Or, not so much; there are always exceptions.) And rather me than my daughters.
Good to see you getting into this data, and good on StatsCan too!
Posted by: Nick Rowe | March 31, 2020 at 06:30 AM
Nick - thanks for commenting and for the kind words. On the Spanish Flu - we don't really know what the death toll would have been had the 1918 population had the age profile, smoking rates, and rates of obesity and diabetes that today's population has (this, by the way, is why I'm so worried about what this disease will do south of the border, where over 1/3 of the population is obese).
On COVID - one theory about why it's bad for men is that death from COVID seems to be linked to cardiovascular problems, and men are more likely to have those than women - see e.g. https://www.statnews.com/2020/03/03/who-is-getting-sick-and-how-sick-a-breakdown-of-coronavirus-risk-by-demographic-factors/.
I'm glad you're safe and well.
Posted by: Frances Woolley | March 31, 2020 at 09:13 AM
This is consistent with other respiratory infections: men are more likely to develop severe pneumonia than females, but conditional on pneumonia being severe enough to require an ICU, there is little difference in outcomes.
There are all kinds of theories as to why, ranging from sex hormones, chromosomes, immune system differences and behaviour (drinking/smoking) but a lot may be due to good old physiology: males have larger lungs, larger necks and trachea, and wider airways - leaving them more likely to take the virus deep into their lungs.... instead of stopping them in the upper respiratory tract.
In general there’s a 3x higher risk of pneumonia if you’ve had tonsils removed.
Posted by: LJ Gould | March 31, 2020 at 11:19 AM
'instead of stopping them in the upper respiratory tract'
Vaping has a different physical sensation to lungs than smoking a rollo. Don't men consume more alcohol? Yes, I know they drink because of their wives. lol
I'd like to see higher numbers on recovery from ICU. Only one was reported in AB.
Economics has changed over the years. I had to use SPSS.
Time and all that goes with it.
Back to lurking.
Posted by: Dee | March 31, 2020 at 01:35 PM
LJ - thanks for this. Interesting to hear about the tonsils connection - never knew that.
Posted by: Frances Woolley | March 31, 2020 at 05:55 PM
Here’s a non-technical reference for tonsils:
https://www.sciencedaily.com/releases/2018/06/180607135151.htm
There are also some stories on WebMD. Nothing specific for Covid, but if I remember correctly (20 years back) the difference was greater for men than women. That’s one reason recommendations for tonsillectomy changed.
Posted by: LJ Gould | March 31, 2020 at 07:56 PM
LJ - Too late for me, though, unfortunately!
Posted by: Frances Woolley | March 31, 2020 at 08:14 PM
Too late for me too. And I am diabetic (never eaten sugar but bad genes on both sides). So who knows if it will be adios cruel world. In fact no way. I am recently retired, cooped up in my apartment and my region is cut off from the rest of the province (though it’s because we don’t have levels 3-4 hospitals).
Posted by: Jacques René Giguère | April 01, 2020 at 03:20 AM
Jacques René - take care, and stay well!
Posted by: Frances Woolley | April 01, 2020 at 08:52 AM
The tonsillectomy connection seems to apply to childhood procedures, possibly because it impairs development of the immune system. I had mine out as an adult so am desperately clinging to this small fact.
Posted by: Shangwen | April 02, 2020 at 06:33 AM
Shangwen - I'm afraid that doesn't bring me a lot of comfort, unfortunately!
Posted by: Frances Woolley | April 02, 2020 at 09:38 AM
Great information and analysis. Thanks.
Posted by: mika | April 05, 2020 at 04:37 PM
Nice to see the data, I hunted Statscan - didn't see posted the coding for the data. Not sure where it is?
Do you know the coding for "Status" (the last variable)?
Thanks,
Cam
Posted by: Cam H | April 05, 2020 at 06:08 PM
Cam, I had to look in the .csv file with the downloaded data to find out.
Status is coded as 1=deceased, 9=not stated. However for every single entry in the dataset status is coded as 9 - probably for confidentiality reasons.
f.y.i. here are all of the footnotes with coding info:
1 Source: Public Health Agency of Canada, COVID-19 epidemiological reports, with contribution from Provincial/Territorial Ministries of Health.
2 Because the COVID-19 pandemic is rapidly evolving, these data are considered preliminary. The data published by Statistics Canada only account for those where a detailed case report was provided by the provincial or territorial jurisdiction to the Public Health Agency of Canada (PHAC). Statistics Canada’s detailed preliminary confirmed cases will not match the total case reporting done at the provincial and territorial levels which are reported daily by each jurisdiction and compiled by the PHAC. The discrepancy is due to delays associated with the submission of the detailed information, its capture and coding. Hence, Statistics Canada’s file on detailed case reporting is a subset of the total counts reported by the health authorities across Canada.
3 Confirmed cases are laboratory confirmed cases for which a case report form has been received by the Public Health Agency of Canada from provincial or territorial partners.
4 Statistics Canada generated case identifier number.
5 Date case was last updated (month: MM and day: DD).
6 The episode date (month: MM and day: DD) is created from the earliest date available from the following series: Symptom Onset Date, Specimen Collection Date, Laboratory Testing Date, Date reported to the province/territory or Date reported to Public Health Agency of Canada. When no date is available, this field is considered 'Not stated' and given the value 99 for both month and day variables.
7 Gender codes: 1 = Male, 2 = Female, 3 = Other, 7 = Unknown, 9 = Not stated.
8 Age group codes: 1 = 0 to 19 years, 2 = 20 to 39 years, 3 = 40 to 49 years, 4 = 50 to 59 years, 5 = 60 to 69 years, 6 = 70 to 79 years, 7 = 80 years or older, 9 = Not stated.
9 Transmission codes: 1= Travel exposure – cases that had contact with a travel-related case or had travelled outside of Canada in the 14 days prior to illness onset. 2 = Community exposure – cases that had no known contact with a travel-related case and had not travelled outside of Canada in the 14 days prior to illness onset. 3 = Pending – confirmation on exposure setting is pending.
10 Hospitilazation: 1 = Yes, 2 = No, 7 = Unknown, 9 = Not Stated.
11 Patient was admitted to the intensive care unit: 1 = Yes, 2 = No, 7 = Unknown, 9 = Not Stated.
12 Status: 1 = Deceased, 9 = Not stated.
How to cite: Statistics Canada. Table 13-10-0767-01 Detailed confirmed cases of coronavirus disease (COVID-19) (Preliminary data), Canada
https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1310076701
Posted by: Frances Woolley | April 05, 2020 at 07:51 PM
Mika - thanks!
Posted by: Frances Woolley | April 05, 2020 at 07:51 PM
Mika,
Likewise thanks very much.
The US data released by the CDC last week:
https://www.cdc.gov/mmwr/volumes/69/wr/mm6913e2.htm
It shows 50% of non-ICU Covid-19 hospital beds, and 45% of ICU beds, occupied by people aged under 65. While deadly for older people, it hits those under 65 hard as well. Alberta had a fatality of someone in their 20s last week. Too bad about the different age splits - it would be interesting to compare the under-65s in Canada and the USA.
I started looking at the death rate after seeing the Lancet article from March 12th that tries to adjust for the impact of a median time from onset of symptoms to death of about two weeks (I've seen 8 to 18 days). They estimated a death rate on confirmed cases outside of China of 15.2%.
https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30195-X/fulltext
And that's average - I'd hate to see the +60 number!
The authors highlight that this is based on confirmed cases and misses some (but in no way all) of the people with mild or no symptoms. Many countries have reported that ~ 40% of their confirmed cases have been pre- or asymptomatic.
Thanks again,
Cam H
PS I've shared my Covid-19 file widely. It's an engineer's look at the data (lots of graphs). I'd be happy to flip a copy. Reach me at cam.howey at gmail.com
Posted by: Cam H | April 06, 2020 at 08:58 AM