This is cross-posted on democraticSPACE.

There are a **lot** of published opinion polls out there: I count
41 since September 8. The trouble is that many of them contradict each
other - just what are we supposed to make of all this? Let's haul out
our introductory statistics textbooks and take a closer look.

Firstly, let's get an idea of the sort of variations that we should
expect to see, that is, the inherent uncertainty with drawing random
samples. Ekos, Nanos and Harris-Decima are all publishing daily
tracking polls, with announced 95% confidence intervals of 1.6%, 3.1%
and 2.6%, respectively. If sampling error were the only source of
uncertainty, what would a typical deviation between Nanos and
Harris-Decima look like? (I'm choosing those two because they have the
larger announced MOE; what follows will be an upper bound for the
variations we should expect.) The answer is **not** 3.1 + 2.6 = 5.7%.

If we play around a bit with the arithmetic of confidence intervals
(dividing each announced margin of error by 1.96), the standard
deviations of the Nanos and Harris-Decima estimates are 1.58 and 1.33
percentage points, respectively. If Nanos and Harris-Decima are
operating independently, the variances add, so the standard deviation of
the difference between the two estimates is √(1.58² + 1.33²) = 2.06. From
this, the half-width of the 95% confidence interval for the gap is
1.96 × 2.06 = 4.05 percentage points.

If sampling error were the only thing to worry about, the chance of seeing a gap of more than four percentage points is 1 in 20. It also follows that the probability of observing a discrepancy of 5 or more points is about 1 in 65, and the odds of seeing a 6-point gap are 274:1 against.
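These tail probabilities can be reproduced from the normal distribution (a sketch; the 2.06 standard deviation is the figure derived earlier for the Nanos/Harris-Decima gap):

```python
import math

sd_diff = 2.06  # standard deviation of the gap between the two estimates

def two_tailed_prob(gap, sd):
    """Probability of a gap at least this large, in either direction:
    2 * (1 - Phi(z)), computed via the complementary error function."""
    z = gap / sd
    return math.erfc(z / math.sqrt(2))

for gap in (5, 6, 8):
    p = two_tailed_prob(gap, sd_diff)
    print(f"{gap}-point gap: about 1 in {1 / p:.0f}")
```

The 8-point case is included because it comes up later in the discussion of the Liberal numbers, where gaps of that size turn out to be common in the actual data.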

So much for the theory; what are the variations we're actually seeing? First up is the support for the Conservative party; click the image for a full-sized version. In all the graphs that follow, the vertical axes use the same scale, so that equal differences in poll results appear the same distance apart.

So far, so good: it is almost always the case that the polling firms differ by 3 percentage points or fewer. Indeed, there have been many days where two or three polling firms come up with the same estimate. It's also interesting to note that of the 26 polls published since September 15, only two are outside the range of 35-38%.

Now the New Democrats; again, click the image for a larger version.

The range of variation is greater than it was for the Conservatives, but at least some of that can be attributed to the fact that NDP support appears to have been growing during this period. Aside from Ekos - which started with high initial support - all the polling firms show a discernible trend upwards.

For the Conservative and NDP numbers, the variations across polling firms are pretty much in line with what we'd expect. But things start to break down when we look at the Liberals; click on the graph for a full-size version:

In contrast to the fairly tight variation we've seen so far, the gaps between the reported estimates for Liberal support are implausibly large: discrepancies of 8 points - in theory, an event that should occur 1 time in 10,000 - are not uncommon. Something is seriously wrong here.

The same problem shows up for the Greens; again, click the image for a larger version:

Discrepancies of 5 or 6 percentage points should be rare (especially for the Greens), but we're seeing them pretty much every day.

It seems pretty clear that Nanos is the outlier here: their numbers for the Liberals are consistently higher, and their estimates for the Greens are consistently lower than what the other firms are reporting. It seems that there is a significant chunk of the population - something around 4% - that is telling Nanos that they'll vote Liberal, but telling everyone else that they will vote Green. Why? And what does this mean for what will happen on October 14?

It's not too hard to come up with a plausible answer to the first question:

Mr. Nanos says the key difference is methodology. Unlike other polling firms, his firm asks open-ended questions on voter intention. Instead of offering a list of choices -- "Would you vote a) Conservative, b) Liberal ..." -- Nanos phone operators ask an open-ended question that requires respondents to come up with their own answers.

"If they don't get the list, you get the cleanest read because they have to articulate their support," Mr. Nanos said. The open-ended question eliminates the importance of the order in which the parties are listed, although most companies vary the the order to mitigate this factor.

Also, the open-ended method tends to put the Greens lower than other parties because, Mr. Nanos believes, respondents are not reminded of the party when they answer. Some will choose the Greens as a none-of-the-above if they hear the party name on a list before answering.

But it's much less easy to figure out what this segment of the population will actually do on election day. Will they indeed vote Liberal? Will they be reminded of the alternatives when they get to the voting booth (the party names are listed on the ballot) and vote Green? Or will they simply not vote?

Interesting analysis. The methodology is clearly a big factor in these numbers.

You may want to have a look at a couple of posts on the Green Party website by myself and Jim Harris.

http://www.greenparty.ca/en/node/7289

Thanks,

Jim

Posted by: Jim Johnston | September 24, 2008 at 03:35 PM

I thought the 2.06 should be divided by the square root of two, even if you assume that the populations of the two samples are the same. I'm coming at that from the t-test and could be completely wrong though. Am I?

Posted by: Style | September 30, 2008 at 04:04 PM