So, after writing about David Cameron’s shoddy use of statistics in the Telegraph, and now about the margin of error in one YouGov poll that might just have changed the fate of the referendum and what happens after, I found myself thinking a lot about how official statistics and opinion polls are used and reported in the media, and what pitfalls lie there.
Today, for instance, I want to talk about the “voodoo poll”, so called because it’s about as scientific as voodoo (and presumably because, for the serious researcher, seeing it reported as a serious poll in the media feels like a stab in the heart from a distance).
A “voodoo poll”, or open access poll, is one where a non-probability sample of participants self-select into participation.
In human language: sampling is the use of a subset of the population to represent the whole population. In probability sampling (random sampling), we have ways of calculating the probability of getting any particular sample, and therefore we can rigorously infer from the sample to the general population. In non-probability sampling, we do not, and therefore we need to use such samples with care.
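To make the "rigorously infer" part concrete, here is a minimal sketch of a simple random sample and the usual approximate 95% margin of error for a proportion. The population, the 40% figure, the sample size and the function name are all made up for illustration; the margin of error uses the standard normal approximation.

```python
import math
import random


def simple_random_sample_estimate(population, n, rng):
    """Estimate the share of 1s in `population` from a simple random sample of size n."""
    # Every member of the population is equally likely to be drawn.
    sample = rng.sample(population, n)
    p_hat = sum(sample) / n
    # Approximate 95% margin of error for a proportion (normal approximation).
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin


rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 100,000 people, 40% of whom answer "yes" (1).
population = [1] * 40_000 + [0] * 60_000
p_hat, margin = simple_random_sample_estimate(population, 1_000, rng)
print(f"estimate: {p_hat:.3f} +/- {margin:.3f} (true share: 0.400)")
```

The point is that the "+/-" only means something because we know how the sample was drawn. With a non-probability sample you can still compute the same number, but there is no guarantee it describes the population.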
Granted, there are reasons why you sometimes need to use non-probability samples. Some populations are easier to sample randomly than others (a random sample of Londoners is pretty straightforward; a random sample of Romanian immigrants in the UK, or of people who regularly buy ice cream, not so much). The fieldwork is much cheaper and less time-consuming (pay someone to stand on a street corner with a clipboard and get you 20 completed surveys in a day, or pay them to knock on random doors in the neighbourhood for a day and bring you 5). Or you may be more concerned with understanding a complex phenomenon than with very accurate representativeness. But at the very least you can use quotas of gender, age, ethnicity etc. to ensure you have a diverse sample, and be very careful how you generalise from it.
For example, if you’re doing an opinion poll in a park using non-random quota sampling, you first of all exclude from your sample people who don’t go to the park, so you can infer things about “park-goers” rather than “residents of the neighbourhood”. Even then your sample may be biased, because you’re more likely to interview people who sit on benches than people who jog, and these groups may differ from each other in ways you don’t realise.
While you do, therefore, have to take any generalisations from polls using non-random samples with a grain of salt, not all of them are voodoo polls. Rather, voodoo polls are a particularly egregious example: instead of respondents being selected by a researcher (by whatever method), they need to opt in (for example by phoning a number, clicking a voting option on a website or mailing a coupon from a newspaper). Everyone is free to opt in, but the opt-in “button” is conveniently placed where it’s much more likely to be seen by certain types of people than by others.
See the problem with this?
98% of whom? Of UK residents? Of British citizens? Of twitter trolls? EDL supporters? People who say “I’m not racist, but…”? 98% of embarrassing older relatives at the family dinner table? Now that’s a generalisation you need to take with more than just a grain of salt…
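To see how far self-selection alone can push a headline figure, here is a toy simulation of an opt-in poll. Every number in it is invented for illustration (the share of "readers", how each group answers, how likely each group is to opt in); none of it is data from a real poll.

```python
import random


def simulate_opt_in_poll(n_people, rng):
    """Return (true share of "yes" in the population, share of "yes" among opt-in voters)."""
    true_yes = 0
    poll_yes = 0
    poll_total = 0
    for _ in range(n_people):
        # Assumption: 5% of people read the hypothetical newspaper running the poll.
        is_reader = rng.random() < 0.05
        # Assumption: readers say "yes" 90% of the time, everyone else 30%.
        says_yes = rng.random() < (0.90 if is_reader else 0.30)
        # Assumption: readers are far more likely to see the poll and bother to vote.
        opts_in = rng.random() < (0.20 if is_reader else 0.001)
        true_yes += says_yes
        if opts_in:
            poll_total += 1
            poll_yes += says_yes
    return true_yes / n_people, poll_yes / poll_total


true_share, poll_share = simulate_opt_in_poll(100_000, random.Random(0))
print(f"whole population saying yes: {true_share:.1%}")
print(f"opt-in poll saying yes:      {poll_share:.1%}")
```

Under these made-up assumptions the poll result mostly tells you who saw the poll and cared enough to vote, not what the population thinks.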
In order to participate in Daily Express phone-in polls, you first need to buy the Daily Express to find out about them, and then call a premium-rate poll number (so give more money to the Express) to register your vote.
The results look exactly as you’d expect (source for all images here)
Right… So how can we generalise these findings?
I’ll give it a try:
(altered image based on this)
This, boys and girls and variations thereof, is a voodoo poll. And this is why we do not report it seriously.
(The leading questions merit a separate post in themselves; update: see here).