Stuff explained: What is open data and why should you care about it?


Open data is, basically, the idea that certain data should be freely available to everyone to use as they wish, without restrictions from copyright, patents or other mechanisms of control. A piece of data is open if anyone is free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and/or share alike.

Sir Tim Berners-Lee, the inventor of the World Wide Web, talks about open data in this TED talk:


“Opening up data is fundamentally about more efficient use of resources and improving service delivery for citizens. The effects of that are far reaching: innovation, transparency, accountability, better governance and economic growth.”


The idea is this: if you make your datasets open to the public, more researchers get the opportunity to play with them and see what gives, potentially expanding knowledge. This can be particularly helpful for those operating on limited resources of their own: a lot of researchers using open data come from the global South, and open data is being used very successfully by research informing policymakers in developing countries; see, for example, Ghana’s open data initiative or the Open Data Research network.

Now, Sir Tim Berners-Lee makes another interesting distinction:

I asked everybody, more or less, to put their documents — I said, “Could you put your documents on this web thing?” And you did. Thanks. It’s been a blast, hasn’t it? I mean, it has been quite interesting because we’ve found out that the things that happen with the web really sort of blow us away. They’re much more than we’d originally imagined when we put together the little, initial website that we started off with. Now, I want you to put your data on the web. Turns out that there is still huge unlocked potential. There is still a huge frustration that people have because we haven’t got data on the web as data.

What do you mean, “data”? What’s the difference: documents, data? Well, documents you read, OK? More or less, you read them, you can follow links from them, and that’s it. Data: you can do all kinds of stuff with a computer. Who was here or has otherwise seen Hans Rosling’s talk? One of the great (yes, a lot of people have seen it) one of the great TED Talks. Hans put up this presentation in which he showed, for various different countries, in various different colors, he showed income levels on one axis and he showed infant mortality, and he shot this thing animated through time. So, he’d taken this data and made a presentation which just shattered a lot of myths that people had about the economics in the developing world.

A while ago, I was looking at an Ipsos Mori poll regarding attitudes to immigration by political party support (here). Here’s what I wrote about it:

What we cannot see from the graph (and I’d love to know):

- How do these divisions play out across class divides? Urban-rural? By educational level? (The original article does mention that “younger voters, graduates and Londoners make up a larger proportion of” those valuing immigration the most, but I’d love to map it for the electorate of each political party, particularly in the case of Labour voters.)

Turns out, Ipsos Mori do publish (and I think they’re obligated by law to publish) documents containing their crosstabs, but not their actual datasets (for example, in Excel or SPSS format). This means I can go and see that young people like immigration more than older people, urban dwellers more than rural dwellers, and Lib Dem voters more than Labour voters, who in turn like it more than Conservative voters.

What does this tell me? Actually not that much.

Maybe Lib Dem voters like immigration more and this is why they vote Lib Dem in the first place. Or maybe not. Maybe the circumstances of living in an urban area (such as being more likely to have immigrant neighbours as opposed to only reading about them in the Daily Mail) make one more likely to look favourably upon migration; and then, for unrelated reasons, people in the countryside are more likely to vote Conservative and people in cities to vote Lib Dem or Labour. In which case, if we look at Conservative voters only, do those who live in urban areas have different attitudes to migration than those who live in rural areas?

Or: Labour voters are somewhere in between Tory voters and Lib Dem voters in terms of attitudes to migration; but more than one kind of person votes Labour (sorry if this upsets Neal Lawson). Do Labour voters as a group have a more moderate stance in general? Or is it that young, educated, urban Labour voters are about as pro-immigration as Lib Dem voters, while rural, less educated, traditionally working class Labour voters think much less of it, and they “cancel” each other out?
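To make the “cancelling out” point concrete, here is a minimal sketch in Python of what you could do with respondent-level data. The rows are entirely made up for illustration (they are not Ipsos Mori data), and the 0-10 “immigration score” and the `mean_score` helper are my own assumptions, not anything the pollster publishes.

```python
from collections import defaultdict

# Made-up respondent-level rows, purely illustrative (NOT real polling data).
# Each row: (party, area, immigration_score) on a hypothetical 0-10 scale.
respondents = [
    ("Labour", "urban", 8), ("Labour", "urban", 7),
    ("Labour", "rural", 3), ("Labour", "rural", 4),
    ("Lib Dem", "urban", 8), ("Lib Dem", "rural", 7),
    ("Conservative", "urban", 5), ("Conservative", "rural", 2),
]

def mean_score(rows, keys):
    """Average the score within each group defined by the given column indexes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[2])
    return {group: sum(scores) / len(scores) for group, scores in groups.items()}

# What a published crosstab typically gives you: one number per party.
by_party = mean_score(respondents, keys=[0])

# What only the raw dataset lets you compute: party split further by area.
by_party_and_area = mean_score(respondents, keys=[0, 1])

for group, score in sorted(by_party_and_area.items()):
    print(group, score)
```

In this toy data, Labour voters as a whole land in the middle, but that average hides an urban group that looks like Lib Dem voters and a rural group that looks like Conservative ones. A crosstab by party alone could never show you that.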

These are all questions that you can answer with the dataset, but not with the document. Hence the usefulness of raw data as open data. Back to what Sir Tim Berners-Lee was saying:

“OK, data is brown and boxy and boring, and that’s how we think of it, isn’t it? Because data you can’t naturally use by itself. But in fact, data drives a huge amount of what happens in our lives and it happens because somebody takes that data and does something with it.”

For example, these guys examined the impact of planned fire station closures in London. And these other guys looked at how government inefficiencies affect cash flow for businesses. Access to open data, therefore, can mean research that leads to better policies, or at the very least helps keep governments accountable.

For UK open data, pay a visit.


