Digging into Open Data
Data Insights, 2012-02-02
Rufus Pollock
@rufuspollock[.org] – @okfn[.org]
Licensed under cc-by v3.0 (any jurisdiction)
The Open Knowledge Foundation
Community-based not-for-profit founded in 2004
The Foundation now has projects and partnerships throughout the world and is especially active in Europe
We build tools and communities to create, use and share open knowledge - content and data that everyone can use, share and build on.
Making it easy to get, use and share data.
Mapping (public) money worldwide.
Sumer, Mesopotamia, 5000 years ago
The UK Census (1801)
The Hollerith Tabulator (US 1890 Census)
An IBM (1960s)
Today We Find Ourselves in the Midst of a Revolution
Driven by
Info Complexity
Info Tech
We are Innovating
Opening Up
Government is Opening up Data
Open Gov Initiatives Around the World. 2.5 years ago: ~0. Now: UK, US, Finland, Kenya, Netherlands, ...
Open Data: What?
"A piece of content or data is open if anyone is free to use, reuse and redistribute it - subject only, at most, to a requirement to attribute or share-alike."
Anyone means Anyone!
(So no restrictions on commercial use etc)
What Data?
Transport, Geodata, Statistics, Electoral-Legal ...
Key point: Non-Personal Data!
(E.g. Train times, station locations, spending breakdowns, national laws ...)
Open Data: Why?
A Story
(About Medicine Gone Wrong)
Better Understanding
Better Governance
Better Research
Better Economy
The Challenge
the Opportunity
Challenge: Exploding Info Complexity
In the 1820s, all UK bank clearing was done in a single room in London, once a day. Today: millions of transactions a minute.
=> componentization to divide and conquer complexity
Opportunity: Info Technology
Today a smartphone has as much computing power as the system for the Apollo moon landings. 1TB of storage costs around $100; in 1994 it would have cost ~$400,000.
=> Mass participation in information access, processing and production. Decentralization.
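To put the storage figures above in perspective, here is a rough back-of-the-envelope sketch. It takes the two prices quoted on the slide, and assumes (this is not stated in the talk) that "today" means 2012, the year of the talk, and that prices fell at a constant exponential rate. The function names are illustrative only.

```python
import math

# Figures quoted on the slide (US dollars per terabyte).
COST_PER_TB_1994 = 400_000
COST_PER_TB_TODAY = 100

# Assumption: "today" is 2012, the year the talk was given.
YEARS_ELAPSED = 2012 - 1994


def price_drop_factor(old_cost, new_cost):
    """How many times cheaper storage became."""
    return old_cost / new_cost


def halving_time_years(old_cost, new_cost, years):
    """Years needed for the price to halve.

    Assumes a constant exponential decline over the whole period,
    which is a simplification rather than a measured trend.
    """
    return years / math.log2(old_cost / new_cost)


factor = price_drop_factor(COST_PER_TB_1994, COST_PER_TB_TODAY)
halving = halving_time_years(COST_PER_TB_1994, COST_PER_TB_TODAY, YEARS_ELAPSED)

print(f"Storage became {factor:,.0f}x cheaper over {YEARS_ELAPSED} years")
print(f"Implied halving time: about {halving:.1f} years")
```

Under these assumptions the price per terabyte fell by a factor of 4,000, which corresponds to halving roughly every year and a half.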
Openness and Scaling
(Closed Data Doesn't Scale!)
We're Weaving Data Together
To Scale We Need to Componentize
But We Need to Put Humpty-Dumpty Together Again - Not Possible if Closed
Information is Special: Non-Rivalrous
Very cheaply copied ~ zero cost
Giving me a 'copy' of your car is a problem, giving me a copy of your data isn't
In Products and Services
The Best Thing to do With (Your) Data will be Thought of by Someone Else
Fixing is Faster with Open Data
(And You Don't Repeat Yourself)
To many eyes all bugs are shallow
An estimated 6% of all bus stops in NaPTAN are wrongly located
Building the Open Data Ecosystem
How Do We Scale?
(In the 'Open' Community)
Sharing, Reworking, Improving, Learning
Small (and Medium) Data
Rather than "Big Data"
'Data Management Systems' (cf. CMS) Like CKAN and the DataHub
ETL Tools like Scraperwiki and OpenSpending
Mixins for Data
Open Data is Here
And Will Just Get Bigger
Data is a Platform – You Build on It Rather than Sell It
lego.jpg: http://www.flickr.com/photos/oskay/265899811/
humpty_dumpty.jpg: http://www.flickr.com/photos/aussiegall/298669543/
woven_ball.jpg: http://www.flickr.com/photos/exfordy/387876530/sizes/s/
sim_city_airport.jpg: http://www.flickr.com/photos/42183741@N05/3890767674/in/photostream/
thomas_jefferson.jpg: http://en.wikipedia.org/wiki/File:Rembrandt_Peale-Thomas_Jefferson.jpg
stick_figure_male.png: http://www.openclipart.org/detail/15036
stick_figure_female.png: http://www.openclipart.org/detail/15040
city_icon.png: http://www.openclipart.org/detail/20145
sumer-ur-pot.jpg: http://www.flickr.com/photos/seriykotik/122518602/
hollerith-tabulator.gif: http://www.columbia.edu/acis/history/census-tabulator.html (orig from IBM)
ibm-machine-city-hall.jpg: http://www.flickr.com/photos/library_of_virginia/2898506631/
census-1801-example.jpg: http://privatewww.essex.ac.uk/~alan/family/images/census/1801_example.jpg
itoworld-openstreetmap-2009.jpg: http://www.flickr.com/photos/peterito/3054501076/