CKAN and Building the Debian of Data
Chaos Communication Congress
December 28th 2009
Rufus Pollock and Daniel Dietrich
Open Knowledge Foundation
http://www.okfn.org/
About the Foundation
Founded 2004 / Not-for-profit
Promoting, Creating and Disseminating Open Knowledge
Genes to Geodata, Stats to Sonnets ...
We're A Community
Organized around a set of projects and working groups
Principles: open knowledge, meritocracy and tolerance
What Does 'Open' Mean?
http://www.opendefinition.org/
Open = Freedom to Access / Use / Re-use / Redistribute
So How's This Relevant?
Openness Isn't an End in Itself!
1. Why: What We Want
(Really, Really Want)
To Create and Use Information
Whether I'm a citizen deciding how to vote
or
A researcher working on global warming
or ...
Sure, but Specifically By
Having Lots of Data
AND
Plugging It Together
Getting Data Often Ain't Easy!
US Unemployment
But the Original Data Ain't So Nice
So We Clean It ...
http://knowledgeforge.net/econ/svn/trunk/data/bls/usa_bls_employment/data.py
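Cleaning is specific to each source, and the script linked above is the real thing. As a rough sketch only, assuming the raw table has one row per year followed by twelve monthly columns (an assumed layout, not a description of data.py), reshaping it into tidy (year, month, value) records might look like this; the function name and column names are hypothetical:

```python
import csv
import io

# Assumed column headers for the hypothetical year-by-month layout.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def wide_to_long(raw_csv: str) -> list[tuple[int, int, float]]:
    """Reshape a hypothetical year-by-month CSV into (year, month, value) rows.

    Assumes a header row "Year,Jan,...,Dec". Rows whose Year cell is not
    a plain number (notes, footers) are skipped, as are empty or
    non-numeric cells. A trailing '*' footnote marker is stripped.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    records = []
    for row in reader:
        year_text = (row.get("Year") or "").strip()
        if not year_text.isdigit():
            continue  # skip notes, source lines and other footer text
        year = int(year_text)
        for month_number, month in enumerate(MONTHS, start=1):
            cell = (row.get(month) or "").strip().rstrip("*")
            try:
                value = float(cell)
            except ValueError:
                continue  # missing or garbled cell
            records.append((year, month_number, value))
    return records
```

Once every source is in this long form, series from different publishers can be joined on (year, month), which is exactly the "plugging it together" step the talk asks for.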
So I've now created/parsed a whole bunch of data
Which I can Happily Use
US Unemployment Figures: 1940-2006
http://www.openeconomics.net/plot/chart/usa_bls_employment
OK, that's great, but:
How Do We SCALE?
I want to link this with lots of other data (interest rates, etc.), and there's only one of me
2. Building (Large-Scale) Data Infrastructures
(The Real Vision of Cyberspace)
Larger than Any Single Individual (or Corporation)
How Do We Build Complex Things?
Lots of Labour
How Do We Build Complex Things II?
But as Things Get Bigger, It's Too Much for One Mind ...
Divide and Conquer: Componentization
Break data down into chunks (packages) that can be individually managed
Need to Put Humpty-Dumpty Back Together Again
Componentization isn't just atomization; it's also about 'packaging' ...
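One way to picture putting the pieces back together: assume each data package records a name, a version and the names of the packages it depends on (hypothetical fields, not CKAN's schema). Reassembling a dataset then means collecting everything it needs and loading it dependencies-first, much as apt does for software. A minimal sketch using Python's standard `graphlib`:

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class DataPackage:
    """Hypothetical minimal data package: a named chunk of data plus basic metadata."""
    name: str
    version: str
    depends: list[str] = field(default_factory=list)  # names of other packages


def install_order(packages: dict[str, DataPackage], wanted: str) -> list[str]:
    """Return `wanted` plus everything it transitively depends on, dependencies first.

    Raises KeyError for a package missing from `packages` and
    graphlib.CycleError (a ValueError subclass) if dependencies form a cycle.
    """
    graph: dict[str, list[str]] = {}
    to_visit = [wanted]
    while to_visit:
        name = to_visit.pop()
        if name in graph:
            continue  # already collected
        graph[name] = packages[name].depends
        to_visit.extend(graph[name])
    # TopologicalSorter expects node -> predecessors, so dependencies come out first.
    return list(TopologicalSorter(graph).static_order())
```

Each package stays small and separately maintained; only the dependency list ties them together, which is the "small pieces, loosely joined" idea in miniature.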
Two Different Models
One Ring to Rule Them All
- One centralized system
- A single set of APIs/formats
- Probably closed
- Most data is currently like this
NO!
The Revolution Will be Decentralized
Remember: The Many Minds Principle
The Best Thing To Do With Your Data Will Be Thought of By Someone Else
Small Pieces, Loosely Joined
Production Should Be Decentralized and Federated
Sharing and Separation are Key
Requires OPENNESS!
3. Making It Happen for Data
Consider the Miracle of 'apt'
Two Related but Distinct Aspects
APIs + Distribution
Ignore (Knowledge) APIs Here (Hard!)
- Domain-Specific (Geodata ain't Genomics)
- Require Coordination
- Hard to Plan in Advance, Progress By Experimentation
Distribution
- Package: Wrap the material up (+ basic metadata)
- In a form suited to automated upload/download
- Register so it can be found ...
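The distribution steps above can be sketched with a toy, in-memory registry. This is an illustration only, assuming a package is some bytes plus a dict of string metadata; the class, its methods and the use of a SHA-256 digest are hypothetical choices, not CKAN's API:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy, in-memory package registry (hypothetical; not CKAN's API)."""
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)
    digests: dict[str, str] = field(default_factory=dict)

    def register(self, meta: dict[str, str], payload: bytes) -> str:
        """Register so it can be found: store basic metadata plus a digest of the data."""
        name = meta["name"]  # the only field this sketch requires
        if name in self.metadata:
            raise ValueError(f"package {name!r} is already registered")
        self.metadata[name] = dict(meta)
        self.digests[name] = hashlib.sha256(payload).hexdigest()
        return self.digests[name]

    def search(self, term: str) -> list[str]:
        """Case-insensitive substring search over every metadata value."""
        term = term.lower()
        return sorted(
            name for name, meta in self.metadata.items()
            if any(term in value.lower() for value in meta.values())
        )

    def info(self, name: str) -> dict[str, str]:
        """Return a copy of a package's metadata (KeyError if unknown)."""
        return dict(self.metadata[name])

    def verify(self, name: str, downloaded: bytes) -> bool:
        """Automatable download: a client can check what it fetched without human help."""
        return hashlib.sha256(downloaded).hexdigest() == self.digests[name]
```

The digest is what makes unattended download safe: a script can fetch a package, compare hashes and only then unpack it.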
The Vision
The Registry: http://www.ckan.net/
Freshmeat/CPAN/Gems ... for Open Data
Anyone can add material (760 packages and counting)
A CKAN Package
CKAN Is Helping to Power data.gov.uk
apt-get for Data: datapkg
http://www.okfn.org/datapkg/
A Data Packaging Swiss Army Knife
Getting and Using
# Search for a package on CKAN.net:
$ datapkg search ckan:// windhover
...
datapkgdemo -- ...
...
# Get info
$ datapkg info ckan://datapkgdemo
# Install (for now, just download + unpack) into the current directory:
$ datapkg install ckan://datapkgdemo .
Creating and Registering
# Start with some existing data
$ cd my_data_directory
# Create a metadata file (metadata.txt) of name/value pairs (as in Debian, R, etc.)
$ vim metadata.txt
# register on CKAN
$ datapkg register . ckan://
# Check it has registered OK:
$ datapkg info ckan://mynewdatapkg
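The metadata file is described only as name/value pairs in the style of Debian control files and R DESCRIPTION files. A minimal parser for that style might look like this, assuming "Name: value" lines with indented continuation lines; it is an illustration, not datapkg's own reader, and the function name is hypothetical:

```python
def parse_metadata(text: str) -> dict[str, str]:
    """Parse Debian/R-style 'Name: value' pairs into a dict.

    Simplifying assumptions: a single block of fields; lines starting with
    whitespace continue the previous field's value; blank lines and lines
    starting with '#' are ignored; field names are case-insensitive.
    """
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        if line[0] in " \t":
            if current is None:
                raise ValueError(f"continuation line before any field: {line!r}")
            fields[current] += "\n" + line.strip()
            continue
        name, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected 'Name: value', got {line!r}")
        current = name.strip().lower()
        fields[current] = value.strip()
    return fields
```

Splitting on the first colon only means values such as URLs survive intact.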
i18n + decentralization: http://de.ckan.net/
http://wiki.okfn.org/de/
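With more than one registry, such as ckan.net and the German instance de.ckan.net, a client can search them side by side instead of relying on a single central index. A minimal sketch, assuming each registry is wrapped as a plain function from a search term to package names (the interface and function name are hypothetical):

```python
from typing import Callable

# Hypothetical interface: a registry is any callable mapping a term to package names.
SearchFn = Callable[[str], list[str]]


def federated_search(registries: dict[str, SearchFn], term: str) -> dict[str, list[str]]:
    """Query each registry independently and record which ones list each package.

    A registry that raises OSError (which covers connection errors) is
    skipped, so one unreachable instance does not break the whole search.
    """
    found: dict[str, list[str]] = {}
    for label, search in registries.items():
        try:
            hits = search(term)
        except OSError:
            continue
        for package in hits:
            found.setdefault(package, []).append(label)
    return found
```

No registry has to be in charge: each instance keeps its own packages, and the client does the joining.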
4. Conclusion
The Start of the 'Debian of Data'
'Data' package managers wanted ...
Data and Code are Becoming One
Hack Code, Hack Data
Thank You
Rufus Pollock and Daniel Dietrich
rufus.pollock / daniel.dietrich @okfn.org
http://www.okfn.org/
http://www.ckan.net/
http://www.okfn.org/datapkg/
Credits
Images
giza_pyramid.jpg: http://www.flickr.com/photos/cornelluniversitylibrary/3672461369/
cyberspace.jpg: thevirtualism.com/history_project/virtual.html
worker_on_empire_state.jpg: PD (Lewis Hine for US Federal Government)
tolkein_ring.jpg: http://en.wikipedia.org/wiki/File:Ringinscription.jpg
lego.jpg: http://www.flickr.com/photos/oskay/265899811/
humpty_dumpty.jpg: http://www.flickr.com/photos/aussiegall/298669543/