From the Nieman Journalism Lab by Michael Andersen
Four crowdsourcing lessons from the Guardian’s (spectacular) expenses-scandal experiment
Okay, question time: Imagine you’re a major national newspaper whose crosstown archrival has somehow obtained two million pages of explosive documents that outed your country’s biggest political scandal of the decade. They’ve had a team of professional journalists on the job for a month, slamming out a string of blockbuster stories as they find them in their huge stack of secrets.
How do you catch up?
If you’re the Guardian of London, you wait for the associated public-records dump, shovel it all on your Web site next to a simple feedback interface and enlist more than 20,000 volunteers to help you find the needles in the haystack.
Your cost for the operation? One full week from a software developer, a few days’ help from others in his department, and £50 to rent temporary servers.
Journalism has been crowdsourced before, but it’s the scale of the Guardian’s project — 170,000 documents reviewed in the first 80 hours, thanks to a visitor participation rate of 56 percent — that’s breathtaking. We wanted the details, so I rang up the developer, Simon Willison, for his tips about deadline-driven software, the future of public records requests, and how a well-placed mugshot can make a blacked-out PDF feel like a detective story.
He actually offered six lessons. Here they are in brief:
1. MAKE IT FUN. Willison lured readers by making the work feel like a game. The Guardian’s four-panel interface (“interesting,” “not interesting,” “interesting but known,” and “investigate this!”) made categorization easy, and the progress bar on the project’s front page immediately gave the community a shared goal. He also added the Guardian’s mugshots of each MP to their pages in the database, which gave the task a personal element. “You’ve got this big smiling face looking at you while you’re digging through their expenses.”
2. MAKE IT COMPETITIVE. Willison posted lists of the top-performing volunteers. “Any time that you’re trying to get people to give you stuff, to do stuff for you, the most important thing is that people know that what they’re doing is having an effect. It’s kind of a fundamental tenet of social software. … If you’re not giving people the ‘I rock’ vibe, you’re not getting people to stick around.”
3. LAUNCH IMMEDIATELY. Before Parliament released its records Thursday, Willison’s team thought they might be able to postpone their launch to Friday if necessary. When they saw Thursday’s news broadcasts, they realized they’d been wrong. The country’s imagination was caught. “It became quickly clear on Thursday that it was a huge story, and if we failed to get it out on Thursday, we’d lose a lot of momentum.”
4. USE A FRAMEWORK. Willison’s project was built on Django, the open-source Web framework “for perfectionists with deadlines” that he and Adrian Holovaty created for the Lawrence Journal-World. Other frameworks and languages would have worked, too. “You absolutely could build this in Ruby on Rails or in PHP,” Willison said, but “as far as I’m concerned, this is absolutely Django’s sweet spot. This is absolutely what Django is designed to do. Once I had a designer and a client-side engineer working on the project, I could really just hand it over to them and I didn’t have to worry about the front-end code any more.”
5. HAVE SERVERS READY. As well as the Guardian’s first Django joint, this was its first project with EC2, the Amazon contract-hosting service beloved by startups for its low capital costs. Willison’s team knew they would get a huge burst of attention followed by a long, fading tail, so it wouldn’t make sense to prepare the Guardian’s own servers for the task. In any case, there wasn’t time. With EC2, the Guardian could order server time as needed, rapidly scaling it up for the launch date and down again afterward. Thanks to EC2, Willison guessed the Guardian’s full out-of-pocket cost for the whole project would be around £50.
6. SAVE COSTS. Willison used open-source, freely available software, so anyone who wants to imitate the project can use the same tools.
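For readers curious how the game mechanics in lessons 1 and 2 might fit together, here is a minimal sketch in plain Python. It is not the Guardian's Django code, which this article does not show. The class `ReviewBoard` and its methods are hypothetical names invented for illustration; only the four button labels come from the interface described above.

```python
from collections import Counter

# The four review buttons named in the article.
CATEGORIES = (
    "interesting",
    "not interesting",
    "interesting but known",
    "investigate this!",
)


class ReviewBoard:
    """Hypothetical in-memory tally of volunteer reviews (illustration only)."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.votes = {}             # page_id -> Counter of category votes
        self.reviewers = Counter()  # reviewer name -> number of reviews

    def record(self, page_id, reviewer, category):
        """Store one volunteer's verdict on one page."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        self.votes.setdefault(page_id, Counter())[category] += 1
        self.reviewers[reviewer] += 1

    def progress(self):
        """Fraction of pages reviewed at least once (feeds the progress bar)."""
        return len(self.votes) / self.total_pages

    def leaderboard(self, n=10):
        """Top n volunteers by review count (the 'I rock' list)."""
        return self.reviewers.most_common(n)

    def flagged(self):
        """Pages that at least one volunteer marked 'investigate this!'."""
        return sorted(
            page for page, tally in self.votes.items()
            if tally["investigate this!"] > 0
        )


if __name__ == "__main__":
    board = ReviewBoard(total_pages=4)
    board.record(1, "alice", "interesting")
    board.record(2, "alice", "investigate this!")
    board.record(2, "bob", "not interesting")
    print(board.progress())      # share of pages seen so far
    print(board.leaderboard(1))  # most active volunteer
    print(board.flagged())       # pages for reporters to check first
```

A real deployment would keep these tallies in a database rather than in memory, but the shape is the same: every click updates a page's verdict, the shared progress figure, and the volunteer's standing, all at once.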