The debate had gone on long enough. Sharing open data in science was too important to get hung up on the details. It was time for some of the leading scientists in the open data movement to hash it out and agree on some basic principles.
So, when John Wilbanks, vice president for science at Creative Commons, was in the United Kingdom to host an event at the British Library on the future of publishing, he went to Cambridge and met with Rufus Pollock, Cameron Neylon, and Peter Murray-Rust at their invitation. The group all knew each other and had been in email contact for years, discussing the concept of open data in science.
They met at the Panton Arms, a pub just down the street from the chemistry department at Cambridge University. It was nearly empty on that hot summer afternoon in July 2009.
From left to right: Jenny Molloy, Jordan Hatcher, Rufus Pollock, John Wilbanks, Cameron Neylon, Peter Murray-Rust and Carolina Rossini.
(Source: http://pantonprinciples.org/about, with thanks to Cameron Neylon).
For a couple of hours over lunch and drinks, the group went back and forth to come up with basic guidelines for scientists to use when making data open. Neylon pulled out his laptop and started to put it into words. There were different perspectives from the practicing scientists (Neylon, a biochemist, and Murray-Rust, a chemist), and those with a more legal approach (Wilbanks and Pollock, of the Open Knowledge Foundation). Also there chiming in with ideas were Carolina Rossini, Jenny Molloy, Jim Downing, Nico Adams, Joe Townsend, and Jordan Hatcher of the Open Knowledge Foundation and Open Data Commons.
The group aimed to develop clear language to explicitly define how a scientist’s rights to his own data could be structured so others can freely reuse or build on it. The goal was to make the language simple enough that a scientist could easily follow the instructions.
“This licensing stuff is both really tedious and quite esoteric. Yet it has to be precise because it’s legal language,” says Neylon, biochemist at the Rutherford Appleton Laboratory in Didcot, England. “You want it to sound interesting, compelling, and be right.”
It was good to just sit down and hammer out the disagreements, rather than be passive-aggressive about it all, says Wilbanks. The group wasn’t really that far apart, and it was a matter of iterating to come up with language that was acceptable to all. Also, it was important that whatever the group produced be an independent declaration and not one perceived as solely the viewpoint of Creative Commons or the Open Knowledge Foundation. In the end, it was “not that everybody loved it, but everybody could live with it,” says Wilbanks.
The authors advocate making data freely available on the Internet for anyone to download, copy, analyze, reprocess, pass to software, or use for any purpose without financial, legal or technical barriers. The group emerged with four recommendations to ensure that scientific data could easily and explicitly be made open. Condensed, the Panton Principles read:
1. When publishing your data, make an explicit and robust statement of your wishes.
2. Use a recognized waiver or license that is appropriate for data.
3. Non-commercial and other restrictive clauses should not be used.
4. Explicit dedication of data underlying science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.
The Panton Principles were publicly launched in February of 2010, and a Web site was established to spread the word at www.pantonprinciples.org. The authors set up a Q&A on the site to explain their philosophy, as well as links to other supporting documents. About 100 individuals and organizations have endorsed the Principles so far, including the Open Knowledge Foundation.
The authors strongly recommend that researchers adopt and act on the principles they collaboratively produced the summer before in Cambridge. Their reason? “Science is based on building on, reusing, and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open,” as the introduction of the Web site notes.
For their simple but revolutionary contribution, the authors of the Panton Principles are honored as SPARC Innovators for June 2010.
“This represents the first time we’re seeing diverse viewpoints crystallize around the pragmatic idea that we have to start somewhere, agree on the basics, and set the tone,” says Heather Joseph, SPARC Executive Director. “These are all leading thinkers in this area – as well as generators and consumers of data. They each approached the idea of open data from different directions, yet all had the same drive to open up science, and ended up on common ground.”
“There is a general belief among scientists that data should be open,” says Murray-Rust, a chemist at the University of Cambridge. “That isn’t true for all scientists and all data. But by its very nature, data is open.” So when he began to see restrictive practices affecting the availability of data, such as publishers requiring authors to hand over copyrights, he felt something needed to be done. “I felt this was totally unacceptable and inappropriate,” he says.
In 2006, Murray-Rust started a Wikipedia page on the topic of open data. Over time, he realized the process of making data open was quite complex and a legal approach was necessary. Also, while scientists might agree with the concept of open data, most were not aware that it was an issue that needed to be addressed proactively.
It was important to have a standard against which openness could be judged, says Wilbanks. It’s easy for scientists to say they are open with their data, when it’s more like “fauxpen” – as if pixie dust is sprinkled on it to make it open, though with no legal standing, he says. “The Panton Principles put a stake in the ground,” says Wilbanks. “You have to get the law out of the way to make it public domain.”
The Panton Principles are a declaration of the public nature of the data, says Sayeed Choudhury, associate dean at The Johns Hopkins University Sheridan Libraries. “They codify and formalize what we all believe: Science is a public endeavor,” he says. “It’s important that scientists remember that they are doing this on behalf of everyone.”
The Principles were needed because data are fundamentally different from documents, says Choudhury. The Panton Principles try to capture the notion that technology, licensing and preservation strategies for documents don’t work for data and so we need to think about it differently.
For data to be easily shared, you must be explicit about your wishes to make it available without any restriction. Pollock says scientists should be focused on research and not the legal details. The Principles are designed to help make it easy for scientists to share data openly without having to think about it much because the essential framework is already provided.
The Principles are like the “dull legal plumbing” necessary to allow scientists to do the really interesting stuff, says Pollock. “If we don’t get this boring legal stuff right, it’s not automatic,” he says. “It will get in the way and we’ll constantly be spending extra time dealing with the rights. It will slow us down.”
Choudhury says the declaration is a reminder of the value of openness and a reference for scientists. “This is something that can span across different communities, countries, scientific teams – it’s consistent with the idea there are overarching principles,” he says.
Alma Swan, director of Key Perspectives Ltd., a scholarly communications consultant and Ph.D. biologist, welcomed the Principles. “Scientists don’t share their data as well as they might, don’t know how to share and don’t really understand why it’s important to share in specific ways,” says Swan. While scientists might agree with the concept of sharing, it is not the norm and there are worries.
Swan says the principles take into consideration the behavior of scientists – their self-interest and legitimate anxieties about the process – and try to allay those by producing guidelines to share data in an effective way. “They are groundbreaking,” she says.
The benefits of open data, says Murray-Rust, are clear: Opening up data allows others to validate or disprove experiments, leads to new scientific insights and gives individuals who created the science new recognition.
“It’s commonplace that we advance by building on the work of colleagues and predecessors – standing on the shoulders of giants,” says Pollock, co-founder of the Open Knowledge Foundation and Mead Fellow in Economics, Emmanuel College, University of Cambridge. “In a digital age, to build on the work of others we need something very concrete: access to the data of others and the freedom to use and reuse it. That’s what the Panton Principles are about.”
The authors’ vision is that data needs to be used and reused for the maximum benefit, says Neylon, who also serves on the Science and Technology Facilities Council. In most cases, scientists are taking public money or money from charity to conduct their research, and those entities want to generate meaningful outputs.
“They want to make investments that get the biggest possible return,” says Neylon. “Funders are under pressure to make sure funding outputs are being fully exploited. The public is not impressed with hearing data is not available… people are appalled when data is not available.” Other scientists should be able to use data, and it should be in a form in which people can have confidence that they are able to repurpose it, he says.
The goal is to get support for the Principles from people who write policies and fund projects, so they recommend open data sharing as standard practice, says Neylon.
“It is becoming much clearer in the mind of funders that data matter,” says Murray-Rust. “More are requiring data must be published along with a paper.” Journals are in the position to require open data sharing and the word is spreading, he says.
The vast majority of scientists who are making good data work inside an institution with grant funding, says Wilbanks. “We have to change their economic incentives and their institution hosting structures to the point that compliance with the Panton Principles is part and parcel of being a scientist,” he says.
So, initially, the focus is on changing the minds of funders rather than individual scientists, who may not be aware or care much about open data, says Wilbanks. “We are focused on how to rewire the structure so it makes sense to make the data available,” he says.
Much of the uphill battle for the adoption of the Panton Principles is awareness and education. “It’s just too much effort,” says Wilbanks. “There is no reward for spending time to make data available… It doesn’t make economic sense to put it out there unless I’m required by policy.”
Pollock says the Principles are not just about licensing; they are also about making the data available. Often material sits on a CD or hard disk “mouldering” away, and it’s important that data, where it relates to published work, is made available, he says.
After all, scientists put a lot of work into their data, and by keeping it to themselves they may be able to get more papers out of it and advance their careers. Those working on proprietary projects worry, too, about protected ideas being shared.
“There is a great deal of fear about people stealing others’ data,” says Neylon. “We counter that it’s hard to steal when you’ve put it up and said it’s clearly yours.”
There is also the issue of privacy, particularly when dealing with medical experiments and clinical trials. The authors of the Panton Principles tried to address those concerns and suggest ways of sharing portions of data sets or information without names attached that still protect privacy, yet advance science.
The introduction of the Panton Principles is a first step in a multi-layered process to make data open and useful. Data, as opposed to documents, are more complicated to share. There are issues of formatting and technology that make data more challenging to access, says Wilbanks.
With publications, there is a well-established workflow and communities that know their roles in disseminating and validating material, says Choudhury of Johns Hopkins. “The whole ecosystem is pretty well established. It isn’t for data,” he says. “It’s completely new.” When talking about data being made available, it could be a whole data set, a portion or subsets where privacy is an issue. Public access then becomes much deeper and more layered, he says.
Advocates of the Principles believe that, over time, as scientists see the benefits of sharing, more will do so.
Neylon says he is very positive about the future of open data because scientists will see it gives them a competitive edge. “They want to do the very best science they can and be ahead of the next person,” he says. With easy access to data, scientists can be “faster, better and more effective in their work,” says Neylon.
Swan says the major issue ahead for the Principles’ authors will be awareness. “There is a lot of work ahead of them,” says Swan. “Coming up with the Principles is not going to cut the mustard by itself. They will need to be advocated and promoted so that scientists are interested in debating them.”
However, Swan says the authors are the “big thinkers in the field” and some of the most energetic people in the world of science. “If anybody will get anything adopted – those are the people,” she says.
Melissa Hagemann, senior program manager at the Open Society Institute in New York, who has endorsed the Panton Principles, says the development of a common standard to recommend for data licensing that will allow data to be easily shared and reused is a critical step for the advancement of science. “Having these highly respected leaders of the community come together, discuss and finally agree to recommend that data should be placed in the public domain is a major achievement,” says Hagemann, adding that OSI is interested in exploring possible engagement with the Open Science Community.
The authors are hopeful.
“We are only seeing the tip of the iceberg,” says Pollock. “Imagine a world in which every article and every dataset can be seamlessly stored, linked and navigated through.” If the authors of the Panton Principles got it right, that’s what their efforts may mean.
By Caralee Adams