Archive for February 2010
Science is based on building on, reusing and openly criticising the published body of scientific knowledge.
For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.
By open data in science we mean data that are freely available on the public internet, permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.
Formally, we recommend adopting and acting on the following principles:
- Where data or collections of data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual data elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license. When publishing data make an explicit and robust statement of your wishes.
- Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged. Use a recognized waiver or license that is appropriate for data.
- The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
- Furthermore, in science it is STRONGLY recommended that data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of much scientific research and the general ethos of sharing and re-use within the scientific community.
Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.
As of this morning, I have signed commitments for 4/5 of the money I need to spend a year working full-time on updating the Software Carpentry course. If you, or someone you know, would like to help me help scientists be more productive, please get in touch: I have only 64 days in which to find the $30K I still need.
Congratulations to Titus Brown and others on the NSF’s announcement that it will fund the BEACON (Bio/computational Evolution in Action Consortium) Science and Technology Center. “…BEACON is focused on studying the evolution of organization across multiple scales—from genomic and cellular, to multicellular, to inter-multicellular (a.k.a. social)—using techniques from experimental evolution, modeling, and digital life systems.” Long story short, this means that Michigan State University and its partner institutions “…has money explicitly for supporting students doing really sexy interdisciplinary work combining computation and biology.”