Software Carpentry

Helping scientists make better software since 1997

Links for Summer Interns

Our summer interns started today; our first job is to define exactly what they’ll be working on this summer, so it seems like a good time to round up a few links on interesting topics. My apologies for those hidden behind paywalls…

Steve’s Project Ideas

  1. Social network analysis for scientists
  2. Electronic lab notebooks

Reproducible Research

If I said, “I just got a really interesting result in the lab, but I didn’t record the steps I took or the settings on the machine,” no reputable journal would publish my paper. If I said, “I just got a really interesting computational result,” most reviewers and editors wouldn’t even ask if I’d archived my code and the parameters I used, or whether that code would run on someone else’s machine. Reproducible research (RR) is the idea of making computational science as trustworthy as experimental science by creating tools and working practices that will allow scientists to re-create past results.

  1. WaveLab and Reproducible Research
  2. The Madagascar project
  3. The Sweave project
  4. Special issue of Computing in Science & Engineering on reproducibility
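
As a rough illustration of what "archiving my code and the parameters I used" could mean in practice, here is a minimal Python sketch. It is not taken from WaveLab, Madagascar, or Sweave; the function names `make_run_record` and `save_run_record`, the record fields, and the sample parameters are all hypothetical. The idea is simply to store a fingerprint of the code, the parameters, and a description of the machine alongside each result.

```python
import hashlib
import json
import platform
import sys


def make_run_record(parameters, code_text):
    """Bundle what someone would need to re-run a computation (hypothetical layout)."""
    return {
        # SHA-256 of the source text identifies exactly which code produced the result.
        "code_sha256": hashlib.sha256(code_text.encode("utf-8")).hexdigest(),
        "parameters": parameters,
        # Interpreter and operating system, since results may depend on both.
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


def save_run_record(record, path):
    """Write the record as JSON so it can sit next to the results it describes."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative parameters and code text only; a real run would pass its own.
    record = make_run_record({"grid_size": 128, "tolerance": 1e-6},
                             "print('simulate')\n")
    print(json.dumps(record, indent=2, sort_keys=True))
```

A record like this does not make a result reproducible on its own, but it answers the questions a reviewer would otherwise have to ask: which code, which settings, which machine.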

Data Provenance

The “provenance” of an object is the history of where it came from, and how it got here. The provenance of a piece of data is similar: what raw values is it derived from, and what processing was done to create it? Ideally, every piece of scientific software should track this automatically; in practice, very few do, and most scientists don’t take advantage of the capability when it’s there. That’s changing, though, particularly as emphasis on reproducibility grows.

  1. The Provenance Challenge: a series of competitions to benchmark provenance tools against one another
  2. Special issue of Concurrency and Computation: Practice & Experience reporting the results of the first challenge
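
To make "what raw values is it derived from, and what processing was done to create it" concrete, here is a small Python sketch of automatic provenance tracking. It is an assumption-laden toy, not how any of the Provenance Challenge tools work: the `Tracked` class and the `raw`, `derive`, and `history` helpers are hypothetical names invented for this example. Each derived value carries a reference to the operation and the inputs that produced it, so its history can be walked back to the raw data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value together with the operation and inputs that produced it."""
    value: float
    operation: str = "raw"
    inputs: tuple = ()


def raw(value):
    """Wrap a measured value that has no earlier history."""
    return Tracked(value)


def derive(operation, func, *inputs):
    """Apply func to the input values and remember where the result came from."""
    result = func(*(item.value for item in inputs))
    return Tracked(result, operation, inputs)


def history(item, depth=0):
    """Return the provenance of item as indented lines, most recent step first."""
    lines = ["  " * depth + f"{item.operation}: {item.value}"]
    for parent in item.inputs:
        lines.extend(history(parent, depth + 1))
    return lines


if __name__ == "__main__":
    a = raw(2.0)
    b = raw(3.0)
    total = derive("add", lambda x, y: x + y, a, b)
    scaled = derive("double", lambda x: 2 * x, total)
    # Prints the chain from the final value back to the two raw inputs.
    print("\n".join(history(scaled)))
```

The design choice worth noticing is that the scientist never writes the history by hand: it accumulates as a side effect of doing the calculation through `derive`, which is the "track this automatically" part.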

Science 2.0

Also called “computer-supported collaborative science”, this is the idea of leveraging modern web-based collaboration tools to better connect scientists, their experiments, and their results. It encompasses a broad range of ideas, but “social networking for scientists” based on their interests is near the core, as is “open science” (the idea of making scientific results public in the same way as open source software or Creative Commons publications).

  1. Overview article in Scientific American
  2. Jon Udell’s Internet Groupware for Scientific Collaboration may be several years old, but it’s still prescient
  3. Jean-Claude Bradley’s blog
  4. Cameron Neylon’s personal blog (see for example his post on “FriendFeed for Scientists”) and lab blog

Scientific Programming Environments

Compared to professional software developers, most scientists use fairly primitive programming environments, in part because they’ve been too busy learning quantum chemistry to learn distributed version control, and in part because software developers seem to go out of their way to make tools hard to set up and learn. Lots of people have tackled this from a variety of angles. Unfortunately, a lot of work to date has focused on supercomputing, which is sort of like studying modern medicine by focusing on heart surgeons…

  1. Greg Wilson’s “Where’s the Real Bottleneck in Scientific Computing?” and “Software Carpentry”
  2. Carver, Kendall, Squires, and Post’s “Software Development Environments for Scientific and Engineering Software: A Series of Case Studies”
  3. Matthews, Wilson, and Easterbrook’s “Configuration Management for Large-Scale Scientific Computing at the UK Met Office” is an example of tools done right

Written by Greg Wilson

2009/05/11 at 15:51

Posted in Content, Version 3

3 Responses


  1. […] on the Software Carpentry blog, I’ve posted some links for our summer interns that might be of general interest, mostly to do with Science 2.0, reproducible research, and the […]

  2. Thanks for the mention. Since this post is about software and data provenance this page might be of interest. It covers the web service written by Andrew Lang that we call from within Google Spreadsheets to calculate solubility from spectra. Both the experimental log and the details of the calculation are required to understand where the final numbers come from.

    Jean-Claude Bradley

    2009/05/12 at 12:15

  3. […] Summer projects: I posted yesterday on social network tools for computational scientists. Greg has posted a whole list of additional suggestions. […]

