Software Carpentry

Helping scientists make better software since 1997

Archive for the ‘Opinion’ Category

Teaching Open Source

Red Hat’s Greg DeKoenigsberg has a post about a new collaboratively authored textbook on open source software aimed squarely at undergraduate courses. As Máirín Duffy points out in the first comment, it’s very code-centric, but in my experience that’s the right approach: students won’t be ready to discuss design until they’re proficient at coding [1]. I’m looking forward to borrowing lots from the book for Software Carpentry.

[1] This is, by the way, why I believe that attempts to teach “computational thinking” without first teaching programming are doomed to fail, but that’s a rant for another time.


Written by Greg Wilson

2010/04/12 at 16:25

Posted in Noticed, Opinion

Measuring Science

Julia Lane, the director of the Science of Science & Innovation Policy program at the National Science Foundation, wrote an article for Nature a couple of weeks ago titled “Let’s make science metrics more scientific”. As the summary at the start says:

  • Existing metrics have known flaws
  • A reliable, open, joined-up data infrastructure is needed
  • Data should be collected on the full range of scientists’ work
  • Social scientists and economists should be involved

The same points could be made about evaluating software developers (or any other kind of knowledge worker). The devil, as always, is in the details, and unfortunately I have to start doing evaluations before those details are worked out. Several of this course’s sponsors need me to demonstrate its impact on the productivity of the scientists who take it, so that they in turn can justify their contribution to their own funders. It isn’t enough to ask students who have completed the course whether they think they know more about programming than they used to: even ignoring the obvious problems of survivor bias and self-assessment, I would still have to demonstrate that making people better programmers also makes them better scientists. I believe it does, but belief is not evidence, and it says nothing about how large the effect is.

The best plan I’ve been able to come up with so far is to look at how scientists spend their time before and after taking the course, but that would require resources I don’t have.  If you’re interested in studying scientists or software developers empirically, and would like some raw material, I’d like to hear from you.
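A minimal sketch of the arithmetic behind a before/after comparison, assuming each participant’s time has been logged as hours per activity (the activity names and the logging itself are hypothetical). It compares each activity’s share of time rather than raw hours. It does nothing about the survivor bias and self-assessment problems mentioned above, and on its own it can’t show that a shift in time makes anyone a better scientist.

```python
def time_shares(hours: dict) -> dict:
    """Convert hours per activity into each activity's fraction of the total."""
    total = sum(hours.values())
    if total <= 0:
        raise ValueError("need a positive number of logged hours")
    return {activity: h / total for activity, h in hours.items()}


def share_change(before: dict, after: dict) -> dict:
    """Change in each activity's share of time (after minus before).

    Shares rather than raw hours are compared, so a participant who simply
    logged more hours after the course doesn't look like they changed habits.
    Activities missing from one period count as a share of 0 there.
    """
    b, a = time_shares(before), time_shares(after)
    return {activity: a.get(activity, 0.0) - b.get(activity, 0.0)
            for activity in sorted(set(b) | set(a))}
```

A negative value for something like debugging and a positive one for analysis would be the kind of shift the course hopes for; collecting trustworthy logs is the part that needs the resources.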

Written by Greg Wilson

2010/04/11 at 13:40

Posted in Opinion, Research

Simon Singh Wins (and So Does Science)

Simon Singh, the science journalist who was sued for libel by the British Chiropractic Association, has won the right to rely on the defense of “fair comment”. (Full ruling linked from this Index on Censorship post.) Singh had pointed out that there’s no evidence to back up BCA claims that their particular brand of pseudoscience could help with asthma and other ailments. It has taken him two years and £200,000 to get this far, and it may be another two years before the matter is finally settled, but this is an important victory for everyone who believes in rational inquiry.

Written by Greg Wilson

2010/04/01 at 10:31

Posted in Opinion

How Much Of This Should Scientists Understand?

Let’s start with the problem description:

All of the Software Carpentry course material (including lecture notes, code samples, data files, and images) is stored in a Subversion repository. That’s currently hosted at the University of Toronto, but I’d like to move it to the course’s own domain (along with this blog). However, that domain is hosted with Site5, which only provides one shell account per domain for cheap accounts like the one I bought.

Why is this a problem? Because when someone wants to commit to the repository, they have to authenticate themselves. I could let everyone who’s writing material for the course share a single user ID and password, but that would be an administration nightmare (as well as a security risk). Site5 does have a workaround based on public/private keys, but it’s fairly complicated, which means it could break in lots of hard-to-diagnose ways. Another option would be to use the mod_dav_svn module for Apache, but Site5 doesn’t support per-domain Apache modules either. A different host does, so I may be switching in a few weeks.
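The post doesn’t say how Site5’s key-based workaround is set up. One common way to let several people commit through a single shell account is OpenSSH’s forced-command feature: each contributor’s public key goes into the shared account’s `~/.ssh/authorized_keys` with a `command="svnserve -t --tunnel-user=NAME"` prefix, so every login runs Subversion in tunnel mode and commits are attributed to `NAME` rather than to the shared account. The sketch below generates those lines; the user names, keys, repository path, and helper names are all hypothetical.

```python
import re

# OpenSSH authorized_keys options that stop a key from being used for
# anything other than the forced svnserve command.
RESTRICTIONS = "no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty"

# Conservative character set for Subversion user names (an assumption,
# chosen so a name can be embedded in the quoted command safely).
VALID_USER = re.compile(r"[A-Za-z0-9._-]+")


def authorized_keys_line(svn_user: str, public_key: str, repo_root: str) -> str:
    """Return one authorized_keys entry that forces svnserve in tunnel mode.

    Any login with this key runs `svnserve -t`; --tunnel-user makes
    Subversion attribute commits to svn_user instead of the shared shell
    account, and -r restricts access to repositories under repo_root.
    """
    if not VALID_USER.fullmatch(svn_user):
        raise ValueError(f"unsafe Subversion user name: {svn_user!r}")
    if '"' in repo_root or any(ch.isspace() for ch in repo_root):
        raise ValueError("repository path would break the quoted command")
    key = public_key.strip()
    if "\n" in key:
        raise ValueError("public key must be a single line")
    command = f"svnserve -t --tunnel-user={svn_user} -r {repo_root}"
    return f'command="{command}",{RESTRICTIONS} {key}'


def build_authorized_keys(keys: dict, repo_root: str) -> str:
    """Build the whole file from a {user name: public key} mapping."""
    lines = [authorized_keys_line(user, key, repo_root)
             for user, key in sorted(keys.items())]
    return "\n".join(lines) + "\n"
```

Even this small version shows why the approach is fragile: a stray quote, a mistyped path, or a key pasted across two lines silently locks someone out, and the error the contributor sees rarely points at the cause.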

So: how much of this should the average research scientist be expected to understand? If the answer is “none”, then how are they supposed to make sensible decisions about moving their work online? If the answer is “all”, where does the time come from? (It takes me 30 seconds to read the two paragraphs above; it would take many hours of instruction to teach people enough to do the analysis themselves.)  And if the answer is “some”, then which parts? To what depth? And who takes care of the rest on scientists’ behalf?

Written by Greg Wilson

2010/03/11 at 19:45

Posted in Content, Opinion, Version 4

It Seems That Everyone Cares

Ars Technica isn’t primarily a science site, but even they are now worried about reproducibility in computational science. I think it no longer matters how important this “crisis” actually is: sooner or later, major funding agencies are going to mandate adoption of something like the Open Provenance Model. The problem is that, given the current skill set of the average scientist, such a mandate will almost certainly translate into burden without benefit.
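To make concrete what such a mandate might ask for: the Open Provenance Model describes results in terms of artifacts, the processes that used or generated them, and the agents that controlled those processes. The following is a toy sketch, not an OPM implementation; the class and method names are hypothetical, and only the “used” and “wasGeneratedBy” relations are kept.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Toy provenance record loosely modeled on OPM's artifacts and processes.

    Only two kinds of edge are kept: a process *used* an artifact, and an
    artifact *wasGeneratedBy* a process. Agents, accounts, timestamps, and
    OPM's standard serializations are all left out.
    """
    used: list = field(default_factory=list)          # (process, artifact) pairs
    generated_by: list = field(default_factory=list)  # (artifact, process) pairs

    def record_step(self, process: str, inputs: list, outputs: list) -> None:
        """Log one run of `process` that read `inputs` and wrote `outputs`."""
        self.used.extend((process, artifact) for artifact in inputs)
        self.generated_by.extend((artifact, process) for artifact in outputs)

    def lineage(self, artifact: str) -> set:
        """Return every artifact that `artifact` transitively depends on."""
        ancestors = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            producers = [p for a, p in self.generated_by if a == current]
            for process in producers:
                for p, source in self.used:
                    if p == process and source not in ancestors:
                        ancestors.add(source)
                        stack.append(source)
        return ancestors
```

Even this toy shows where the burden falls: someone has to call `record_step` for every step of every analysis, which is only cheap for people who already script their work instead of doing it by hand.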

Written by Greg Wilson

2010/01/24 at 14:44

Posted in Noticed, Opinion

Big Science == Big Skills Gap

Over on Nature News, Eric Hand’s article “‘Big science’ spurs collaborative trend” is subtitled, “Complicated projects mean that science is becoming more globalized.” It talks about the benefits of international collaboration, but what it doesn’t say is that sharing ideas, results, procedures, and software requires skills that aren’t part of the standard curriculum. One of the main goals of the rewrite of Software Carpentry is to teach scientists some of what they need to know in order to do what Hand describes without heroic effort. I’d be grateful for suggestions about topics and tools that ought to be in the course but aren’t yet.

Written by Greg Wilson

2010/01/20 at 18:05

Posted in Noticed, Opinion

Whatcha Gonna Do When They Come For You?

First it was pharma companies withholding “unhelpful” data, then it was ClimateGate, and now there’s this:

One of the founders of the controversial ‘Baby Einstein’ range of products is taking the University of Washington to court in an attempt to force the institution’s scientists to release their raw data to him…William Clark…wants records relating to two studies published in 2004 and 2007. The latter found an “association between early viewing of baby DVDs/videos and poor language development” while the former suggested “efforts to limit television viewing in early childhood may be warranted”.

If someone challenged your results, could you reassemble the programs and data you’d used to produce them? And what would happen if you couldn’t? Software Carpentry isn’t just about making scientists more productive; the skills that will help them do more, faster, will also make their work more traceable and reproducible.
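One small habit that makes reassembly possible is writing a manifest alongside every result. The sketch below is an illustration, not a Software Carpentry tool: `write_manifest` and `changed_inputs` are hypothetical helper names, and the fields recorded (interpreter version, command line, and a SHA-256 hash of each input file) are a minimal assumption.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256_of(path) -> str:
    """Hash a file in 64 KiB chunks so large data files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(inputs, manifest_path) -> dict:
    """Record the interpreter version, command line, and input file hashes."""
    manifest = {
        "python": platform.python_version(),
        "command": sys.argv,
        "inputs": {str(path): sha256_of(path) for path in inputs},
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def changed_inputs(manifest_path) -> list:
    """List recorded inputs that are now missing or whose contents differ."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [path for path, expected in manifest["inputs"].items()
            if not Path(path).exists() or sha256_of(path) != expected]
```

If `changed_inputs` comes back empty years later, you at least know the data you have is the data you used; recording the version-control revision of the analysis code as well would be a natural next step.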

Written by Greg Wilson

2010/01/13 at 20:01

Posted in Noticed, Opinion