Software Carpentry

Helping scientists make better software since 1997

Measuring Science

Julia Lane, the director of the Science of Science & Innovation Policy program at the National Science Foundation, wrote an article for Nature a couple of weeks ago titled “Let’s make science metrics more scientific”. As the summary at the start says:

  • Existing metrics have known flaws
  • A reliable, open, joined-up data infrastructure is needed
  • Data should be collected on the full range of scientists’ work
  • Social scientists and economists should be involved

The same points could be made about evaluating software developers (or any other kind of knowledge worker). The devil, as always, is in the details, and unfortunately I have to start doing evaluations before those details are worked out. Several of the sponsors for this course need me to demonstrate its impact on the productivity of the scientists who take it (so that they can in turn justify their contribution to their funders). It isn’t enough to ask students who have completed the course whether they think they know more about programming than they used to: ignoring the obvious problems of survivor bias and self-assessment, I would still have to demonstrate that making people better programmers also makes them better scientists. I believe it does, but belief is not evidence, and doesn’t convey scale.

The best plan I’ve been able to come up with so far is to look at how scientists spend their time before and after taking the course, but that would require resources I don’t have. If you’re interested in studying scientists or software developers empirically, and would like some raw material, I’d like to hear from you.


Written by Greg Wilson

2010/04/11 at 13:40

Posted in Opinion, Research

2 Responses


  1. … economists should be involved

    As always, the problem with evaluation is not knowing the counterfactual. But it is even more pressing in this case since (1) you catch most people early in their careers, (2) there is a huge self-selection effect into the program (probably even bigger than survivor bias), and (3) it has a potentially big impact, according to user experiences (

    Why more pressing? (1) rules out simply comparing before and after within people. (2) rules out comparing changes within the “treatment” group to those within a control group, because you won’t be able to argue that the changes would have been comparable if the “treatment” group had not taken the course. (3) makes the choice of metric extremely difficult because there will be all kinds of effects. For example, I would not expect people to actually reduce the time they spend programming after the course. Time spent on a specific task most certainly goes down. But the tasks they tackle change; see your post linked above.

    On the constructive side, I think the first step is to define more precisely what you mean by “better scientist”. For example, you could play the reproducibility chord. Self-assessments of counterfactuals might be credible there (How reproducible are your results now? How reproducible would they be had you not taken the course?).


    2010/04/13 at 08:48

  2. Here’s an idea for trying to capture changes in their thinking.

    For pre-test/post-test, ask them to brainstorm, say, 10 research ideas on a topic of interest to them.

    At the end of the class, show all 20 of the ideas to one or more experts (e.g., the student’s advisor, other profs in the same domain), and ask them to rate the quality of the ideas. Then test if there’s a statistically significant difference in quality in the post-test versus pre-test ideas.

    (History effects are a significant threat to validity here, so it would help to have a control group.)

    Lorin Hochstein

    2010/04/13 at 21:25
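
The pre-test/post-test comparison suggested in the second comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not a method anyone in the thread proposed in this form: it assumes each student's expert ratings have already been averaged into one pre-course score and one post-course score, it uses an exact sign-flip permutation test as one possible choice of significance test, and the function name and all the numbers are hypothetical. It also leaves out the control group that the comment recommends, so by itself it cannot separate the effect of the course from history effects.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(pre, post):
    """Exact two-sided sign-flip permutation test on paired differences.

    pre, post: per-student mean expert ratings, in the same student order.
    Returns (observed mean difference, p-value).
    """
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    diffs = [after - before for before, after in zip(pre, post)]
    observed = mean(diffs)
    # Under the null hypothesis of no change, each difference is equally
    # likely to be positive or negative, so enumerate every sign assignment.
    # This is 2**n cases, which is only practical for small groups.
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = mean([s * d for s, d in zip(signs, diffs)])
        total += 1
        # Small tolerance so floating-point ties count as "as extreme".
        if abs(flipped) >= abs(observed) - 1e-12:
            as_extreme += 1
    return observed, as_extreme / total


# Hypothetical mean ratings (1 to 5 scale) for six students; made-up data.
pre_ratings = [2.9, 3.1, 3.4, 2.5, 3.0, 3.3]
post_ratings = [3.4, 3.3, 3.9, 2.9, 3.2, 3.8]

observed_diff, p_value = paired_sign_flip_test(pre_ratings, post_ratings)
print(f"mean change in rating: {observed_diff:.2f}, p = {p_value:.4f}")
```

For larger groups, enumerating every sign assignment becomes too slow, and randomly sampling sign assignments is the usual substitute.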

