Software Carpentry

Helping scientists make better software since 1997

It Seems That Everyone Cares

Ars Technica isn’t primarily a science site, but even they are now worried about reproducibility in computational science. I think it no longer matters how important this “crisis” actually is: sooner or later, major funding agencies are going to mandate adoption of something like the Open Provenance Model. The problem is that, given the current skill set of the average scientist, such a mandate will almost certainly translate into burden without benefit.
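
As a rough illustration, here is a minimal sketch, in Python, of the kind of per-run provenance record such a mandate might ask for. The field names and file paths are assumptions made for the example, not the Open Provenance Model itself.

    # Hypothetical example: write a minimal provenance record alongside one run.
    # Field names and file paths are illustrative, not the Open Provenance Model.
    import hashlib
    import json
    import platform
    import sys
    from datetime import datetime

    def sha1_of(path):
        """Digest a file so inputs and outputs can be matched to results later."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def provenance_record(script, inputs, outputs):
        """Record what ran, on which machine, with which inputs and outputs."""
        return {
            "timestamp": datetime.utcnow().isoformat() + "Z",
            "python": sys.version,
            "platform": platform.platform(),
            "script": {"path": script, "sha1": sha1_of(script)},
            "inputs": [{"path": p, "sha1": sha1_of(p)} for p in inputs],
            "outputs": [{"path": p, "sha1": sha1_of(p)} for p in outputs],
        }

    if __name__ == "__main__":
        # Assumed file names; in practice these would come from the run itself.
        record = provenance_record("run_simulation.py", ["config.txt"], ["results.csv"])
        with open("provenance.json", "w") as f:
            json.dump(record, f, indent=2)

Whether a record like this is burden or benefit depends almost entirely on whether the tooling writes it automatically or the scientist has to remember to.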


Written by Greg Wilson

2010/01/24 at 14:44

Posted in Noticed, Opinion

7 Responses


  1. Reproducibility has to mean something more than different groups running the same code with the same inputs and getting the same answers. Sure, that prevents people from flat out lying about what the code produced, but I don’t see that as one of the big problems of computational science today.

    From a science policy point of view, how much code sharing is optimal? Don’t we want people implementing different schemes to work on each problem, as cross-checks? But how much of that adds to the science, and how much is reinventing the wheel?

    To what extent does code sharing lead to more code development, and to what extent does it inhibit it? There are several freely available codes in my discipline, and the vast majority of their users don’t rigorously test them, much less understand any significant part of them. As a result, I’ve seen subtle bugs persist in well-tested code for ages, contaminating results in a way that would have been caught faster with more code comparisons (a sketch of such a cross-check appears below). For something as specialized as a numerical science code, bugs don’t become shallower at the same rate as you increase the number of eyes.

    Jonathan Dursi

    2010/01/24 at 16:22
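
    A minimal sketch of the kind of cross-check described above, assuming two independently written implementations of the same integral and an illustrative tolerance; the function names are hypothetical:

        # Hypothetical cross-check: two independently written implementations of
        # the same quantity run on the same input; disagreement beyond a tolerance
        # is flagged. The integrators and the tolerance are illustrative.
        import math

        def integrate_trapezoid(f, a, b, n=1000):
            """Composite trapezoid rule."""
            h = (b - a) / n
            total = 0.5 * (f(a) + f(b))
            for i in range(1, n):
                total += f(a + i * h)
            return total * h

        def integrate_midpoint(f, a, b, n=1000):
            """Composite midpoint rule, written separately as an independent check."""
            h = (b - a) / n
            return h * sum(f(a + (i + 0.5) * h) for i in range(n))

        def cross_check(f, a, b, rel_tol=1e-5):
            """Run both implementations and report whether they agree within tolerance."""
            t = integrate_trapezoid(f, a, b)
            m = integrate_midpoint(f, a, b)
            return math.isclose(t, m, rel_tol=rel_tol), t, m

        if __name__ == "__main__":
            agree, t, m = cross_check(math.sin, 0.0, math.pi)  # exact answer is 2
            print("trapezoid=%.8f midpoint=%.8f agree=%s" % (t, m, agree))

    Agreement within the tolerance does not prove either implementation correct, but a disagreement is exactly the kind of subtle, persistent bug described above.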

  2. I know the above is somewhat inchoate, but I think my question is this: is the real problem here a lack of software sharing, or that developing scientific software remains too difficult and too specialized a skill?

    Jonathan Dursi

    2010/01/24 at 16:24

  3. @JD:
    In my view, the argument put forth by Ars and constantly made by Greg is that computational science is as much an integral part of science as experimentation, since, if nothing else, few modern experimental results are useful without heavy processing.

    But the dichotomy is that, whereas experimental setups must be described in detail for a publication to be considered reproducible, it is accepted that numerical methods are barely mentioned, implying that the latter are not relevant to reproducibility.

    The argument that publishing fewer computational details provides an incentive for innovation in this area can be extended to conclude that the fewer details one publishes about the experimental setup used to get a result, the more likely other researchers are to find innovative ways of accomplishing it. That may be true, but it is completely against the scientific method.

    I believe that opening up computational methods is not about preventing people from lying or allowing researchers to take the easy road and just use others’ software. I am not a computer scientist, but in the CS courses I attended, instructors would often stress that students were not there to learn how to program, something 12-year-olds regularly teach themselves, but to learn how to function effectively in large-scale projects involving collaboration with many other developers. I imagine there must have been a turning point in CS at which it became clear that a bunch of programmers using ad hoc approaches could not build upon each other’s work the way scientists build upon previous findings. I would imagine that scientists now have the mindset of those first programmers and that the same epiphany is about to strike them. I also imagine that, scientists being even more stubborn than programmers, the transition to a collaborative approach will take longer.

    JN

    2010/01/24 at 19:02

    • “it is accepted that numerical methods are barely mentioned,”

      There may well be disciplines where that is the prevailing culture, and if so, it needs to be addressed (and in that case, that is the problem, not a lack of code sharing). But in every field I’ve worked in, there are always extremely detailed discussions of numerical methods, often with entire papers devoted solely to the computational techniques.

      Can you give me an example of a field where numerical methods are barely mentioned?

      Note that I think having code publicly available is likely the way to go, but it’s not a panacea; it carries real costs along with benefits. The real problem with reproducibility, IMHO, is that it is overly difficult to write scientific code.

      Jonathan Dursi

      2010/01/25 at 12:41

      • “IMHO, it is overly difficult to write scientific code.”

        I agree. And there is no solution to this problem. That’s because computer programming in general, and scientific simulation software in particular, is the most complicated thing human beings do. Why? Because if a programmer IS ALLOWED to or CAN make a program more complicated, the programmer WILL FIND SOME EXCUSE (bells-and-whistles, performance, etc.) to make the program more complicated. The UPPER limit to software complexity is often the programmer’s inability to make it more complex.

        Asking a programmer to write simpler code is like asking a race car driver to drive slower. She may even nod her head in agreement at your reasoning, but watch what she actually does anyway.

        George Crews

        2010/01/25 at 19:59

  4. IMHO, the issue of reproducibility in simulation-based science is not one of honesty, sharing, or skill sets. It is one of error management.

    The scientific method is an iterative, self-correcting process for dealing with defects in our understanding of Nature. That this process would therefore be applicable to dealing with the bugs in our physical simulation codes would seem obvious.

    An essential component of the scientific method is reproducibility. Hence, it is an essential component of computational science.

    I understand it this way. It is difficult, if not impossible, to completely eliminate a person’s ideology from their view of reality. (The scientific method is the best tool known for dealing with this issue.) It is likewise difficult, if not impossible, to completely eliminate the bugs from complex simulation codes. But reproducibility will greatly help manage those bugs.

    George Crews

    2010/01/25 at 12:38

  5. Reproducibility and transparency are heated topics in climate science at the moment, as you’re doubtless aware. In the Clear Climate Code project we’re trying to help, initially by recoding the GISTEMP code base (from NASA GISS) from a hotchpotch of Fortran, C, Python, and ksh, with typical levels of clarity for science code, into simple, clear Python. See my post today for some more, including a nod to Software Carpentry. Please drop by, either to comment or to code.

    Nick Barnes

    2010/01/25 at 18:29


