Archive for April 2010
This blog is moving to http://software-carpentry.org/blog/ — please remove this record from your blogroll, and add that one in its place.
Our apologies for the flood of re-posts that some of you may have seen over the weekend: apparently, adding a category to a post, or changing its existing category, makes some blog readers believe the whole post is new. We’re sorry for any confusion or inconvenience the clutter may have caused.
A scientist I recently met in Toronto had a problem: how to share large files with colleagues. Each file is a couple of hundred megabytes; dozens are produced each week, but each is only interesting for a couple of months; and there are confidentiality issues, so some kind of password protection is needed. Conventional file-sharing services like Dropbox aren’t designed for data that size, so in the end she bought a domain and set up secure FTP.
But now there’s this:
The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net.
It’s a neat idea, and will become neater once scientists routinely put DOIs on data as well as papers. I’d be very interested in a usability study to see how easy or hard it is for the average grad student in botany to get this plugged in and turned on.
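For readers wondering what the "built-in error checking" in the abstract amounts to: in the original BitTorrent design, a file is split into fixed-size pieces and the torrent metadata carries a SHA-1 hash of each piece, so a downloader can detect a damaged piece and fetch just that piece again. The sketch below is only an illustration of that idea, not BioTorrents' or any client's actual code; the function names and the piece size are my own assumptions.

```python
import hashlib

# Illustrative piece size (an assumption); real torrents pick their own.
PIECE_SIZE = 256 * 1024


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def corrupt_pieces(data: bytes, expected: list[bytes],
                   piece_size: int = PIECE_SIZE) -> list[int]:
    """Return the indices of pieces that are missing or fail their hash check."""
    actual = piece_hashes(data, piece_size)
    bad = []
    for index, expected_digest in enumerate(expected):
        # A piece beyond the end of the received data counts as bad too.
        if index >= len(actual) or actual[index] != expected_digest:
            bad.append(index)
    return bad
```

A downloader would call `corrupt_pieces` on what it has received, using the hashes published with the torrent, and re-request only the pieces listed rather than the whole multi-hundred-megabyte file.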
We’re very pleased to announce that Scimatic Software, a Toronto-based company that specializes in the development of software for the scientific community, has come on board as a sponsor of this project. Many thanks to Jamie McQuay and Jim Graham!
Like many programmers, I’ve learned most of what I know by poking around and breaking things. Quite naturally, that has led me to believe that this is the best way to learn—after all, if it worked for me, it has to be pretty good, right? But research says otherwise. Kirschner, Sweller, and Clark’s paper, “Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching”, was published in Educational Psychologist in 2006, but the whole text is available online.
Over at opensource.com, Red Hat’s Greg DeKoenigsberg has a post about a new collaboratively-authored textbook on open source software aimed squarely at undergrad courses. As Máirín Duffy points out in the first comment, it’s very code-centric, but in my experience, that’s the right approach: students won’t be ready for discussion of design until they’re proficient in coding. I’m looking forward to borrowing lots from the book for Software Carpentry…
This is, by the way, why I believe that attempts to teach “computational thinking” without first teaching programming are doomed to fail, but that’s a rant for another time.
Julia Lane, the director of the Science of Science & Innovation Policy program at the National Science Foundation, wrote an article for Nature a couple of weeks ago titled “Let’s make science metrics more scientific”. As the summary at the start says:
- Existing metrics have known flaws
- A reliable, open, joined-up data infrastructure is needed
- Data should be collected on the full range of scientists’ work
- Social scientists and economists should be involved
The same points could be made about evaluating software developers (or any other kind of knowledge worker). The devil, as always, is in the details, and unfortunately I have to start doing evaluations before those details are worked out. Several of the sponsors for this course need me to demonstrate its impact on the productivity of the scientists who take it (so that they can in turn justify their contribution to their funders). It isn’t enough to ask students who have completed the course whether they think they know more about programming than they used to: ignoring the obvious problems of survivor bias and self-assessment, I would still have to demonstrate that making people better programmers also makes them better scientists. I believe it does, but belief is not evidence, and doesn’t convey scale.
The best plan I’ve been able to come up with so far is to look at how scientists spend their time before and after taking the course, but that would require resources I don’t have. If you’re interested in studying scientists or software developers empirically, and would like some raw material, I’d like to hear from you.
After Thursday’s post-mortem on the latest offering of Software Carpentry at the University of Toronto, I had a chance to talk further with Jon Pipitone, who was one of the tutors (and who is just wrapping up an M.Sc. looking at code quality in climate models). We got onto the topic of infrastructure for Version 4, which needs to be settled quickly.
Hans-Martin von Gaudecker is planning to teach a Software Carpentry-style course for economists at Universität Mannheim this autumn — as his announcement says, “I think it is amazing that a profession obsessed with efficiency affords a very obvious inefficiency: Most researchers nowadays spend a fair share of their time programming, but hardly anyone has been taught to do that well.” I’ll post updates here as he sends them.
Thanks to the initiative of Dominique Vuvan (who took Software Carpentry last summer), we ran a semi-formal version of the course from last November through to this past week for grad students in Psychology, Linguistics, and a few other disciplines at the University of Toronto. Weekly tutorials were offered in both Python and MATLAB by graduate teaching assistants from Computer Science, covering roughly half of the existing material.