Software Carpentry

Helping scientists make better software since 1997

Archive for the ‘Versions’ Category

Instructional Design

As well as deciding on the format of the course, I have to re-shape its content. In contrast to e-learning, there seems to be a lot of solid material available on instructional design. The most useful guide I’ve found so far is Wiggins & McTighe’s Understanding by Design. I was initially a bit put off by the micro-industry the authors have built around the book, but its step-by-step approach immediately felt right:

  1. What are students supposed to understand at the end of the lesson?
  2. How is that going to be determined, i.e., what questions will they be able to answer that they couldn’t answer before, or what will they be able to do that they couldn’t do before?
  3. What lessons and activities are going to help them acquire that knowledge and those skills?

The whole thing is a lot more detailed than that, but you get the gist. And note that the last point says “help them acquire”, not “teach them”: while the latter focuses on what the instructor says, the former focuses on helping students construct understanding, which is both more accurate and a better fit for the level of students this course targets.

I’ve already used their ideas in reshaping the course outline. If the right way to deliver the course turns out to be 200 vignettes rather than 25 lectures, I will need to do some chopping and rearranging, but I think that what I have is a good starting point. Once I know what format I’m going to choose, I will rework the outline in accordance with the three-step approach summarized above and ask for feedback.


Written by Greg Wilson

2010/03/26 at 20:37

Posted in Tooling, Version 4

Online Delivery

As the announcement of Version 4 said, Software Carpentry is being redesigned so that it can be delivered in several ways. I want to support:

  1. traditional classroom lectures, with someone at the front of the room talking over a series of slides and/or coding sessions to a captive audience;
  2. students reading/viewing material on their own time, at their own pace, when and as they need it; and
  3. hybrid models, in which students work through as much as they can on their own, then get help (face-to-face or over the web) when they hit roadblocks.

#1 isn’t easy to do well, but the challenges involved are well understood. #2 and #3 are going to be a lot harder: it’s new ground for me, and despite the fact that the Internet is older than many of my students, most of the educational establishment still thinks of it as “new” as well.

There are hundreds of books and web sites devoted to e-learning, but the majority just recycle the same handful of inactionable truisms. (“When designing online material, try to make it as engaging as possible.” Well, duh.) Most of the high-quality material focuses on research about e-learning, rather than instructional design itself. For example, Richard Mayer’s Multimedia Learning says a lot of interesting things about whether people learn more deeply when ideas are expressed in words and pictures rather than in words alone, and the principles he derives from his research are good general guidelines, but again, there’s little help offered in translating the general into the specific.

If there isn’t much explicit guidance available, what about prior art? MIT’s Open Courseware got a lot of attention when it was launched, but its “talking heads” approach reminds me of early automobiles that looked like horse-drawn carriages with motors bolted on. Carnegie Mellon’s Open Learning Initiative (which advertises itself as “open courses backed by learning research”) is more interesting, but what has really caught my eye is Salman Khan’s Khan Academy, which I first encountered through one of Jon Udell’s interviews. Khan has created hundreds of short videos on topics ranging from basic addition to mitosis and Laplace transforms by recording himself sketching on a tablet. The results are just as digestible as Hollywood-quality material I’ve viewed elsewhere, and with 25 lectures to do in less than 50 weeks, his low-ceremony approach appeals to me for practical reasons as well.

Of course, any believer in agile development would tell me that there’s only one right way to tackle this problem (and in fact, one did just an hour ago). By the end of May, I plan to put one lecture—probably the intro to relational databases and SQL—up on the web in two or three formats, and then ask for feedback. Is one 50-minute video better or worse than five 10-minute vignettes? Do people prefer PowerPoint slides with voiceover, live sketching/coding sessions (complete with erasures and typos), or some mix of the two? How important is it to close-caption the videos? If classroom-style slides are available as well as the video, how many people look at each? I know how to do these kinds of usability studies, and hopefully enough people will volunteer opinions to help me choose the right path.

Written by Greg Wilson

2010/03/26 at 20:30

Posted in Tooling, Version 4

Software Carpentry Version 4 is a Go!

I am very excited to announce that I am going to work full-time on revising the Software Carpentry course from May 2010 to May 2011. This work has been made possible by the generosity of our sponsors:

I would also like to thank The MathWorks, the University of Toronto, the Python Software Foundation, and Los Alamos National Laboratory, whose support over the past 13 years has allowed us to help scientists use computers more productively.

Version 4 of Software Carpentry will improve on its predecessors in three significant ways. First, the content will be reorganized and updated to better meet scientists’ needs. As with Version 3, a typical graduate student or research scientist should be able to cover all of the material in a regular one-term course with approximately 25 hours of lecture and 100-150 hours of exercises.

Second, we intend to provide parallel versions of the material in MATLAB and Python, so that scientists who already know numerical computing’s most popular scripting language can dive right into the parts that interest them most. We have been testing a MATLAB translation of the Version 3 notes this winter with good results, and are grateful to the students at the University of Toronto who have tried them out and given us feedback.

Third, and most importantly, Version 4 of the course will be designed so that students can work through most or all of the material on their own, at their own pace, when they need it. To do this, we will make video recordings of the lectures available, along with screencasts and interactive examples, and provide over-the-web support via email, Skype, and desktop sharing to help early adopters when they run into roadblocks. We hope that this will allow us to reach, and help, many more people than would otherwise be possible.

Software Carpentry is an open project: all of the materials are available under the Creative Commons Attribution license, and can be freely shared and remixed provided you include a citation. If you would like to help us help scientists be more productive, please contact Greg Wilson by email or as gregory_v_wilson on Skype.

Written by Greg Wilson

2010/03/25 at 17:20

Posted in Version 4

Summer Course: Analyzing Next-Generation Sequencing Data

Analyzing Next-Generation Sequencing Data

May 31 – June 11th, 2010
Kellogg Biological Station, Michigan State University
CSE 891 s431 / MMG 890 s433, 2 cr

Applications are due by midnight EST, April 9th, 2010.

Course sponsor: Gene Expression in Disease and Development Focus Group at Michigan State University.

Instructors: Dr. C. Titus Brown and Dr. Gregory V. Wilson

This intensive two-week summer course will introduce students with a strong biology background to the practice of analyzing short-read sequencing data from the Illumina GA2 and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq.

No prior programming experience is required, although familiarity with some programming concepts is suggested, and bravery in the face of the unknown is necessary. 2 years or more of graduate school in a biological science is strongly suggested.

Written by Greg Wilson

2010/03/25 at 01:02

How Much Of This Should Scientists Understand?

Let’s start with the problem description:

All of the Software Carpentry course material (including lecture notes, code samples, data files, and images) is stored in a Subversion repository. That’s currently hosted at the University of Toronto, but I’d like to move it to the project’s own domain (along with this blog). However, that domain is hosted with Site5, who only provide one shell account per domain for cheap accounts like the one I bought.

Why is this a problem? Because when someone wants to commit to the repository, they have to authenticate themselves. I could let everyone who’s writing material for the course share a single user ID and password, but that would be an administration nightmare (as well as a security risk). Site5 does have a workaround based on public/private keys, but it’s fairly complicated, i.e., it could break in lots of hard-to-diagnose ways. Another option would be to use the mod_dav_svn plugin for Apache, but Site5 doesn’t support per-domain Apache modules either. Another hosting provider does, so I may be switching hosts in a few weeks.

So: how much of this should the average research scientist be expected to understand? If the answer is “none”, then how are they supposed to make sensible decisions about moving their work online? If the answer is “all”, where does the time come from? (It takes me 30 seconds to read the two paragraphs above; it would take many hours of instruction to teach people enough to do the analysis themselves.) And if the answer is “some”, then which parts? To what depth? And who takes care of the rest on scientists’ behalf?

Written by Greg Wilson

2010/03/11 at 19:45

Posted in Content, Opinion, Version 4

Eighty Per Cent!

As of this morning, I have signed commitments for 4/5 of the money I need to spend a year working full-time on updating the Software Carpentry course. If you, or someone you know, would like to help me help scientists be more productive, please get in touch: I have only 64 days in which to find the $30K I still need.

Written by Greg Wilson

2010/02/25 at 13:45

Posted in Version 4

How We Got Here, and Where We Are

I gave a talk in Santa Fe early in 1997 describing a set of articles I’d organized for the Summer 1996 and Fall 1996 issues of IEEE Computational Science and Engineering (now Computing in Science & Engineering) on the subject, “What should computer scientists teach physical scientists and engineers?” After the talk, John Reynders (then the director of the Advanced Computing Lab at Los Alamos National Laboratory) challenged me to put my money where my mouth was and actually teach basic software development skills to working scientists.

Brent Gorda and I ran the course for the first time in July 1997. We used Perl as a programming language, and covered topics such as CVS, regular expressions, and a little bit of web client programming. Our part of the course was three days long, and was followed by a two-day consulting visit from Steve McConnell (whose books Code Complete and Rapid Development were at the top of the charts). We ran the course in various forms another five or six times in the next three years, during which time we switched to Python and expanded it to five days. All told, about 120 LANL technical staff went through the course, most of them under 35.

In 2004, after I’d taught the course for the Space Telescope Science Institute and the US Navy, the Python Software Foundation gave me a grant to reorganize, update, and expand the material. That version is the core of what’s now online; when I last checked, the site was getting 10-12,000 distinct visitors a month, and the material was being used in whole or in part at Caltech, Indiana, several schools in the UK and Germany, Chile, South Korea, and of course here in Toronto.

Based on follow-ups with alumni, I’d guess that it has no effect at all on 20-25%, who take the course because their manager or supervisor told them they had to, and get little out of it. The rest routinely describe it as game changing: a PhD candidate in Psychology who did the course with us in July 2009 told me a few days ago that what she learned probably saved her six months on her current project, and that without it, it would simply not have occurred to her to try a second project. As another data point, one of the other alumni of that offering came to me early in October to say that several of her labmates wanted to take the course, and was I planning to offer it again any time soon? I told her that I wasn’t, but that I could arrange for a CS grad student to teach it. Three weeks later, 65 students from Psych and Linguistics had signed up to do it as a non-credit course, roughly 45 of whom have stuck with it so far.

While I don’t have data to back this up, I believe very strongly that what most students get out of the course isn’t specific knowledge about relational databases, regular expressions, or object-oriented programming, but rather a mental map of the computing landscape, so that they know what’s supposed to be easy, what else is supposed to be possible, and where to go looking for more information. Another student from the July 2009 offering said that the biggest thing the course did for him was turn “unknown unknowns” into “known unknowns”. I’m supposed to conduct a follow-up survey with those students later this month to see how much they’re using what they learned, and what impact it has had; I hope to have results up on the web by Easter.

And as regular readers will know, I’m presently trying to raise money to update the material: this post explains the background, while this plan incorporates what I’ve learned from students and instructors on four continents about what material, sequence, and presentation will actually “reach” scientists. Sadly, though, funding agencies and companies mostly still seem to think that only HPC-related training is worth funding, which I feel is asking scientists to run before they can walk. This CiSE paper talks about this particular frustration, while our survey results put weight behind the claim that the overwhelming majority of scientists will benefit much more from being helped with development issues than from anything to do with big iron.

Written by Greg Wilson

2010/01/10 at 18:11