Archive for the ‘Versions’ Category
We’re very pleased to announce that Scimatic Software, a Toronto based company that specializes in the development of software for the scientific community, has come on board as a sponsor of this project. Many thanks to Jamie McQuay and Jim Graham!
Like many programmers, I’ve learned most of what I know by poking around and breaking things. Quite naturally, that has led me to believe that this is the best way to learn—after all, if it worked for me, it has to be pretty good, right? But research says otherwise. Kirschner, Sweller, and Clark’s paper, “Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching“, was published in Educational Psychologist in 2006, but the whole text is available online.
After Thursday’s post-mortem on the latest offering of Software Carpentry at the Universitiy of Toronto, I had a chance to talk further with Jon Pipitone, who was one of the tutors (and who is just wrapping up an M.Sc. looking at code quality in climate models). We got onto the topic of infrastructure for Version 4, which needs to be settled quickly.
Thanks to the initiative of Dominique Vuvan (who took Software Carpentry last summer), we ran a semi-formal version of the course from last November through to this past week for grad students in Psychology, Linguistics, and a few other disciplines at the University of Toronto. Weekly tutorials were offered in both Python and MATLAB by graduate teaching assistants from Computer Science, covering roughly half of the existing material.
As I said in last week’s announcement, and mentioned again in a later post, one of the main goals of this rewrite is to make it possible for students to do the course when and where they want to. That means recording audio and video, but much of the material will probably still be textual: code samples (obviously), lecture notes (for those who prefer skimming to viewing, or who want to teach the material locally), and exercises will still be words on a virtual page. And even the AV material will (probably) be accompanied by scripts or transcripts, depending on what turns out to work best.
Which brings up a question everyone working with computers eventually faces: what format(s) should material be stored in? For images, audio, and video, the choices are straightforward: SVG for line drawings, PNG for images, MP3 for audio, and MP4, MPEG, or FLV or video (I don’t know enough yet to choose). But there’s a bewildering variety of options for text, each with its pros and cons.
I mentioned yesterday that I maintain a list of books that haven’t been written yet. Partly it’s an exercise in sympathetic magic—if the reviews exist, maybe the books will follow—but it’s also useful for organizing my thoughts about what a programmer’s education should look like. Looking at the books I’ve matched to various topics in the Software Carpentry course outline, there are some distressing gaps:
- Given that programmers spend upwards of 40% of their time debugging, there are very few books about it, and only one collection of exercises (Barr’s Find the Bug).
- There’s a lot on higher-level programming techniques, but it’s scattered across dozens of books as disparate as The Seasoned Schemer, Effective C++, and The Practice of Programming. I haven’t read Perrotta’s Metaprogramming Ruby yet, but it looks like it will be another rich source of ideas.
- Material on systems programming—manipulating files and directories, running sub-processes, etc.—is equally scattered. The Art of Unix Programming includes all the right topics, but covers too much, in too much detail, at too low a level. Gift & Jones’ Python for Unix and Linux System Administration has the same two faults (from Software Carpentry’s point of view—I think both are excellent books in general), but uses a scripting language for examples, so it made the list.
- Mark Guzdial and others have done excellent research showing the benefits of teaching programming using multimedia, i.e., showing students how to manipulate images, sound, and video as a way of explaining loops and conditionals. That’s half of why the revised course outline includes image processing early on (the other halves being “it’s fun” and “it’s useful”). Once again, most of what I’m familiar with is either documentation for specific libraries, or textbooks on the theory of computer vision, but there are some promising titles in the MATLAB world that I need to explore further.
- Performance. It’s been 15 years since I first grumbled about this, and the situation hasn’t improved. Most books on computer systems performance are really textbooks on queueing theory; of that family, Jain’s Art of Computer Systems Performance Analysis is still head and shoulders above the crowd. Souders’ High Performance Web Sites is the closest modern equivalent I’ve found to Bentley’s classic Writing Efficient Programs, but neither is really appropriate for scientists, who need to think about disk I/O (biologists and their databases), pipelining and caching (climatologists with their differential equations), and garbage collection (everybody using a VM-based language). I had hoped that High Performance Python would fill this gap, but it seems to have been delayed indefinitely. (And yes, I’ve looked at Writing Efficient Ruby Code; it has some of what our students want, but not nearly enough.)
- There are lots of books about data modeling, but all the ones I know focus exclusively on either the relational approach or object-oriented design, with a smattering that talk about XML, RDF, and so on. I haven’t yet found something that compares and contrasts the three approaches; pointers would be welcome.
- Web programming. There are (literally) thousands of books on the subject, but that’s the problem: almost all treatments are book-length, and this course only has room for one or two lectures. It is possible to build a simple web service in that time, but only by (a) using a cookbook approach, rather than teaching students how things actually work, and (b) ignoring security issues completely. I’m not comfortable with the first, and flat-out refuse to do the second: if this course shows people how to write a simple CGI script that’s vulnerable to SQL injection and cross-site scripting, then it’s our fault when the students’ machines are hacked. This gap is as much in the available libraries as in the books, but that doesn’t make it any less pressing.
Given these gaps, I may drop one or two topics (such as performance and web programming) and either swap in one of the discarded topics or spend more time on some of the core material. I’m hoping neither will be necessary; as I said above, pointers to books in any language that are at the right level, and cover the right areas, would be very welcome.
I’m slightly obsessed with reading lists. (I even maintain a list of books that haven’t been written yet, in the hope that it will inspire people to turn some of the entries from fantasy into reality.) Partly to give credit to all the people whose work inspired Software Carpentry, and partly to guide students who want to learn more than we can fit into a double dozen lectures, I have started a bibliography, and added links to relevant books to the lecture descriptions in the course outline. Pointers to other material would be very welcome; I will blog soon about areas that I feel are particularly lacking.