Archive for August 2009
Cameron Neylon has been playing with Google Wave, and he likes it. His presentation at Science Online in London in August explains why (you can also watch video, though sadly there’s no soundtrack). He’s even writing robots to automate some scientifically interesting tasks. Nature News liked Wave too, which (a) reminds me yet again of how prescient Jon Udell’s “Internet Groupware for Scientific Collaboration” was ten years ago, and (b) makes me wonder (also again) how much this course should be re-thought.
The Unix shell philosophy of creating lots of simple single-purpose tools and then combining them in rich ways has clearly found its second wind on the web. Just look at the options:
- Ad hoc services using something like Django or Rails
- Drag-and-drop GUIs like Yahoo! Pipes
- Special-purpose frameworks like Galaxy
- Workflow tools like Taverna
- Next-generation scripting with something like PowerShell
and on and on and on. Each has its own opinion on what the problem to be solved actually is; each requires different skills, and with the exception of Taverna and Galaxy, they regard scientific computing as one niche interest among many.
The problem, of course, is that with so many different ways to do the same thing, no matter which one(s) the course covers, students will probably be faced with something else when they go back to the lab.
Software Carpentry currently teaches students how to manipulate text (using regular expressions), XML (using DOM), relational data (with SQL), and binary data. A decade ago, when we first put the course together, that covered everything I’d ever seen more than one or two scientists use. Today, though, an increasing number are using geospatial (map) data as well. How important is this to your work? If the answer is “very”, what data do you work with, what do you do with it, and what would you like to be able to do?
The bad news is, retractions of scientific papers have risen tenfold since 1990. The good news is, the rate has only gone from 0.0007% to 0.007%. Back to the bad news, though: some estimates put the fraction of papers that ought to be retracted at around 1%, so we still have a long way to go.
This thoughtful article from the New York Times asks whether the current grant system for funding research discourages researchers from taking risks. My personal experience undoubtedly biases me, but I tend to agree—the problem is coming up with something better.
Nature’s “Great Beyond” blog reports another attempt to force climate scientists to release their data. I’m of at least two minds on this: I believe openness is absolutely crucial to science (and society as a whole), but I understand scientists’ concern about being scooped, and equally their concern about having their work misrepresented or quoted out of context. Mostly, I come down on the side of openness—how about you?
They say the devil is in the details, but so’s the delight, because it’s the details that determine whether something works or doesn’t. So let’s take a look at how to translate the last post’s “big picture” into actual course content.
Every competent developer uses some kind of tool to automate tasks that involve dependencies. The best known is still Make, which compares the last-modified times on files to see which ones are stale, and runs shell commands to bring them up to date. Ant, Rake, and whatever’s built into your IDE all work basically the same way, and can all be used (with greater or lesser ease) to recompile software, re-run tests, prepare distributions, update web sites, and what have you.
Dependency managers are an example of the kind of tool scientists are willing to spend an hour learning (more if they’re working with C++ or Java, less if they’re working with a scripting language). Understanding how they work, though, requires at least some familiarity with:
- automation (obviously)
- declarative programming (the user declares the “what”, the computer figures out the “how”)
- graphs (which is how these tools figure out what order to do things in)
- queries (since rules are often best expressed using pattern matching)
- programs as data (since dependency managers are programs that run other programs)
So, can we teach these concepts using Make as an example? I thought so back in 2003 when I put together the first version of “CSC207: Software Design” for the University of Toronto. In their first two programming exercises, students worked with graphs and wrote simple text parsers using regular expressions. They then had to put the two together to create a very (very) simple version of Make.
I thought it worked well, but over the years the exercises were cut back until eventually this one disappeared entirely. There was just too much material in the course, and the various bits weren’t connected strongly enough. While it might work in theory, it didn’t in practice, and would probably fare even less well if crammed into two days of an intensive two-week course. It’s still a good example of how I’d like to tie the practical and conceptual parts of the course together, though; the trick is finding a way to make it work.