Archive for February 2009
Somewhere in The Age of Uncertainty, Galbraith wrote that what made Das Kapital and the Bible great books was that they were so large, and so full of contradictions, that everyone could find support in them for anything they wanted. I have felt the same way about the phrase “computational thinking” ever since I attended a workshop at Microsoft Research in September 2007. In one of the breakout sessions, six of us tried to operationalize our understanding of the term by coming up with questions for a quiz that could be given to someone to determine if he or she was thinking computationally. It quickly became clear that we meant very different things when we used those two words. It was also clear (to me at least) that this ambiguity was socially very useful, since (to switch metaphors) it allowed people to attend the same church while disagreeing on the nature of salvation. It’s not a polite fiction per se, but rather a—um, damn, I don’t know the word—a thing that no one looks at closely because doing so would cause discomfort or friction.
Eventually, though, things do have to be looked at closely. In this case, it’s the productivity of scientific programmers. Based on feedback from people who’ve taken it, I believe that Software Carpentry significantly increases how much scientists can do with computers, but I don’t have anything that would pass muster as “proof”. I’m not even sure what form such proof would take, since I don’t know how to measure the productivity of any other kind of programmer in a reasonable amount of time either. (Waiting to see if alumni produce more papers would take at least a couple of years, maybe more.) If someone could figure out how to measure computational thinking ability, on the other hand, before-and-after testing might be good enough. Any thoughts?
I’ve used the term “CSCS” a few times now; time to start groping toward a definition. “Computer supported collaborative science” (CSCS) is a specialization of computer supported collaborative work, which is the study of “how collaborative activities and their coordination can be supported by means of computer systems”. Insert the word “scientific”, and you have CSCS. More specifically, CSCS includes science 2.0, open notebook science, reproducible research, workflow & provenance, and other things modern computing technology can do to help scientists find and share information.
Another way to look at CSCS is “areas where typical researchers in software engineering and/or HCI can directly help scientists”. The word “typical” rules out HPC, numerical methods, very large databases, and a whole bunch of other “computational science 1.0” topics, since most SE/HCI people don’t have the background for those. The stuff that falls under “e-science” or “grid science” (depending on which side of the Atlantic you’re on, and which grant agency you’re trying to seduce) might or might not be included, depending on which part you’re looking at; there’s certainly overlap. The same goes for the semantic web, data visualization, and a bunch of other things.
Ironically, it’s not clear whether traditional software engineering research falls under the CSCS heading either, at least not if you define SE as the study of software construction—it takes a lot of SE skill to build the kinds of things CSCS is about, but I don’t see where CSCS requires the invention or study of new ways of building things. On the other hand, if your definition of SE includes end-user programming or the study of how to do empirical studies of tools and techniques in action, then there’s definitely overlap with CSCS.
So that’s my opening shot: anyone want to volley it back?
Via Jon Pipitone: there’s a panel discussion tomorrow at Columbia titled “Open Science: Good For Research, Good For Researchers?” Jean-Claude Bradley, Barry Canton, and Bora Zivkovic are all going to be there, and yes, video will be distributed. I’m looking forward to it: it’ll bring a lot of thinking about computer supported collaborative science together in one place.
I blogged last August about the first and second Provenance Challenges, in which the creators of systems for tracking scientific data and workflows were given sample problems, then asked to have their tools answer a variety of questions. (Results from the first challenge were reported in Concurrency and Computation, but ironically, those articles are not openly available. The third challenge will kick off soon.) Chasing down one of those references again, I came across the Open Notebook Science Challenge, which “…calls upon people with access to materials and equipment to measure the solubility of compounds (aldehydes, amines and carboxylic acids are a priority) in organic solvents and report their findings using Open Notebook Science”. This isn’t quite the same thing as automatically tracking data provenance; instead, it is “…the practice of making the entire primary record of a research project publicly available online as it is recorded”. There are lots of interesting research questions for computer scientists here, ranging from privacy and security issues to notification, peripheral awareness, ontological engineering, and more. For example, see Cameron Neylon’s latest post synthesizing discussion about using OpenID to identify scientific researchers and their contributions.
Steve Eddins has posted an xUnit-style testing harness for MATLAB called MTEST on the MATLAB Central File Exchange. It’s a nice piece of work, and I hope numerical programmers will make heavy use of it.
Carl Zimmer (a prolific and talented writer on biology and evolution) has posted a crowd-sourced reading list of great science writing. Lots of good stuff…
Jan Erik Moström recently posted a request for textbook recommendations for teaching programming (using Python) to biotechnology students. He has now posted the responses he received; they might be interesting to some readers of this blog.
Interesting post from Systeme D called “ShareAlike considered harmful” (for geo data, anyway). I should give this to my students and ask them to think it through…
This time, he has blogged about best practices for making scientific data available. I think this kind of thing will have a much bigger impact on scientists’ productivity than any amount of parallel supercomputing, and that computer scientists could have a lot of impact by helping “real” scientists figure out how to do it better.