Archive for May 2009
One topic not currently in the curriculum that I’d really like to add is detecting, handling, reporting, and recovering from errors. This makes up 10-30% of the code in real applications, but error handling is almost always omitted from textbook examples and tutorials for the sake of clarity (Tanenbaum’s Minix book being a laudable exception). I have asked elsewhere for someone to write an entire book on the subject; if anyone wants to take a crack at an hour-long lecture, please get in touch.
Our summer interns started today; our first job is to define exactly what they’ll be working on this summer, so it seems like a good time to round up a few links on interesting topics. My apologies for those hidden behind paywalls…
Steve’s Project Ideas
If I said, “I just got a really interesting result in the lab, but I didn’t record the steps I took or the settings on the machine,” no reputable journal would publish my paper. If I said, “I just got a really interesting computational result,” most reviewers and editors wouldn’t even ask if I’d archived my code and the parameters I used, or whether that code would run on someone else’s machine. Reproducible research (RR) is the idea of making computational science as trustworthy as experimental science by creating tools and working practices that will allow scientists to re-create past results.
- WaveLab and Reproducible Research
- The Madagascar project
- The Sweave project
- Special issue of Computing in Science & Engineering on reproducibility
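To make the idea concrete, here is a minimal Python sketch of the simplest possible practice: save the parameters and a fingerprint of the code next to every result. The function names and the fields in the record are hypothetical, not taken from WaveLab, Madagascar, or Sweave, all of which go much further than this.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def run_record(params, source_code):
    """Bundle what someone would need to re-create a result.

    Hypothetical record format: the parameters, a SHA-256 fingerprint of
    the code that produced the result, and the interpreter and OS used.
    """
    return {
        "parameters": dict(params),
        "code_sha256": hashlib.sha256(source_code.encode("utf-8")).hexdigest(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def record_to_json(record):
    """Serialize the record so it can be archived alongside the output."""
    return json.dumps(record, indent=2, sort_keys=True)


# In a real script, source_code would be the text of the script itself,
# e.g. open(__file__).read(); a literal string keeps this example standalone.
record = run_record({"dt": 0.01, "steps": 1000}, source_code="print('simulate')\n")
print(record_to_json(record))
```

The hash only shows whether the code has changed since the result was produced; the code itself still has to be archived somewhere for the result to be re-creatable.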
The “provenance” of an object is the history of where it came from, and how it got here. The provenance of a piece of data is similar: what raw values is it derived from, and what processing was done to create it? Ideally, every piece of scientific software should track this automatically; in practice, very few do, and most scientists don’t take advantage of the capability when it’s there. That’s changing, though, particularly as emphasis on reproducibility grows.
- The Provenance Challenge: a series of competitions to benchmark provenance tools against one another.
- Special issue of Concurrency and Computation: Practice & Experience reporting the results of the first challenge
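As a minimal sketch of what tracking provenance "automatically" might mean, the Python below wraps each value together with the step that produced it and the values it was derived from, so the whole derivation can be printed on demand. The `Tracked` class and the `raw`, `derive`, and `lineage` functions are hypothetical illustrations, not the API of any tool in the Provenance Challenge, and the file names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value plus a record of where it came from (hypothetical design)."""
    value: object
    step: str            # how this value was produced
    inputs: tuple = ()   # the Tracked values it was derived from


def raw(value, source):
    """Wrap a raw measurement, noting its origin."""
    return Tracked(value, f"raw from {source}")


def derive(name, func, *inputs):
    """Apply func to the inputs' values and remember what was done."""
    return Tracked(func(*(t.value for t in inputs)), name, tuple(inputs))


def lineage(t, depth=0):
    """Return the derivation history as indented lines, newest step first."""
    lines = ["  " * depth + f"{t.step}: {t.value!r}"]
    for parent in t.inputs:
        lines.extend(lineage(parent, depth + 1))
    return lines


a = raw(12.0, "sensor_a.csv")
b = raw(8.0, "sensor_b.csv")
mean = derive("mean", lambda x, y: (x + y) / 2, a, b)
print("\n".join(lineage(mean)))
# mean: 10.0
#   raw from sensor_a.csv: 12.0
#   raw from sensor_b.csv: 8.0
```

Because every derived value keeps references to its inputs, the history travels with the data instead of living in someone's notebook; real provenance systems also record things like software versions and who ran each step.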
Computer-supported collaborative science is the idea of using modern web-based collaboration tools to better connect scientists, their experiments, and their results. It encompasses a broad range of ideas, but “social networking for scientists” based on their interests is near its core, as is “open science” (the idea of making scientific results public in the same way as open source software or Creative Commons publications).
- Overview article in Scientific American
- Jon Udell’s Internet Groupware for Scientific Collaboration may be several years old, but it’s still prescient
- Jean-Claude Bradley’s blog
- Cameron Neylon’s personal blog (see for example his post on “FriendFeed for Scientists”) and lab blog
Scientific Programming Environments
Compared to professional software developers, most scientists use fairly primitive programming environments, in part because they’ve been too busy learning quantum chemistry to learn distributed version control, and in part because software developers seem to go out of their way to make tools hard to set up and learn. Lots of people have tackled this from a variety of angles. Unfortunately, a lot of work to date has focused on supercomputing, which is sort of like studying modern medicine by focusing on heart surgeons…
- Greg Wilson’s “Where’s the Real Bottleneck in Scientific Computing?” and “Software Carpentry”
- Carver, Kendall, Squires, and Post’s “Software Development Environments for Scientific and Engineering Software: A Series of Case Studies”
- Matthews, Wilson, and Easterbrook’s “Configuration Management for Large-Scale Scientific Computing at the UK Met Office” is an example of tools done right
Thank you once again for taking part in our Fall 2008 survey of how scientists use computers in their research. We will present a paper describing our findings at ICSE’09 in Vancouver on May 23, and will make the results public as soon as possible after that. An article in American Scientist magazine discussing what you’ve told us will also appear some time this summer.
Our next step is to figure out what makes some scientific computer users so much more productive than others. We would therefore be grateful if you would take a few minutes to answer the questions below and email your answers to firstname.lastname@example.org:
- If you think that you use computers more effectively in your work than some of your peers:
  - explain why you think so
  - describe what you do or know that they don’t
- If you can think of someone in your research area who uses computers more effectively in their work than you do:
  - explain why you think so
  - describe as best you can what they do or know that you don’t
If you answered either question, we would be very grateful if you could pass this email on to the colleague or colleagues you were thinking of and ask them to answer it as well—we believe we will learn a great deal by comparing responses, as well as from the responses themselves. If they wish to remain anonymous, please ask them to return their response to you for forwarding to us. Otherwise, please have them reply directly to us. (It would be very helpful in the second case for them to mention your name, so that we can pair their response with yours.)
As with the original survey, only the researchers directly involved in this study will have access to respondents’ contact information and/or identities. This information will not be shared with any third party in any way.
Thanks in advance for your help—we hope you’ll find the results useful.
Prof. Greg Wilson
Dept. of Computer Science
University of Toronto
We’ve put up a list of the topics we intend to cover, and the order in which we intend to cover them. It’s very provisional; we’ll update it regularly, and your comments would be very welcome.
Our rough guide to what students should know before taking this course is now on the Prerequisites page. If you don’t feel confident you know this material, but still want to take the course in July, please let us know: we’re organizing some tutorial sessions in May and June.
Interesting summary in the New York Times of some collaborative science done by Dr. Sean Cutler. One of the goals of this course is to give scientists the skills they need to do things like this routinely. This raises a few questions:
- Should one of the “graduation exercises” for the course be to write some kind of search tool (or a wrapper around several search tools)? Finding potential collaborators is sometimes the biggest challenge.
- Should more emphasis be placed on sharing and merging data sets? Or on looking for inconsistencies in them?
- What else do you need to be able to do in order to collaborate more effectively with your colleagues?