Software Carpentry

Helping scientists make better software since 1997

Archive for February 2006

Database Lecture is Up

The lecture on databases (actually an introduction to SQL) is now up. Comments and corrections welcome.


Written by Greg Wilson

2006/02/23 at 20:41

Posted in Lectures, Version 3

Second Lecture on Testing Now Online

The second lecture on testing is now online. As always, comments and corrections are appreciated.

Written by Greg Wilson

2006/02/22 at 21:42

Posted in Lectures, Version 3

What Else for Software Carpentry?

16 lectures are now in place (more or less), which means I have 8 more to do. The syllabus shows what I’ve covered already; my current plans include:

  • unit testing
  • XML
  • SQL
  • more SQL
  • small-team development process

What do you think the other three should cover (keeping in mind that this is supposed to be a course on basic software engineering, rather than scientific programming)? Options include:

  1. Basic web programming, with much-revised versions of:
  2. Integration, including:
    • wrapping C code so that it can be called from Python
    • using popen() and its ilk to run external programs
    • (probably) something on refactoring to make code more testable (as per Feathers’ excellent Working Effectively with Legacy Code)
  3. Three lecture-length examples, building very simple versions of core tools that haven’t been covered elsewhere:
    • data lineage
    • continuous integration
    • data consistency checking
  4. Give in, and do the scientific programming stuff anyway:
    • floating-point arithmetic
    • Python’s Numeric package
    • data visualization
  5. Scrap the single lecture on development process, and put in four full lectures on the subject:
    • XP
    • UML-based processes (probably ICONIX)
    • something else (not entirely sure what)
  6. Something else entirely — suggestions would be very welcome.

Please let me know what you think.

Written by Greg Wilson

2006/02/21 at 14:18

Posted in Content, Version 3

Second Lecture on Object-Oriented Programming

The second lecture on object-oriented programming is now on the web. This describes operator overloading and static methods, and includes the design patterns material that was in the old design lecture (which has been removed—the general consensus was that it didn’t work). As always, comments are welcome.

Written by Greg Wilson

2006/02/21 at 13:29

Posted in Lectures, Version 3

AAAS Annual Meeting 2006

Wednesday, 11:10 p.m.: phone call from Air Can’tada saying that my Thursday morning flight to St Louis has been cancelled because of bad weather. Next available is 4:00 p.m. Friday afternoon—two and a half hours after my workshop is due to end. No, they can’t help me find an alternative carrier. Expedia can, though, and by midnight, I have a ticket on Delta, via Cincinnati.

Thursday, oh dark hundred: the cab’s tires crunch through eight centimeters of fresh snow on the way to the airport. We’re late getting off the ground, and even later leaving Cincinnati, but at least we’re airborne. Tornado warnings over St Louis, though, so after circling over a spinning mass of clouds with a lightning-filled depression in the middle for about an hour, we head for Evansville, Indiana. I finally get to my hotel at 7:30 p.m., fifteen hours after starting my day.

Friday: the Annual Meeting of the AAAS isn’t really a scientific conference—it’s a place for science advocates to gather and plot, stirred together with an extended series of press cuddles dolled up as seminars. (This is not a criticism: if the cosmetics industry, fast food vendors, and the military-industrial complex are smart enough to plot and cuddle, scientists should be too.) Some of the talks (particularly the medical ones) are Mojave-dry, but others are pretty cool:

  • “The Demography of Black Holes” (with pictures!)
  • “In Search of Genes that Influence Language” (without, but still interesting)
  • “New Approaches to Paleontological Investigation” (use a CT scan of a fossil to drive a 3D lithography machine, and you can photocopy dinosaur bones at sub-millimeter resolution—oh, and check out

Friday noon: Andy Lumsdaine and Peter Gottschling arrive from Indiana University for our workshop on Essential Software Development Skills for Research Scientists. We covered the usual topics:

  • Computational scientists don’t pay as much attention to quality and reproducibility as experimental scientists (in fact, many of them don’t pay any attention to these issues).
  • Most scientific programmers are woefully inefficient compared to their industrial counterparts, largely because no one has ever taught them basic software engineering skills.
  • A handful of tools and techniques can reliably improve scientific programmers’ productivity by 20-25%: version control, test-driven development, continuous integration, issue tracking, use of a debugger, enforcing style, traceability, and behind them all, automation.
  • There are many personal and institutional obstacles (ranging from “I have a degree in physics, so programming must be easy” to “journals and tenure committees don’t care, so I can’t afford to”).
  • We either fix this ourselves, proactively, or someone else will legislate bad rules in the wake of a very public disaster.

Randy Heiland’s picture shows the three of us on stage; there weren’t as many lab managers or funding directors as I’d hoped for, but lots of good questions and discussion.

Friday evening: a recap of the 2005 Ig Nobel Prize awards for science that cannot, or should not, be repeated, including:

  • Physics: John Mainstone and the late Thomas Parnell, for patiently conducting an experiment that began in the year 1927, in which a glob of congealed black tar has been slowly, slowly dripping through a funnel, at a rate of approximately one drop every nine years.
  • Medicine: Gregg A. Miller, for inventing Neuticles—artificial replacement testicles for dogs, which are available in three sizes, and three degrees of firmness.
  • Literature: the Internet entrepreneurs of Nigeria, for creating and then using e-mail to distribute a bold series of short stories, thus introducing millions of readers to a cast of rich characters, including General Sani Abacha, Mrs. Mariam Sanni Abacha, Barrister Jon A Mbeki Esq., and others.
  • Peace: Claire Rind and Peter Simmons, for electrically monitoring the activity of a brain cell in a locust while that locust was watching selected highlights from the movie Star Wars.
  • Economics: Gauri Nanda, for inventing an alarm clock that runs away and hides, repeatedly, thus ensuring that people DO get out of bed, and thus theoretically adding many productive hours to the workday.
  • Biology: Benjamin Smith and others, for painstakingly smelling and cataloging the peculiar odors produced by 131 different species of frogs when the frogs were feeling stressed.
  • Fluid Dynamics: Victor Benno Meyer-Rochow and Jozsef Gal, for using basic principles of physics to calculate the pressure that builds up inside a penguin, as detailed in their report “Pressures Produced When Penguins Pooh—Calculations on Avian Defaecation.”

Saturday: I smorgasborded the seminars. The best was Latanya Sweeney’s talk about information privacy; she was kind enough to chat with me for 45 minutes afterward about undergraduate curriculum reform and the obstacles to it (did you know there isn’t an undergrad course on software engineering at CMU?). The worst was an unrelated seminar on “Information Security in Public Databases”. Aaron Emigh, of Radix Labs, did a great job of explaining the issues. Kevin Fu, of UMass, was also engaging, but Mike Szydlo (RSA) gave us a technical sales talk that I’m sure went over the heads of most of the audience.

And then there was Markus Jakobsson, of Indiana University. He’s the guy who conducted phishing attacks on IU students last year, without their prior consent (informed or otherwise), in order to get material for a paper. I think this was irresponsible: one of the obstacles to better security is that the public doesn’t trust us (the professionals) to look out for their interests. Some of that is Hollywood’s fault (how many positive portrayals of computer geeks have you seen recently? and how many portrayals of what hackers can and can’t do are half as accurate as the average episode of a medical soap opera?), but conducting experiments on people who don’t know they’re being experimented on sure doesn’t help.

One telling moment came after the presentations, when Jakobsson asked the audience which of two “solutions” they thought would work better: educating the public, or better technology. I pointed out that what he was really offering users was a choice between paying more (hours) or paying more (dollars, to technology vendors). I then asked why he hadn’t mentioned the third option, which is to shift the financial pain to the vendors (which is what brought the problem of credit card fraud under control). He dodged, but Aaron Emigh didn’t, so I’m going to see if I can get Aaron’s slide set, and post it here.

Saturday afternoon: discover that there are no bookstores in downtown St Louis. I don’t mean, “there are no good ones”. I mean, “there are no bookstores in the downtown core of St Louis, at all”. The nearest (according to both hotel staff and conference organizers) is a 15-minute drive away—in another county.

Sunday: up at quarter to five to get to the airport for a 7:15 flight that didn’t take off until 8:45, which meant that I missed my 10:59 in Cincinnati, and had to get the 1:10 instead, so I didn’t get home until 3:20. Very happy to walk through the door; very happy to have someone else happy that I was walking through the door.

Written by Greg Wilson

2006/02/20 at 09:44

Posted in Community, Version 3

Data Lineage

The January 2005 issue of ACM Computing Surveys (vol. 37, no. 1, if you prefer) has a good review by Rajendra Bose and James Frew titled “Lineage Retrieval for Scientific Data Processing: A Survey”. In it, they look at what scientists do to keep track of what data they have, where it came from, and what has been done to it. Some of my students last term were worrying about the same issues in the context of HL7 medical data. It seems like an ideal place for software engineers to apply their skills: I’d be interested in hearing from people who have home-grown or small-scale systems I could use as a starting point for a lecture in Software Carpentry.
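To make "small-scale" concrete, here is a rough sketch of about the smallest lineage tracker one could imagine. It is not taken from the survey or from any existing system: the function names (`file_digest`, `record_lineage`, `is_stale`), the `.lineage.json` sidecar file, and the record layout are all made up for illustration. The idea is only to store, next to each derived file, which inputs it came from (by content hash) and what was done to them, so that a later check can tell whether the inputs have changed since.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_lineage(output_path, input_paths, operation):
    """Write a sidecar file (hypothetical layout) describing how output_path was made."""
    record = {
        "output": {"path": output_path, "sha256": file_digest(output_path)},
        "inputs": [{"path": p, "sha256": file_digest(p)} for p in input_paths],
        "operation": operation,  # free-text description of what was done
    }
    with open(output_path + ".lineage.json", "w") as f:
        json.dump(record, f, indent=2)
    return record


def is_stale(output_path):
    """Return True if any recorded input no longer matches its stored digest."""
    with open(output_path + ".lineage.json") as f:
        record = json.load(f)
    return any(file_digest(i["path"]) != i["sha256"] for i in record["inputs"])


def demo():
    """Record lineage for a derived file, then change its input and re-check."""
    with tempfile.TemporaryDirectory() as d:
        raw = os.path.join(d, "raw.txt")
        clean = os.path.join(d, "clean.txt")
        with open(raw, "w") as f:
            f.write("1\n2\n\n3\n")
        with open(clean, "w") as f:
            f.write("1\n2\n3\n")
        record_lineage(clean, [raw], "drop blank lines")
        before = is_stale(clean)   # input unchanged since recording
        with open(raw, "a") as f:
            f.write("4\n")
        after = is_stale(clean)    # input modified after recording
        return before, after


if __name__ == "__main__":
    print(demo())
```

A real system would also need to record who ran the step, when, and with which version of which program, and would have to cope with files being moved or renamed; this sketch ignores all of that.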

Written by Greg Wilson

2006/02/14 at 13:34

Posted in Content, Version 3

Lecture on Binary Data

The Software Carpentry lecture on binary data is now up on the web. The content of this one has been fairly stable for a while, but that just means that all the bugs will be in the details—comments and corrections are greatly appreciated.

Written by Greg Wilson

2006/02/14 at 12:10

Posted in Lectures, Version 3