Archive for the ‘Tooling’ Category
Like many programmers, I’ve learned most of what I know by poking around and breaking things. Quite naturally, that has led me to believe that this is the best way to learn; after all, if it worked for me, it has to be pretty good, right? But research says otherwise. Kirschner, Sweller, and Clark’s paper, “Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching”, was published in Educational Psychologist in 2006, but the whole text is available online.
After Thursday’s post-mortem on the latest offering of Software Carpentry at the University of Toronto, I had a chance to talk further with Jon Pipitone, who was one of the tutors (and who is just wrapping up an M.Sc. looking at code quality in climate models). We got onto the topic of infrastructure for Version 4, which needs to be settled quickly.
My father once told me that a week of hard work can sometimes save you an hour of thought. In that spirit, I’ve been looking for asynchronous online courses to imitate. I previously mentioned MIT’s Open Courseware, CMU’s Open Learning Initiative, and (closer to my scale) Salman Khan’s Khan Academy. Google Code University’s lessons on programming languages are also on my radar (I’ll blog more about them once I finish the Python material), but another model that I’m looking at closely is Teaching Open Source, a collaborative effort to get more open source into college and university courses. I first encountered them through POSSE (Professors’ Open Source Summer Experience), which they describe as:
…a weeklong bootcamp that will immerse professors in open source projects. Participants spend a week of intensive participation in selected open source projects, led by professors with experience in teaching open source development, in partnership with community members who have deep experience and insight. By the end of the session, participants should have a much better understanding of the workings of open source projects, and a strong network of contacts to lean on as they begin to bring students into the open source world.
I’ve also been watching in awe (with a small ‘a’, but awe nonetheless) as half a dozen contributors have pulled together a textbook called Practical Open Source Software Exploration: How to be Productively Lost, the Open Source Way. It’s by no means complete, but I have already bookmarked it in a dozen places, and expect to add more. I always hoped that Software Carpentry would become a community project of this kind; here’s hoping that Version 4 finally manages to.
As I said in last week’s announcement, and mentioned again in a later post, one of the main goals of this rewrite is to make it possible for students to do the course when and where they want to. That means recording audio and video, but much of the material will probably still be textual: code samples (obviously), lecture notes (for those who prefer skimming to viewing, or who want to teach the material locally), and exercises will still be words on a virtual page. And even the AV material will (probably) be accompanied by scripts or transcripts, depending on what turns out to work best.
Which brings up a question everyone working with computers eventually faces: what format(s) should material be stored in? For images, audio, and video, the choices are straightforward: SVG for line drawings, PNG for images, MP3 for audio, and MP4, MPEG, or FLV for video (I don’t know enough yet to choose). But there’s a bewildering variety of options for text, each with its pros and cons.
As well as deciding on the format of the course, I have to re-shape its content. In contrast to e-learning, there seems to be a lot of solid material available on instructional design. The most useful guide I’ve found so far is Wiggins & McTighe’s Understanding by Design. I was initially a bit put off by the micro-industry the authors have built around the book, but its step-by-step approach immediately felt right:
- What are students supposed to understand at the end of the lesson?
- How is that going to be determined, i.e., what questions will they be able to answer that they couldn’t answer before, or what will they be able to do that they couldn’t do before?
- What lessons and activities are going to help them acquire that knowledge and those skills?
The whole thing is a lot more detailed than that, but you get the gist. And note that the last point says “help them acquire”, not “teach them”: while the latter focuses on what the instructor says, the former focuses on helping students construct understanding, which is both more accurate and a better fit for the level of students this course targets.
I’ve already used their ideas in reshaping the course outline. If the right way to deliver the course turns out to be 200 vignettes rather than 25 lectures, I will need to do some chopping and rearranging, but I think that what I have is a good starting point. Once I know what format I’m going to choose, I will rework the outline in accordance with the three-step approach summarized above and ask for feedback.
As the announcement of Version 4 said, Software Carpentry is being redesigned so that it can be delivered in several ways. I want to support:
- traditional classroom lectures, with someone at the front of the room talking over a series of slides and/or coding sessions to a captive audience;
- students reading/viewing material on their own time, at their own pace, when and as they need it; and
- hybrid models, in which students work through as much as they can on their own, then get help (face-to-face or over the web) when they hit roadblocks.
#1 isn’t easy to do well, but the challenges involved are well understood. #2 and #3 are going to be a lot harder: it’s new ground for me, and despite the fact that the Internet is older than many of my students, most of the educational establishment still thinks of it as “new” as well.
There are hundreds of books and web sites devoted to e-learning, but the majority just recycle the same handful of inactionable truisms. (“When designing online material, try to make it as engaging as possible.” Well, duh.) Most of the high-quality material focuses on research about e-learning, rather than instructional design itself. For example, Richard Mayer’s Multimedia Learning says a lot of interesting things about whether people learn more deeply when ideas are expressed in words and pictures rather than in words alone, and the principles he derives from his research are good general guidelines, but again, there’s little help offered in translating the general into the specific.
If there isn’t much explicit guidance available, what about prior art? MIT’s Open Courseware got a lot of attention when it was launched, but its “talking heads” approach reminds me of early automobiles that looked like horse-drawn carriages with motors bolted on. Carnegie Mellon’s Open Learning Initiative (which advertises itself as “open courses backed by learning research”) is more interesting, but what has really caught my eye is Salman Khan’s Khan Academy, which I first encountered through one of Jon Udell’s interviews. Khan has created hundreds of short videos on topics ranging from basic addition to mitosis and Laplace transforms by recording himself sketching on a tablet. The results are just as digestible as Hollywood-quality material I’ve viewed elsewhere, and with 25 lectures to do in less than 50 weeks, his low-ceremony approach appeals to me for practical reasons as well.
Of course, any believer in agile development would tell me that there’s only one right way to tackle this problem (and in fact, one did just an hour ago). By the end of May, I plan to put one lecture—probably the intro to relational databases and SQL—up on the web in two or three formats, and then ask for feedback. Is one 50-minute video better or worse than five 10-minute vignettes? Do people prefer PowerPoint slides with voiceover, live sketching/coding sessions (complete with erasures and typos), or some mix of the two? How important is it to close-caption the videos? If classroom-style slides are available as well as the video, how many people look at each? I know how to do these kinds of usability studies, and hopefully enough people will volunteer opinions to help me choose the right path.
Right now, the Software Carpentry material is basically printed pages on the web. Each lecture is a linear HTML page: bullet point follows bullet point, interrupted only by code snippets, tables, and diagrams. If I’m going to update the content, I’d also like to update the presentation; the question is, “To what?” An audio recording of me talking over the slides would add some value, though I think that typing in what I would say would probably be more useful, since most people can read faster than I can speak, and audio still isn’t googleable.
I’ve also thought about recording screencasts (audio on top of a video recording of my computer desktop). That would allow me to show live coding sessions, which I think many students would find valuable. Flipping that around, I could embed small snippets of video in the HTML pages. Then there are tools like Crunchy that allow you to create tutorials by embedding snippets of Python in web pages. That could help the programming parts of the course, but not with version control, Make (if we stick to Make, which I hope we don’t), or many other parts.
So: what’s the best online tutorial you’ve ever seen? What made it the best? Do you know how much effort it took to build the first time? How much effort it would take to build once the authors were experts in [name of tutorial-building technology goes here]? Pointers would be very welcome…
It’s clear from Friday’s end-of-course review that the course needs shaking up. Before that starts, though, there’s a higher-level question to answer: should the course notes be converted to a wiki to encourage contributions from others? It was always my hope that other people would contribute material, but in four years, only five ever have; perhaps wikification would change that.
Right now, the notes are stored as HTML pages in a Subversion repository and compiled by a little Python script to resolve cross-references, insert code samples, and so on. The advantages of this approach are:
- People can work locally and push coordinated changes when ready.
- Slide format can be skinned by changing a flag in the Makefile to select different CSS. (For example, I’m still hoping to get S5 or S5R working.)
- The build step can also insert code fragments, ensure that bibliography references resolve, etc.
Advantages of a wiki are:
- Easier collaboration: people can make small fixes in place without doing an “svn checkout” or running Make.
As a programmer, the first three weigh heavier in my mind than the last one, but again, only five people have contributed material in four years, which isn’t sustainable. What do you think? Would switching to a wiki make you more likely to add material or not?
Chris Lasher, who audited the Software Carpentry course long-distance last time around, has put together some screencasts to go with the first few lectures:
I think this is very cool — if you have feedback, or want to praise him (hint, hint), please drop him a line. I would also be interested in hearing how useful you find these — if there were twenty or thirty covering the whole course, would you actually watch them? And would they help you understand the course material?
I finished rewriting the build system for the Software Carpentry course notes yesterday. Doing so was an extended form of procrastination: the system I built over the summer and used through the fall was adequate, but I wanted to clean a few things up, and then, well, I might as well make it easier for other instructors to add site-specific content, and turn tables into inclusions instead of inlining them, and mumble mumble mumble type type type…
Of course, none of this has actually advanced the content of the course one whit. I have over seventy tickets to close, ranging in size from making sure that a particular Make example does what I claim to rewriting the lecture on security. And diagrams: no one was happy with the isometric ones created this term (not least because they’re kind of fuzzy), so I have over a hundred diagrams to re-do. In a perfect world, they’d be ready before I teach at the IASSE in mid-January. In this universe, I’ll be happy if they’re in place for the Essential Software Skills for Research Scientists workshop at the AAAS Annual Meeting on February 17.
We all do this. We all fold laundry instead of paying bills, or invent an antigravity drive when we’re supposed to be studying for an Economics final. (OK, maybe that was just me.) But it seems particularly common among software developers, many of whom would rather spend two hours creating a new (not better, just new) serialization class hierarchy than take five minutes to center-align the titles at the top of the product’s help page. One of the characters in Mark Costello’s Big If (reviewed here) is a prime example: his company desperately needs him to add some new monsters to a video game, so he spends a week adding shadows to clouds.
But back to the build system… What I have is a set of XML files marked up with a homegrown tag set, and what I want is some HTML pages. The files are organized into several directories: the main page is in the root, while all of the lectures are in lec/, and site-specific content is in sub-directories underneath sites/. Each directory that contains source XML files may also contain tbl/ sub-directories; in turn, each of those has one sub-directory for each of the source files, which holds images, sample code inclusions, and tables.
The build system consists of the following tools:
- A 500-line Makefile in the root directory that drives everything else. Roughly half of those lines are comments (which can be extracted and formatted as a wiki page to create on-line documentation). This Makefile includes another file called config.mk, in which users must specify the lectures they want to include in the course.
- A Python script called linkages.py that scans the source files and builds a data structure that records such things as the order of lectures, where glossary terms are defined, the two-part numerical IDs of figures and tables, and so on. linkages.py writes this data structure directly to a file called tmp/linkages.tmp.py, which other tools then import. Persisting the data structure directly saved me from having to mess around with parsers or serializers. The clever bit (ahem) is that I only write it out if (a) the file doesn’t already exist, or (b) the contents have changed. That way, if I change a source file in a way that doesn’t affect cross-linkages, Make doesn’t do a lot of unnecessary rebuilding.
- Once the linkages file is up to date, preprocess.py kicks in. This script creates copies of the source files under the tmp/ directory (preserving the directory structure), and adds information to those copies to make XSLT’s job easier. Among other things, it:
- adds a unique file ID, and the path to the root of the build, to the lecture’s root element;
- copies content from table files into the lectures;
- adds citation information to bibliography references;
- does multi-column layout of lengthy tables;
- inserts figure and table counter values (the “4.2” in “Figure 4.2”);
- fills in cross-references between source files;
- replaces the <lecturelist/> element with a point-form list of links to lectures;
- fills in the <tbllist> tags with lists of figures and tables respectively;
- links terms in the glossary back to their first uses;
- inserts included program source files;
- links to external references;
- adds “previous” and “next” linkage information to lectures;
- generates a syllabus; and
- adds tracing information, such as file version numbers and the time the files were processed.
Each stage ought to be a filter of its own, and in fact I wrote them all that way to begin with. However, launching fifteen or more copies of the Python interpreter for each source file made the build rather slow; doing the piping internally reduced the time per source file from eight or nine seconds to less than a second.
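The idea of "doing the piping internally" can be sketched as a chain of plain functions that each take and return the document text, all applied inside one interpreter process. This is only an illustration: the three stage functions and the `run_pipeline` helper below are hypothetical stand-ins, not the actual contents of preprocess.py, and they operate on plain strings rather than parsed XML.

```python
import itertools
import re
from functools import reduce


def expand_tabs(text):
    """Hypothetical stage: replace each tab with four spaces."""
    return text.replace("\t", "    ")


def number_figures(text, lecture_number=4):
    """Hypothetical stage: turn each 'Figure ?' placeholder into
    'Figure 4.1', 'Figure 4.2', ... in document order."""
    counter = itertools.count(1)
    return re.sub(
        r"Figure \?",
        lambda match: f"Figure {lecture_number}.{next(counter)}",
        text,
    )


def add_trace(text):
    """Hypothetical stage: append a marker showing the file was processed."""
    return text + "\n<!-- processed -->"


# The stages run in this order, each seeing the previous stage's output.
STAGES = [expand_tabs, number_figures, add_trace]


def run_pipeline(text, stages=STAGES):
    """Apply every stage in turn within a single Python process,
    instead of launching one interpreter per stage."""
    return reduce(lambda doc, stage: stage(doc), stages, text)
```

Each stage stays small enough to test on its own, as a standalone filter would be, but the interpreter start-up cost is paid once per file rather than once per stage.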
- util/individual.xsl is an XSL script that translates the filled-in XML lecture file into HTML. This script handles the outer skeleton directly, handing specific tasks like the bibliography and special lists to other XSL files that it includes.
- A Python script called util/unify.py and an XSL script called util/unified.xsl work together to create a single-page version of the whole course. unify.py stitches the filled-in lecture files together; unified.xsl then applies the same transformations as individual.xsl, but formats hyperlinks differently (since they’re all in-file).
- I use another Python script called validate.py to check the internal consistency of the source files. Do any of them contain tabs or unprintable characters? Do all the required images, source files, and tables exist? I run this before checking in changes; it catches something about one time in five.
- And then there are the minor tools: util/fixentities.py replaces character entities with character codes (to work around a problem with Expat); util/wiki.py extracts specially-formatted comments from Makefiles and XSL files, and docstrings from Python, to create wiki documentation pages; and util/revdtd.py reverse engineers the actual DTD of either the source files, their filled-in counterparts, or the generated HTML files.
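The write-only-if-changed trick described for linkages.py can be sketched roughly as follows. The real script is not shown here, so the function name `write_if_changed`, the variable name `linkages`, and the use of `pprint` to serialize the data are all assumptions made for illustration.

```python
import pprint


def write_if_changed(path, data):
    """Persist `data` as an importable Python module at `path`, but only
    touch the file if it is missing or its contents would differ.

    Returns True if the file was written, False if it was left alone.
    Leaving it alone keeps its timestamp unchanged, so Make does not
    rebuild targets that depend on it.
    """
    # pprint sorts dictionary keys, so equal data gives identical text.
    new_text = "linkages = " + pprint.pformat(data) + "\n"
    try:
        with open(path, encoding="utf-8") as reader:
            if reader.read() == new_text:
                return False  # nothing changed: do not touch the file
    except FileNotFoundError:
        pass  # first build: fall through and create the file
    with open(path, "w", encoding="utf-8") as writer:
        writer.write(new_text)
    return True
```

Other tools could then load the result with an ordinary import. The comparison is on file contents, not timestamps, which is what lets an edit that leaves the cross-linkages untouched skip the downstream rebuilds.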
It’s a lot of code; it was a lot of work; I’m pleased with how smoothly it all runs; and most of the time I spent building it should probably have gone into upgrading the actual content of the course. But small(ish) tasks are seductive: you can start work at 8:30, confident that you’ll have something to show (even if only to yourself) by noon. Editing course notes, well, the payoff is usually a long way away, and may not come at all: people who read through the first, flawed, version of the notes probably aren’t going to come back and tell you how much better the second version is.
That last observation is the key ingredient of my cure for procrastination: find some partners. I am always more productive when I’m working with people than I am on my own. Not only does a small team wander down fewer blind alleys than someone working alone, team members can keep each other honest, and give each other feedback and encouragement. They can also appreciate just how big an accomplishment it is to have replaced all the a’s and b’s in twenty-eight short examples of list manipulation with the names of minerals, beetles, and mathematicians.
It’s now ten to eleven, and I’ve managed to fend off productivity for almost an hour. Should I look on eBay for a WACOM Cintiq 17SX that I can afford? It’d make drawing diagrams much more fun. Or maybe I should try Nose: Miles Thibault says it’s much friendlier than the unit testing framework in the Python standard library. Hm… A cup of tea will probably help me decide. A cup of tea, and a slice of toast with strawberry jam…