Archive for the ‘Research’ Category
Julia Lane, the director of the Science of Science & Innovation Policy program at the National Science Foundation, wrote an article for Nature a couple of weeks ago titled “Let’s make science metrics more scientific”. As the summary at the start says:
- Existing metrics have known flaws
- A reliable, open, joined-up data infrastructure is needed
- Data should be collected on the full range of scientists’ work
- Social scientists and economists should be involved
The same points could be made about evaluating software developers (or any other kind of knowledge worker). The devil, as always, is in the details, and unfortunately I have to start doing evaluations before those details are worked out. Several of the sponsors for this course need me to demonstrate its impact on the productivity of the scientists who take it (so that they can in turn justify their contribution to their funders). It isn’t enough to ask students who have completed the course whether they think they know more about programming than they used to: ignoring the obvious problems of survivor bias and self-assessment, I would still have to demonstrate that making people better programmers also makes them better scientists. I believe it does, but belief is not evidence, and doesn’t convey scale.
The best plan I’ve been able to come up with so far is to look at how scientists spend their time before and after taking the course, but that would require resources I don’t have. If you’re interested in studying scientists or software developers empirically, and would like some raw material, I’d like to hear from you.
A special issue of Computing in Science & Engineering that Andy Lumsdaine and I edited, devoted to software engineering in computational science, is now available. We’d like to thank everyone who contributed:
- Report on the Second International Workshop on Software Engineering for CSE, by Jeffrey Carver (University of Alabama)
- Managing Chaos: Lessons Learned Developing Software in the Life Sciences, by Sarah Killcoyne and John Boyle (Institute for Systems Biology)
- Scientific Computing’s Productivity Gridlock: How Software Engineering Can Help, by Stuart Faulk (University of Oregon), Eugene Loh and Michael L. Van De Vanter (Sun Microsystems), Susan Squires (Tactics), and Lawrence G. Votta (Brincos)
- Mutation Sensitivity Testing, by Daniel Hook (Engineering Seismology Group Solutions) and Diane Kelly (Royal Military College of Canada)
- Automated Software Testing for MATLAB, by Steve Eddins (The MathWorks)
- The libflame Library for Dense Matrix Computations, by Field G. Van Zee, Ernie Chan, and Robert A. van de Geijn (University of Texas at Austin), and Enrique S. Quintana-Ortí and Gregorio Quintana-Ortí (Universidad Jaime I de Castellón)
- Engineering the Software for Understanding Climate Change, by Steve Easterbrook (University of Toronto) and Timothy Johns (Hadley Centre for Climate Prediction and Research)
The latest in a series of workshops on “Software Engineering for Computational Science and Engineering” was held in Vancouver on May 23, just after ICSE’09. Steve Easterbrook has written a good summary of what was discussed, and Jeffrey Carver’s longer summary will appear in a future issue of Computing in Science and Engineering.
Thank you once again for taking part in our Fall 2008 survey of how scientists use computers in their research. We will present a paper describing our findings at ICSE’09 in Vancouver on May 23, and will make the results public as soon after that as possible. An article discussing what you’ve told us will also appear in American Scientist magazine some time this summer.
Our next step is to figure out what makes some scientific computer users so much more productive than others. We would therefore be grateful if you would take a few minutes to answer the questions below and email the result to email@example.com:
- If you think that you use computers more effectively in your work than some of your peers:
  - explain why you think so
  - describe what you do or know that they don’t
- If you can think of someone in your research area who uses computers more effectively in their work than you do:
  - explain why you think so
  - describe as best you can what they do or know that you don’t
If you answered either question, we would be very grateful if you could pass this email on to the colleague or colleagues you were thinking of and ask them to answer it as well—we believe we will learn a great deal by comparing responses, as well as from the responses themselves. If they wish to remain anonymous, please ask them to return their response to you for forwarding to us. Otherwise, please have them reply directly to us. (It would be very helpful in the second case for them to mention your name, so that we can pair their response with yours.)
As with the original survey, only the researchers directly involved in this study will have access to respondents’ contact information and/or identities. This information will not be shared with any third party in any way.
Thanks in advance for your help—we hope you’ll find the results useful.
Prof. Greg Wilson
Dept. of Computer Science
University of Toronto
The slides for my talk at the National Research Council on empirical software engineering and how scientists actually use computers are now up on SlideShare. The colors in some of the embedded images were messed up during upload, but the result should still be readable.
Second International Workshop on Software Engineering for Computational Science and Engineering
Saturday, May 23, 2009
Co-located with ICSE 2009 – Vancouver, Canada
This workshop is concerned with the development of:
- Scientific software applications, where the focus is on directly solving scientific problems. These applications include, but are not limited to, large parallel models/simulations of the physical world (high performance computing systems).
- Applications that support scientific endeavors. Such applications include, but are not limited to, systems for managing and/or manipulating large amounts of data.
A particular software application might fit into both categories (for example, a weather forecasting system might both run climatology models and produce visualisations of big data sets) or just one (for example, nuclear simulations fit into the first category and laboratory information management software into the second). For brevity, we refer to both categories under the umbrella title of “Computational Science and Engineering (CS&E)”.
Despite its importance in our everyday lives, CS&E has historically attracted little attention from the software engineering community. Indeed, the development of CS&E software differs significantly from the development of business information systems, from which many of the software engineering best practices, tools and techniques have been drawn. These differences include, for example:
- CS&E projects are often exploring unknown science, making it difficult to determine a concrete set of requirements a priori.
- For the same reason, a test oracle may not exist (for example, the physical data needed to validate a simulation may not exist). The lack of an oracle clearly poses challenges to the development of a testing strategy.
- The process for developing CS&E applications may differ profoundly from traditional software engineering processes. For example, one scientific computing workflow, dubbed the “lone researcher”, involves a single scientist developing a system to test a hypothesis. Once the system runs correctly and returns its results, the scientist has no further need of the system. This approach contrasts with more typical software engineering lifecycle models, in which the useful life of the software is expected to begin, not end, after the first correct execution.
- CS&E applications often require more computing resources than are available on a typical workstation. Existing solutions for providing more computational resources (e.g., clusters, supercomputers, grids) can be difficult to use, resulting in additional software engineering challenges.
- CS&E developers may have no formal knowledge of software engineering tools and techniques, and may be developing software in a very isolated fashion. For example, it is common for a single scientist in a lab to take on the (formal or informal) role of software developer and to have to rely solely on web resources to acquire the relevant development knowledge.
Recent endeavors to bring the software engineering and CS&E communities together include two special issues of IEEE Software (July/August 2008 and January 2009) and this current ICSE workshop series. The 2008 workshop brought together computational scientists, software engineering researchers and software developers to explore issues such as:
- Those characteristics of CS&E which distinguish it from general business software development;
- The different contexts in which CS&E developments take place;
- The quality goals of CS&E;
- How the perceived chasm between the CS&E and software engineering communities might be bridged.
This 2009 workshop will build on the results of the previous workshop.
As in 2008, the 2009 workshop will combine presentation and discussion of the accepted position papers with significant time devoted to continuing discussions from previous workshops and to general open discussion.
We encourage submission of position papers or statements of interest from members of the software engineering and CS&E communities. Position papers of at most eight pages are solicited to address issues including but not limited to:
- Case studies of software development processes used in CS&E applications.
- Measures of software development productivity appropriate to CS&E applications.
- Lessons learned from the development of CS&E applications.
- Software engineering metrics and tool support for CS&E applications.
- The use of empirical studies to better understand the environment, tools, languages, and processes used in CS&E application development and how they might be improved.
The organizing committee hopes for participation from a broad range of stakeholders from across the software engineering, computational science/engineering, and grid computing communities. We especially encourage members of the CS&E application community to submit practical experience papers. Papers on related topics are also welcome. Please contact the organizers with any questions about the relevance of particular topics. Accepted position papers will appear in the ICSE workshop proceedings and in the IEEE Xplore Digital Library.
Please observe the following:
- Position papers should be at most 8 pages.
- Format your paper according to the ICSE 2009 paper guidelines.
- Submit your paper in PDF format to firstname.lastname@example.org.
- Deadline for submission: January 19, 2009.
- Submission notification: February 6, 2009.
- Jeffrey Carver, University of Alabama, USA (chair of the organizing committee)
- Steve Easterbrook, University of Toronto, Canada
- Tom Epperly, Lawrence Livermore National Laboratory, USA
- Michael Heroux, Sandia National Laboratories, USA
- Lorin Hochstein, USC-ISI, USA
- Diane Kelly, Royal Military College of Canada
- Chris Morris, Daresbury Laboratory, UK
- Judith Segal, The Open University, UK
- Greg Wilson, University of Toronto, Canada
Over 1900 people have already responded to our survey of how scientists use computers, and it still has two weeks left to run. Our next task will be to analyze the data we’ve collected, which (among other things) means coding people’s free-form descriptions of their specialties so that we can talk about physicists and chemists as opposed to “this one person who’s doing N-brane quantum foam approximations to multiversal steady-state thingummies”.
Except: are “physics” and “chemistry” too broad? At that level, there are only a handful of sciences: astronomy, geology, biology, mathematics, psychology, um, computing, er, Curly, Larry, and Moe. Or maybe you’d distinguish “ecology” from “biology”. Or “oceanography” from something else, or — you see the problem. Rather than making up our own classification scheme, I’d like to adopt one that’s widely used and generally intelligible, but I’m having trouble finding one. Yahoo!, Wikipedia, and other web sites have incompatible (and idiosyncratic) divisions; the Dewey Decimal System and other library schemes have a very 19th Century view of science, and the ACM/IEEE publication codes are domain-specific.
If anyone can point me at something else (ideally, something with about two dozen categories — that feels like it ought to be about right, just from eyeballing the data we have so far), I’d be grateful.
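For the curious, a first pass at that coding step could be as crude as keyword matching. What follows is only a minimal sketch, not the method we will necessarily use: the category names, the keyword lists, and the function names `code_response` and `tally` are all made up for illustration, and none of it comes from the survey data.

```python
import re
from collections import Counter

# Hypothetical keyword table. Both the categories and the keywords are
# illustrative assumptions, not a real classification scheme.
KEYWORDS = {
    "physics": ["physics", "quantum", "particle", "optics"],
    "chemistry": ["chemistry", "chemical", "spectroscopy"],
    "biology": ["biology", "genomics", "ecology", "protein"],
    "earth science": ["geology", "oceanography", "climate", "seismology"],
}


def code_response(text):
    """Return the sorted categories whose keywords appear in a free-form answer.

    Answers that match nothing are labelled "uncoded" so that a person
    can look at them by hand later.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    matches = [cat for cat, keys in KEYWORDS.items() if words & set(keys)]
    return sorted(matches) or ["uncoded"]


def tally(responses):
    """Count how many answers fall into each category."""
    counts = Counter()
    for text in responses:
        counts.update(code_response(text))
    return counts
```

Calling `tally` on a list of free-form answers gives a per-category count. An answer that mentions two fields is counted in both, which is one reason the size of the "uncoded" pile and the number of multi-category answers are worth watching: they show quickly whether the chosen scheme is too coarse or too fine.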
1731 people have completed our survey of how scientists use computers since it went online three weeks ago. That’s pretty cool, but I’d like to double the number (at least). If you consider yourself a working scientist, and haven’t taken the survey yet, please take a moment and do so. If you aren’t a scientist, but know some, please pass on the link: