Wednesday, 4 February 2015

Bad DIY is not carpentry: on coding in academia

I have a friend whose hobby is woodworking. Half of the basement of his and his wife's house outside Philadelphia is dedicated to his hobby. The large dining room table around which the family gathers was made by him from the cured wood of an old oak tree he salvaged after a storm. The table is beautiful, inlaid with rosewood, with intricate foldaway leaves. It is precise, the result of years of craft and practice in the service of a single result: to make a beautiful, functional table. Curiously, until he retired, his career was physics. He's never done woodwork for money, only for pleasure. But his work is exquisite.
Conversely, I can barely saw straight. I took shop classes after school, where I learnt the rudiments of planing, sawing, and hammering. I made a box from offcuts of cheap wood, and a toy castle, held together with glue and impatience. I would not use the terms woodwork or carpentry to describe what I do, even though, in a pinch, I could probably slap together a vaguely functional table. In fact, I once did. It fell apart after a year and is still the source of much hilarity for my more practically inclined friends.
There is, I think, a distinction in the mindset of woodworking and DIY. Woodworking, though it may have the goal of producing a useful, functional, or beautiful object, is also concerned with the process, materials and skills intrinsic to the craft. A woodworker makes things to become better at woodworking. Conversely, DIY is more utilitarian. The purpose is to produce a functional result, and improvements in skill are valued only in so far as they help achieve the final functional goal.
The point of the above: I think that much of the code and software that is used in modern science fits the definition of DIY, not woodwork.
In our lab, we collect electromyographic data (electrical signals from actively contracting muscles). Because of various properties of the muscles we record from, and because of various properties of the animals we work with, our EMG data require idiosyncratic post-processing before interpretation. To do this post-processing, we use a piece of software that was written by a collaborator of my PI, probably over a decade ago. Or perhaps it is more appropriate to say that we endure this piece of software. It is unstable, the code is lost, and in fact it is probably written in a legacy language. It was written by an amateur coder whose sole concern was to get it to do the post-processing we require. The formatting requirements are fickle, the software is only partially compatible with modern operating systems, and it randomly stops working for reasons we cannot understand. And yet, much like an inexpertly fitted IKEA kitchen, it does the job well enough that we cannot justify the expense of developing something better.
Similarly, my dissertation has about ten pages of Matlab code in it. Writing that code was pretty much how I learnt to do basic data manipulations. It is a nightmare of nested for-loops because I never figured out how to do logical matrix indexing, and for-loops did the job. The code does the job, is reasonably well commented, and has meaningful variable names (probably the only good coding practice I have). But it is quasi-useless for any project other than my dissertation. And an actual coder would probably rewrite it from scratch.
There's a lot of push for scientists to learn to code. And certainly, the ability to do our job these days requires at least the ability to formulate how we would like our data manipulated and to competently explain what the software we use does. I am radically opposed to black-box approaches to data analysis software and overly customised solutions. They are anathema to good quantitative analysis that is sensitive to the specific structure of data and to the proper formulation of hypothesis tests. And yet in encouraging a DIY mentality, we encourage the proliferation of kludgy, inelegant, and unreliable software.
Yet those scientists who dedicate themselves to actual code carpentry often feel denigrated as "methods people". We are all grateful for their software (especially when it saves us the effort of writing a miserable few lines of code), yet do we remember to cite them? Do we defend their contributions to the field? I'm part of a macro-ecology journal club that includes a good number of methods people who take pride in writing efficient, elegant, usable code in the R environment, and these issues come up often.
In an ideal world, we'd make more use of professional software developers. Yet on the other hand, our questions are sufficiently esoteric that I wonder how useful that would be in practice. But I do think we need to be careful of over-celebrating a DIY coding ethos, and maybe work to nurture, celebrate, and collaborate with the genuine software carpenters in science more.
