Code Craft
The art, science and craft of writing quality software

Nov. 22, 2005 - Half of "writing code" is writing

Posted in Programming

Yesterday I had two literary experiences with nothing in common: I watched Harry Potter and the Goblet of Fire with my family and I read an article entitled What is Code in Free Software Magazine. With Harry Potter (the books, not the movies) I feel like Rowling has put together an insanely readable story that offers up its meaning at enough levels to satisfy everyone from an eight-year-old boy to a seventy-year-old woman. In short, it is almost as universal as Calvin and Hobbes. When I read What is Code my reaction was, "Holy Gotham, Batman; this looks like the work of the Riddler!" It was as though I had wandered into a barren cubist landscape where human language is ironically replaced by secret code words. I found myself wading through turns of phrase like "epistemological dystopia" and being bombarded with references to Heidegger (whom I would have read more of in junior high, but his Nazi leanings were a turn-off). While it's possible that it was a deeply enriching and perspective-shaping event for someone somewhere, Bill Watterson it wasn't.

The most striking difference between these two works of art is not mechanical or stylistic. The defining difference is really the question of audience. Both authors are telling a story of sorts, but what differs is who they are telling it to. The audience for Goblet of Fire is deliberately very large, but the writers of What is Code were writing to a much narrower target audience.

One of the things that writers learn very early on is the importance of knowing your audience. When you know who you are writing for you are more likely to be understood. Because we don't always know completely who our audience is, the natural ideal is that ideas should be expressed in the most broadly understandable form that does not lose meaning. Not all materials CAN be given a populist treatment. It's fine for scientific journals to be incomprehensible to the layman; if they weren't, many important ideas simply could not be expressed. Even well-trodden popular genres (detective novels, for example) can contain a lot of genre-specific assumptions about what the reader knows. All that said, however, the principle remains valid: understand your audience and write for the broadest spectrum of people within it.

Knowing your audience in code

This rule of knowing your audience is something we can learn from when writing code. While the audience for code seems obvious (the customer), that's not all there is to it. Other audiences include testers, engineers, and sometimes the people who set up and manage the software as well. All these people care a lot about the code, but they obviously think about it in different ways. Recognizing all those people as your audience and writing code that balances their needs is one of the things that many programmers (and organizations) never do.

Of course the customer is still your number one audience, and if you forget that then you will write your code like a bad author who writes for the critics only to find that no one wants to publish his book. One of the lessons from the failure of Big Design Up Front is that taking the customer out of the picture is a sure way to set yourself up as a case study in what-not-to-do. But this is not what I want to focus on, so let's instead presume that the UI and design and other elements of the software process are in harmony with the universe. Let us assume that they resonate in the key of C with a precision and power unrivaled in human history and that in centuries to come the very name of your project will be used as a Mantra for personal reflection. In that context most of what I'll discuss is how we write for our non-customer audiences.

Don't expect people to remember non-essential details

Imagine reading through a piece of prose describing the appearances of the twelve concubines of Prince Hafiz and then having the narrator describe their actions, sometimes referring to them by random features. "The concubine with green eyes punched him smartly in the groin while the one in the red shawl shaved his back hair." When one of them ended up beheaded it would be hard to remember who had done what. This is what we do when we manage business rules without regard to future readers of our code. You are asking for trouble if, for example, the PrinceHafiz class has a method named "getConcubine" and it always returns the concubine with the shortest hair unless her eyes are brown, in which case it returns the youngest of the blue-eyed concubines. A method like this carries too much implicit knowledge of the details to be a general-purpose method. It should be named something like "getPreferredConcubine." With that kind of name at least you know there is some kind of hidden domain logic lurking in the depths.

The other approach that is tempting is to simply provide a method named "getAllConcubines" or maybe "getConcubinesByAge" and "getConcubinesByLengthOfHair." That way users can presumably find whatever concubines they need. This sounds logical and simple, but it's asking someone else to remember the non-essential details about the kinds of concubines that Hafiz prefers. This is Hafiz's preference and not essential to other people's understanding of Hafiz. Of course it should be noted that obtaining the preferred Concubine sounds like a private method, but perhaps there are legitimate public uses that are not clear from this context.
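To make the point concrete, here is a minimal Python sketch of a PrinceHafiz class that keeps the preference rule to itself. The field names, the fallback when nobody has blue eyes, and the concubines Basma and Chanda are all assumptions made up for illustration; the only thing taken from the text above is the rule itself.

```python
from dataclasses import dataclass, field


@dataclass
class Concubine:
    name: str
    age: int
    eye_color: str
    hair_length_cm: float  # hypothetical unit, chosen for the example


@dataclass
class PrinceHafiz:
    concubines: list = field(default_factory=list)

    def get_preferred_concubine(self):
        """Return the prince's favorite; callers never see the rule."""
        shortest_hair = min(self.concubines, key=lambda c: c.hair_length_cm)
        if shortest_hair.eye_color != "brown":
            return shortest_hair
        blue_eyed = [c for c in self.concubines if c.eye_color == "blue"]
        if not blue_eyed:
            # Assumption: the rule above is silent here, so keep the first pick.
            return shortest_hair
        return min(blue_eyed, key=lambda c: c.age)


prince = PrinceHafiz([
    Concubine("Aludra", 24, "blue", 60.0),
    Concubine("Basma", 22, "brown", 20.0),
    Concubine("Chanda", 21, "blue", 45.0),
])
favorite = prince.get_preferred_concubine()
print(favorite.name)
```

Basma has the shortest hair but brown eyes, so the method falls through to the youngest blue-eyed concubine. A caller only needs the method name to know that some preference logic is in play.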

Avoid unnecessary jargon

Jargon is language that is specific to a particular field or interest area. For example, a wicket is cricket jargon and NASA is US Government jargon. In general writers frown upon jargon, but many times it has its place. Inside a particular domain a term that is jargon can convey a more specific meaning that is appropriate to that particular audience. If, however, jargon doesn't actually convey a more specific meaning then it shouldn't be used. In other words, use jargon if it helps clarify and avoid it if it does not.

Returning to PrinceHafiz, it may seem tempting to observe that a Harem object would be a good place to keep the set of Concubine objects. Maybe, but nothing we've seen so far suggests it. Until the harem means something other than a collection of concubines it's just useless jargon. As it turns out, however, a Harem need not be a simple collection of Concubines. PrinceHafiz also needs a collection for his four Wife objects. Now Harem starts to look more and more appealing. Still, if all a Harem contains is two collections it's not a very useful object. Only when it begins to acquire useful behaviors does the cost of injecting the jargon become worthwhile. In this case, perhaps it is useful to sometimes think of the Harem as two collections of objects (Wife and Concubine objects) and other times as a single set of Woman objects.

A word like harem is relatively meaningful to most people (although there are two other meanings than the one I used above) but a word like Polychloride is more or less meaningless to most people. The more generally understood a piece of jargon is, the less your jargon alert needs to go off. In the case of Harem the test is whether the word adds enough clarity of meaning to account for its violation of the anti-jargon clause. To answer this question you must first ask what alternatives are available. We could use a made up word like "WivesAndConcubines", or we could use the more general term "Lovers" or we could choose not to have an equivalent term and simply have two collections (a set of wives and a set of concubines). Using "Lovers" is tempting, but wrong. While lover is not jargon, it is also not an accurate description of a collection of wives and concubines (as the set of lovers might include additional entries and exclude some of the earlier entries). It's definitely better to use a bit of jargon rather than use something that is wrong. "WivesAndConcubines" is accurate, but frankly it feels pretty stilted; I'd take the jargon hit rather than working with an object like that one. Lastly, we could do without. Here the answer is less simple. If we mostly interact with Wife and Concubine objects independently then there is very little merit in the Harem object, but if harem is going to get used as a general collection a lot then it really needs to exist.
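If Harem does earn its keep, it might look something like the following Python sketch. It assumes Wife and Concubine share a Woman base class and that "useful behavior" means at least a unified view and a lookup by name; the class layout, the find method, and the wife named Zahra are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass(frozen=True)
class Woman:
    name: str


class Wife(Woman):
    pass


class Concubine(Woman):
    pass


@dataclass
class Harem:
    wives: list = field(default_factory=list)
    concubines: list = field(default_factory=list)

    def women(self):
        """View the harem as a single list of Woman objects."""
        return list(chain(self.wives, self.concubines))

    def find(self, name):
        """Look someone up without caring which collection she is in."""
        return next((w for w in self.women() if w.name == name), None)


harem = Harem(wives=[Wife("Zahra")], concubines=[Concubine("Aludra")])
print([w.name for w in harem.women()])
```

Callers who care about the distinction still reach for wives or concubines directly. The jargon only pays for itself through methods like women and find; without them, two plain collections would do.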

Interestingly, this vague uncertainty about when it's ok to use jargon exists in writing as well. In the story of the twelve concubines of Prince Hafiz the reader will find herself immersed in the world of the prince. In that world the harem will get regular use and the reader will quickly gain comfort with the meaning. If we were writing about a pimp in South Chicago and threw out the harem reference casually it's not clear that the jargon cost would be overcome (here the better term would be... never mind).

Use concrete language

Writers are taught to use the most specific language they can. Don't say "dog" when you mean "pit bull" or "gun" when you mean "AK-47 assault rifle." I find it strange that software writers seem to go in completely the opposite direction. They frequently try to create the most abstract language possible, so that PrinceHafiz's collection of HuntingRifles ends up as a set of Weapon objects (which is extended by Gun and then LongGun and finally Rifle). Sure the prince has a collection of Whip objects as well, but these are two discrete collections (used in different contexts) and so bundling them up really just makes things confusing. These kinds of deep abstractions need to emerge naturally and not get created on designer whim. Also, with regard to naming, if the objects are HuntingRifles they should not be called Guns.

In general, when writing code we work with abstract things. Something like a PrinceHafiz class seems suspect as it sounds like a singleton, but there are cases even for concrete objects like this. Consider an ASP whose software provides "concubine tracking" services. Basically they place small wireless receivers with GPS on the necklaces of concubines and then, through the magic of the Internet, they keep track of where all concubines for each client are at all times. Now let's imagine that the Sultan comes to you and says that he is willing to pay an extra ten million Dinars if you will alter the code so that when a particular concubine named Aludra leaves the prince's compound she will always be recorded as being just outside in the garden until she returns. Ignoring potential ethical considerations, let us assume that your company agreed (that's a lot of Dinars after all). Now, doesn't it make sense to extend your Concubine class and make a concrete Aludra class that hides the regular behavior? Surprisingly, most people would just insert a ton of if-statements into the regular code thinking that since this is a hack there is nothing they can do, but that approach ignores the rule of using concrete language. Aludra is a special case. She gets her own class. If you know about Aludra you understand her class. If you don't know about her then she looks like just another Concubine object (except for that cute mole on her left cheek, but I digress).
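A rough Python sketch of the Aludra special case follows. The Location type, the record_fix and reported_location method names, and the idea that the garden counts as inside the compound are all assumptions for illustration; a real tracking service would look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    label: str
    inside_compound: bool


# Assumption: the garden is treated as part of the compound.
GARDEN = Location("garden", inside_compound=True)


class Concubine:
    def __init__(self, name):
        self.name = name
        self._last_fix = None

    def record_fix(self, location):
        """Store the latest position sent by the necklace receiver."""
        self._last_fix = location

    def reported_location(self):
        """What the tracking service shows to the client."""
        return self._last_fix


class Aludra(Concubine):
    """The Sultan's special case, kept in one clearly named place."""

    def __init__(self):
        super().__init__("Aludra")

    def reported_location(self):
        actual = super().reported_location()
        if actual is not None and not actual.inside_compound:
            return GARDEN
        return actual


bazaar = Location("bazaar", inside_compound=False)
aludra = Aludra()
aludra.record_fix(bazaar)
other = Concubine("Basma")  # hypothetical second concubine
other.record_fix(bazaar)
print(aludra.reported_location().label, other.reported_location().label)
```

The regular Concubine code stays free of if-statements about one person, and anyone hunting for the hack finds it under her name.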

Avoid the passive voice

This is also known as the "just say what you mean" rule. Rather than "Prince Hafiz thought that Aludra could be considered the most beautiful of all the concubines" it is better to write, "Prince Hafiz believed that Aludra was the most beautiful concubine." In code, there is also passive voice. Objects are passive when they hand over their contents rather than acting on them. What kind of wussy object would PrinceHafiz be if he let other objects obtain his Concubine List and then do what they want with them? Consider the following implementation [in random pseudo-code] of the "giveAllowances" function (implemented by the PayMaster class).

def giveAllowances( prince)
    foreach (concubine in prince.concubines)
        if (concubine.isPure)
            treasury.dinars -= concubine.allowanceAmount;
            concubine.dinars += concubine.allowanceAmount;
        endif
    endfor
end

Now replace that with this code in the PrinceHafiz class:

def giveAllowances( treasury)
    foreach (concubine in concubines)
        if (concubine.isPure)
            concubine.pay( treasury);
        endif
    endfor
end

Although these two are pretty similar, one is passive (PrinceHafiz lets the PayMaster interrogate the Concubines and give them the money) and the other is active (the prince does the work). The latter is better because objects should act on their contents and not simply contain them. The attentive might notice that the isPure test might also be moved to the concubine.pay method. Yep, that is probably true since the Concubine object shouldn't be passive either (although there may be different pay rules that stand outside the Concubine so the case isn't clear-cut).
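For anyone who wants something runnable, here is a rough Python rendering of the active version. The Treasury class, its withdraw method, and the constructor arguments are assumptions made for illustration, since the pseudo-code above does not say how the treasury or the concubines are built.

```python
class Treasury:
    def __init__(self, dinars):
        self.dinars = dinars

    def withdraw(self, amount):
        """Hand over the requested amount or refuse outright."""
        if amount > self.dinars:
            raise ValueError("treasury cannot cover this payment")
        self.dinars -= amount
        return amount


class Concubine:
    def __init__(self, name, allowance_amount, is_pure=True):
        self.name = name
        self.allowance_amount = allowance_amount
        self.is_pure = is_pure
        self.dinars = 0

    def pay(self, treasury):
        """The concubine collects her own allowance from the treasury."""
        self.dinars += treasury.withdraw(self.allowance_amount)


class PrinceHafiz:
    def __init__(self, concubines):
        self._concubines = list(concubines)

    def give_allowances(self, treasury):
        """The prince acts on his own concubines; nobody else sees the list."""
        for concubine in self._concubines:
            if concubine.is_pure:
                concubine.pay(treasury)


treasury = Treasury(1000)
aludra = Concubine("Aludra", 100)
basma = Concubine("Basma", 50, is_pure=False)  # hypothetical second concubine
PrinceHafiz([aludra, basma]).give_allowances(treasury)
print(treasury.dinars, aludra.dinars, basma.dinars)
```

Note that the concubine list is held privately by the prince, so a PayMaster could not walk it even if it wanted to. The isPure check is left in the prince's loop here, matching the pseudo-code.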

Avoid sexist and misogynistic language

Ahh, uhmm... yes, as I was saying [cough, cough]... the next point is...

Avoid unnecessary citations

There are many ways to reference other works. In the Simpsons, for example, most episodes contain references to literature, mythology, politics and a host of other things. In general this is kept at the level of transparent subtext. People familiar with whatever they are referencing will "get it" and it will add another layer to the humor. People who don't get it will experience a different, more superficial level of humor and still not lose track of the plot. This is the cleanest and most understandable way to reference other works. The other end of the scale is when a piece of writing is basically an extension of one or more ideas that were developed elsewhere. In those cases it is fully valid to require the reader to be familiar with the works being extended. Bad writing is allowing the citations to overshadow the actual meaning of what is being written (this is sometimes called academic writing). References to other works that do not add to meaning should be avoided or at least explained so they don't create undue confusion.

These themes also apply to writing software. Some code can become nothing but a long-running series of citations (invocations) of other systems (libraries). The challenge of the good writer is to allow readers (maintainers, testers, etc.) to use the code without understanding everything about each of the libraries on which the code depends. In other words, good code is like a Simpsons episode: the complexity is hidden beneath the surface.

In the Hafiz example, imagine that a pretty printer is needed so that when "showSelf" is called on the Concubine object the generated HTML is readable. Ideally the "showSelf" method should hide this, perhaps by working directly with the pretty printer library to get the HTML cleaned up. The problem is that as soon as the Concubine object starts using the pretty printer the Wife object and the children and probably any number of other objects will just "have to have" pretty printing too. It's a vicious, marketing-driven circle. Now if the pretty printing object is complicated (even just a little bit) each of these objects will find itself becoming a pretty printing expert (first I adjust the options, then I apply the filters, then I lengthen the line sizes, etc.). It's much better to bring the expert to them. Just wrap the darned pretty printer so that all objects get the same treatment (full XHTML) and let the objects forget about having to understand pretty printing technology at all.
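A small Python sketch of that wrapper is below. It stands in Python's standard xml.dom.minidom for whatever pretty printer the project really uses, and the pretty_xhtml function, the show_self methods, and the markup they emit are all hypothetical.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def pretty_xhtml(markup):
    """The one place that knows how pretty printing is configured."""
    document = minidom.parseString(markup)
    return document.documentElement.toprettyxml(indent="  ")


class Concubine:
    def __init__(self, name):
        self.name = name

    def show_self(self):
        # No options, filters, or line lengths to think about here.
        return pretty_xhtml(f"<div><p>{escape(self.name)}</p></div>")


class Wife:
    def __init__(self, name):
        self.name = name

    def show_self(self):
        return pretty_xhtml(f"<div><h1>{escape(self.name)}</h1></div>")


concubine_html = Concubine("Aludra").show_self()
wife_html = Wife("Zahra").show_self()  # hypothetical wife
print(concubine_html)
```

If the indentation style or the underlying library changes, only pretty_xhtml has to know about it.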

Code it for someone else to read

There are a lot of ways to think about the code you write. You can examine it for computational complexity or consider how well it adheres to the Law of Demeter. There are a thousand things you might consider. What I think is useful, though, is to step back from the technical and ask yourself, "could a reasonably smart programmer read this easily?" and more importantly, "would I want them to know it was me who wrote it?" This kind of thinking may not be tightly defined, but it can help you get a grip on why something doesn't quite feel like a clean solution. Taking a page from the literary notebook and thinking about your audience is a good way to get this kind of perspective, and in a sense it is also the deeper reason that things like the Law of Demeter are true (to the extent that they are). Encapsulation, functional decomposition, syntactic sugar and all the rest of the things we discuss when talking about good code are really just abstractions to help us get our hands around the question of whether people (besides ourselves) will be able to read and work with our code in the future.

One parting note... If you work for one of those companies that deliberately writes Nancy Drew Mysteries code (you know, one big template repeated 1000s of times over and over with just enough changes to keep it different), "may your wives find shade under another man's tent!" Although who am I to complain? Bill Watterson I ain't.

Nov. 23, 2005 - active voice and message receivers

Posted by Anonymous
should 'concubine.pay(treasury)' be 'treasury.pay(concubine)' instead?

Nov. 23, 2005 - Untitled Comment

Posted by Anonymous
Glad to see you're applying lessons learned on your voyage!

Nov. 25, 2005 - I hate to rain on your parade

Posted by Anonymous
Especially as it was such a damn fine parade, but I think you'll find his name is "Bill Watterson".

Kevin Barnes

Code Craft is the place for my thoughts, rants, ideas and occasional jokes on what it means to write code, why some people are better at it than others, and how we think about software in general.

Copyright (C) 2005, Kevin Barnes. All rights reserved.