Code Craft
The art, science and craft of writing quality software

Aug. 9, 2005 - Let the hard stay hard

Posted in Articles

Very few things are certain in life. Here are a few things that are:

 

  • A gas will expand to fill the available space.
  • Work will expand to consume all resources and time.
  • The most complex problem in a piece of software will expand to affect the entirety of the code.

The first two of these are well known and most people will recognize them.  The last one is a truth that is a bit harder to appreciate.  Understanding it is, in my opinion, one of the keys to becoming a master programmer.  It is also similar to the first two items in that it can be avoided only at the cost of work.  Learning how to overcome this tendency of complexity to expand is the point of this discussion and the reason for the title: “let the hard stay hard.”

 

To understand exactly what I mean by complexity expanding to affect the entirety of the code, let me start with two very simple examples and then move on to two larger (and perhaps controversial) ones.  The goal of looking at these is not to provide answers or fixes to the specific problems, but to observe the strange tendency of code to expose complexity despite our best efforts to hide it.

 

Log rolling for lumberjacks

 

This is just about the simplest example of complexity creep I can think of, and it’s present in a very large number of systems.  It looks like this (in some arbitrary language):

if (logger.isDebugEnabled())
{
    logger.debug("some text " + someObject +
        "some other stuff");
}

If this code were hiding complexity properly, it would look like this:

logger.debug("some text " + someObject + "some other stuff");

The complexity in this case is performance.  Because the message string is constructed before the debug function is ever invoked, a lot of debug logging can be very costly even when debug output is turned off.  To get around this problem we have diligently wrapped the call to the debug statement in a test.  This is classic complexity spread.  The problem is now the responsibility of every developer who writes a “debug level” logging statement.

 

In this case, the problem can be solved easily via #define, templates, aspects, or code generation.  Depending on the available language features this may seem like a lot of work, but solving the problem once makes it go away forever.  Otherwise the complexity bleeds across many thousands of lines.  If you can’t take the time to hide the performance complexity, maybe performance isn’t such a big deal for you; in that case, ignore the performance impact and leave the “wrapping” exercise for later.
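In a language without #define, one way to centralize the guard is to let the logger accept a deferred message so the string is never built when logging is off.  This is only a sketch of the idea, not the author's solution or any real library's API; GuardedLogger and Message are hypothetical names:

```java
// Sketch only: centralizing the isDebugEnabled() guard so that callers
// never have to write it themselves. GuardedLogger and Message are
// hypothetical names, not part of any real logging library.
interface Message {
    String build();   // defers string construction until it is needed
}

class GuardedLogger {
    private final boolean debugEnabled;

    GuardedLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // The if-test lives here, once, instead of at every call site.
    void debug(Message m) {
        if (debugEnabled) {
            // a real version would delegate to the underlying logger
            System.out.println(m.build());
        }
    }
}
```

A caller then writes a single line, e.g. `logger.debug(() -> "some text " + someObject + "some other stuff");`, and the concatenation simply never runs when debug is disabled.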

 

Who owns this property?

 

Another surprisingly common case of complexity spread can be seen in code that looks like this:

int targetValue = DEFAULT_INT_VALUE_CONSTANT;
String someValue = null;
try
{
    someValue = properties.getProperty(NAME_CONSTANT);
    if (someValue != null)
    {
        targetValue = stringToInt(someValue);
    }
}
catch (Exception e) // perhaps multiple types
{
    // do nothing or perhaps log an error/warning
}

This code is pretty simple, and I’ve probably seen something like it written thousands of times over the years.  Sometimes the exception isn’t even caught and (of course) the levels above also have no idea what to do with the exceptions that are thrown.  There are maybe three or four legitimate possible constructions of this code, but the question is: why is it not written like this?

 

int targetValue = properties.getIntProperty(NAME_CONST,
    DEFAULT_INT_VALUE_CONST);

If you have some kind of Service Locator pattern or are using code generation, you might be able to make the whole thing declarative as well, but that’s presumptive.  At most, the code above should be all that a user of properties is required to write.  Frankly, it amazes me that the writers of properties classes don’t universally include such methods, but that’s not an excuse for not wrapping/extending the class and providing the functionality for yourself.

 

How is this an example of complexity spread?  The complexity was introduced by making certain constants or values externally definable.  The Properties class was the tool to do it, but we still spread much of the complexity throughout the application by forcing people to deal with the error conditions (and type coercion issues) each time the class is used.  In practice there are only three things you can really do when a property is incorrectly defined: 1) use a default and continue, 2) use a default, “display” an error somewhere and continue, or 3) abort.  Even if you want to do all three depending on circumstances, you can hide this complexity fully without much work.
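The wrapper the article asks for takes only a few lines.  This is a sketch of behavior 1 (use a default and continue); the SafeProperties name and method are hypothetical, not part of the standard library:

```java
import java.util.Properties;

// Sketch of the wrapper the article argues for: a Properties extension
// that hides the lookup/parse/default dance behind one call. The class
// and method names are hypothetical.
class SafeProperties extends Properties {
    // Behavior 1 from the article: fall back to a default and continue.
    int getIntProperty(String name, int defaultValue) {
        String raw = getProperty(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Behavior 2 would "display" an error here; behavior 3 would throw.
            return defaultValue;
        }
    }
}
```

With this in place, the call site collapses to `int targetValue = props.getIntProperty(NAME_CONST, DEFAULT_INT_VALUE_CONST);` and the error handling lives in exactly one spot.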

 

XML based systems are frequently littered with variants of this problem.

 

In both of the above cases, the complexity was of a very limited kind.  As a result there were also very simple solutions.  Despite this, very few people actually bother to do the work; so, like an expanding gas, the complexity has filled the space of their code.

 

The next few examples are much bigger and hence more controversial.  

 

I know what I know, but I can’t remember where I put it

 

Memory management is an intrinsic complexity of writing software.  Programs require memory.  That memory requires management.  Most (but not all) early procedural languages consciously chose to expose much of this complexity to users due to performance considerations.  C++ continued to expose this complexity, probably for the same reasons.  This one choice has exposed millions of innocent lines of code to complexity they never wanted to know about.

 

Anyone who has written a substantial body of C++ code and also a substantial body of code in a garbage collected language can write the rest of this section themselves without my assistance (the holdouts can feel free to write me whatever rants they like).  C and C++ have an entire industry dedicated to nothing more than solving this one problem.  There are books, tools, patterns and approaches.  This piece of complexity spread is one of the most insidious and notorious around and is probably one of only two or three reasons for the rise of Java.  Otherwise Java was (at that time) inferior in many other respects.

 

Just to be clear as to why this is a case of complexity spread: managing memory is a principal concern of these languages.  Writers of code in these languages are forced to think about it everywhere throughout their code rather than managing that complexity in just one place.  Hence, it is complexity bleeding out to affect the entire approach and substance of the code (the leaking is more like a broken pipe in this case, but the principle is the same).

 

Garbage collection comes with a price tag, but it is one most programs can and should pay willingly.  Not to dwell on solutions, but C++ can be garbage collected and Smart Pointers are even better than garbage collection in the cases where they make sense (which is not everywhere).

 

J2 everything

 

This final example reeks of controversy, but regardless of your take on the value of J2EE and its various elements, there is little doubt that it is a good example of complexity spread.  Even the most ardent admirer of EJBs will admit that there are quite a few things people commonly do “wrong” when developing with J2EE, which is synonymous with an acceptance that it has spread complexity to the user.  Because the details of this problem set are well known to people who have lived in this environment and completely foreign to others (say Python developers), I’ll pick just one detailed nit as an example. 

 

Let us consider creating an arbitrary service using EJBs (EJBs are, in fact, services and not more generalized objects in the proper sense, so I won’t confuse people by misnaming them).  To create this EJB (service) I will need to create a grand total of 3 files (plus 2 if we want local interfaces) and create entries in two XML files.  Of course I will also have to use the EJB which requires that I handle a host of potential exceptions, talk to JNDI, and perhaps deal with some caching issues. 

 

If I do nothing (beyond what J2EE provides and requires) I will be writing (and maintaining) a substantial body of code and declaration to create my services and in addition I will be writing on the order of 20 lines of code every time I invoke one.  Now that’s complexity spread.

 

J2EE advocates will rightly point out that with some up front work you can make this go away.  I agree.  At this very moment I am creating EJBs with nothing more than a couple of XDoclet tags in the comments of my service classes and using them with nothing more than a method invocation on an object (provided via either a single service locator call or by dependency injection).
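The “single service locator call” idea can be sketched in a few lines.  This is not a J2EE component; it is a hypothetical in-memory stand-in (a real version would consult JNDI and do the narrowing and exception handling behind this same interface):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "single service locator call" idea: one class hides the
// lookup, narrowing, caching and exception handling that raw J2EE would
// otherwise spread across every call site. Names are hypothetical, and a
// real version would consult JNDI instead of a registration map.
class ServiceLocator {
    private static final Map<String, Object> services = new HashMap<>();

    // In a real system this registration would be replaced by a JNDI lookup.
    static void register(String name, Object service) {
        services.put(name, service);
    }

    // The one call a user writes; all lookup complexity lives behind it.
    static <T> T locate(String name, Class<T> type) {
        Object s = services.get(name);
        if (s == null) {
            throw new IllegalStateException("no such service: " + name);
        }
        return type.cast(s);
    }
}
```

The point is the shape, not the implementation: the twenty lines of lookup boilerplate are paid once, inside the locator, instead of at every invocation.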

 

Of course much of this is work that I was forced to do myself or work that others were forced to do (and which I borrowed).  These are not J2EE components but a set of wrapping layers.   The point is not that you can’t hide complexity; the point is that you (or someone) needs to do work to make it happen.  I have seen lots of implementations that failed to do that work.  In fact, for many years it was hard to find anyone who was doing it completely.

 

There are still other pieces of complexity to deal with when working with EJBs.  There are a number of “rules” one must remember when creating the service objects.  There are requirements on the parameters that are passed that are not compile time verifiable (without work).  And of course you must architect in a services layer (which might not be a natural part of your architecture otherwise).

 

To save myself a rant or two please note that I have not said that J2EE is bad.  I have just said some bad things about J2EE.  The overall utility of J2EE depends entirely on how it is used and in what particular circumstances.

 

Let the hard stay hard

 

This has been a pretty long run up to the main point of the discussion, but here it is:

 



Let the hard problems BE hard
so that the simple problems will BE simple.

 

When making a rug, the weaver spends extra time on the edges because he understands that they will fray.  He does not add extra weaves to the body of the rug to solve the problem.

 

When constructing a building, the builder does not strengthen the entire roof to enable the placement of the air conditioner; he strengthens only the load-bearing section (or wisely moves the compressor off the roof entirely).

 

As we have seen, spending too little time on complexity tends to result in being forced to spend even more time on the same complexity (due to spreading).  Solving the problem once, truly and completely, pays off for the lazy programmer.

 

The smell of complexity spreading

 

Since complexity spread is such a problem, it’s important to learn how to spot it, or to use Fowler’s terminology, how to smell it.  As there does not appear to be any religious significance to Fowler’s smell nomenclature, I shall make up my own here and reference only a few of the well-known ones he has laid out.

 

The biggest single smell indicating that a system has been infected by creeping complexity is the Needle-in-a-Haystack odor.  When you can look at a piece of code and determine that its behavior is logical and consistent, but it takes you a lot of looking to find out what that behavior is, you’ve encountered the Needle-in-a-Haystack smell.  Long Functions and Lots of Branching are smells that contribute to Needle-in-a-Haystack, and are also signs of spreading complexity.

 

Stranger Intimacy is another sign that a class wants you to know too much.  If lots of classes seem to be using a single class that is not central to their main purpose, then they are exhibiting Stranger Intimacy and creeping complexity is probably taking place.  Stranger Intimacy frequently goes along with the equally damning Random Handshake, wherein a class seems to require random small interactions with unrelated classes to be kept stable (cache invalidation comes to mind as an example).

 

The Parade of Fools is another bad sign.  The parade of fools occurs when a class requires a host of “helper” classes to be used properly.  These helpers are generally stateless collections of functions.  In general, I consider such “helper classes” to be a smell in and of themselves.

 

When you start to see any or all of these problems in your code, stop what you are doing and save yourself a week of effort for each hour you spend identifying and hiding complexity.

 

Stopping the bleeding before it starts

 

These are, of course, all after-the-fact assessments.  It would be nice to know in advance where complexity lies so that it can be avoided.  In discussing suggestions on how to avoid complexity creep, one thing to bear in mind is that trying to avoid creep can itself cause it if you aren’t sure you understand what kind of complexity you are facing.

 

Fixing problems you don’t have is a 100% guarantee of increased complexity.  A hard-nosed attitude of “you aren’t going to need it” is a good idea, but only if seasoned with a strong dash of “willingness to admit mistakes” and only if you are willing to refactor as problems begin to appear.  This attitude should also not cause you to ignore what you really do know out of dogged attachment to a rule of thumb.

 

To learn how to let hard problems BE hard, let’s re-examine the four earlier problems and see what “went wrong” in each one.

 

If you think about these examples from the tool-maker perspective you can easily understand some of the reasons that the tools do not hide complexity as completely as we would like.  In each case the tool maker faced limitations owing to his position as use-ee rather than use-er. 

 

In the logging example, the logging object cannot work “outside” of itself to provide meta-programming elements (although in languages with #define or certain kinds of templating this is not as true).  As a result, the best it can do is offer the user the tools to enable that meta-layer.

 

In the properties class, the tool maker did not want to presume too much about how the tool would get used.  Does the user want to error, do they want to abort, or do they want to continue?  Although this explains some shortcomings, there are still issues.  I will discuss these later.

 

In the C++ memory management case, the designer was forced to choose between two evils (the evils of garbage collection and the evils of manual memory management) and chose the solution that enabled maximum flexibility for the user (at the cost of exposed complexity).

 

In the J2EE case, there are literally millions of potential use cases.  One can imagine the difficulty in trying to find a universal solution.  By exposing so much of the internals the committee enabled the broadest possible use.

 

This then brings us to the first rule for tool makers (all of us):

 

Keep it simple stupid

 

Do not sacrifice the 95% case for the sake of the 5% case.  Both memory management and J2EE are clearly failing in this specific area.  Because there are some very challenging use cases, they have made life very hard in the 95% of cases where those challenges don’t apply.  If C++ had enabled a simple garbage collected object type in addition to supporting more self-managed object types, an entire industry would now be nearly out of business.

 

This rule should not be used as an excuse not to provide needed elements; if a slightly more complex path can still provide the additional 5% without impacting the 95%, it should be made available.  Most garbage collected languages fail here.  Because you don’t usually want to manage memory, they assume you never will.  There exist mechanisms to allow users to manage object lifecycles (forcing required complexity on them in those cases) that work fine for languages with garbage collection.  Sadly, these have not been provided by most of the languages that use garbage collection.

 

Fail like you mean it

 

Provide ways to handle expected failures transparently.  This is the problem with the properties system.  It simply does exactly what it is asked to do and throws the exceptions and failures back to the user to deal with.  Its writers could easily have provided a set of obtainer methods to cover the various data types (handling type coercion issues) and a “throw on failure” optional parameter to either allow or suppress exceptions.  There are half a dozen solutions, each of which is better than the one provided.

 

In a sense, Exceptions have made things too easy on tool designers.  In far too many tools, exceptions are thrown for scenarios that are very close to normal.  Of course you need to allow the user to be aware of the failure if necessary, but you also want him to be able to provide a default answer that will suffice in most cases.

 

For example, why not provide a setErrorHandler method on some service-ish classes to handle those really hard-core errors where the system has died horribly?  The passed object can be used if provided; otherwise, the exception can be thrown as a “runtime” exception.  If such a handler is really an obvious mandatory element, then maybe require it when the objects are constructed so the compiler enforces its presence.
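As a sketch of the setErrorHandler idea (all names here are hypothetical, and the "backend" is a stand-in for whatever can die horribly):

```java
// Sketch of the setErrorHandler idea: callers may install a handler for
// hard failures; with no handler installed, the service falls back to a
// runtime exception. All names here are hypothetical.
interface ErrorHandler {
    void handle(Exception e);
}

class LookupService {
    private ErrorHandler errorHandler;  // null means "just throw"

    void setErrorHandler(ErrorHandler handler) {
        this.errorHandler = handler;
    }

    String lookup(String key) {
        try {
            return doLookup(key);
        } catch (Exception e) {
            if (errorHandler != null) {
                errorHandler.handle(e);  // user-supplied recovery path
                return null;
            }
            throw new RuntimeException(e);  // default: fail loudly
        }
    }

    // Stand-in for the real work, which might die horribly.
    private String doLookup(String key) throws Exception {
        if (key == null) throw new Exception("backend unavailable");
        return "value-for-" + key;
    }
}
```

Callers who can recover install a handler once; everyone else gets the loud failure without writing a single catch block.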

 

Providing methods with default-accepting signatures is also a good idea (as in the properties case).

 

Ten is better than one

 

Make multiple small independent tools.  When a problem is so big and has so many use cases that it is simply not possible to identify a 95% case, it’s time to consider the problem as multiple problems and solve the 95% case for each.  This is definitely the J2EE problem.  The slicer-dicer-puree-and-make-julienne-fries Ronco mentality does not work with software.

 

These separate tools should naturally be just that: separate.  If they are deeply engaged and intertwined, they will have a knowledge-binding that makes each of them overly complex.  This is not to say you cannot have binary dependency, just not notional dependency.  JNDI, for example, is a deep notional dependency for the J2EE user.  That’s unfortunate, since it has nothing to do with the problem the user is solving.  If you want to use JNDI, then hide its existence and use from the person using your tool (to the degree possible).

 

JCS (Java Caching System, part of the Jakarta Commons) is a good example of hiding a notional dependency well.  In JCS, if you use distributed caching you will be using JGroups for communication.  If you want that feature, you will need to know a few things about JGroups (those minimally required to have multiple machines communicate), but only if you want that high-end feature.  Others need know nothing at all.

 

Actually, open source naturally tends to produce small independent solutions, and for me this is probably its biggest selling point.  When bringing in a tool from an open source project, the odds are much better that you will not be saddled with unused complexity.

 

Throw the design away and get a good night’s sleep

 

When you’ve come up with a tool design and you ask yourself, “Self, what would it be like to work with this tool?” and your self answers, “It would suck worse than a twenty pound tick on a hound-dog’s ear!” it’s definitely time to throw it all away and get a good night’s sleep.  Also, maybe you should see a doctor about that split personality problem.

 

These then comprise some very basic ideas that tool makers should be aware of.  Tool users have a related set of concepts that will help them avoid, hide and manage complexity.

 

Never solve other people’s problems

 

I’ll write extensively on this topic another time, but the short version is this:  avoid tools that solve other people’s problems.  If you are writing a blue-fish management program and it needs to solve the problem of blue-fish-escaping and you find a tool that solves the problem of blue-fish escaping and also solves the problems of fish-hyperventilation and fish-telepathy and fish-that-grow-legs you might feel you’ve gotten a great deal.  After all, look at all these other problems you won’t have to worry about. 

 

This is wrong-think, also known as manager-think.

 

The problem is that no matter how well the tool is written there is a very good chance that little pieces of the fish-telepathy problem will have bled out and affected the blue-fish-escaping problem.  Likewise, the other features have a high probability of having bled.   Try to find a smaller tool, one that focuses on the blue-fish-escaping problem.  If a tool solves just your problem (even if incompletely) it is probably only a small amount of work to ensure that the complexity has been well isolated.  Do not rationalize that maybe-perhaps-some-day we might have fish that grow legs.  It is very unlikely that the rest of your design accounts for that problem (it shouldn’t actually) so having this one piece work with leg growing fish will not help you.  It is somebody else’s problem.

 

If the only tools you can find are much bigger than the problem you have to solve, write your own (much smaller) solution and maybe give it away so that others can benefit from it.

 

Answer := power(refactor, aleph0);

 

You read that right, refactor continuously.  If you’re thinking this is not a way to stop complexity from bleeding before it starts, you are right.  I’m just repeating it here because it is the most important thing you can do.  You simply cannot expect yourself or your team to have the psychic powers to predict all the issues they will face before they face them.

 

Wrap yourself a present

 

Wrap third-party tools in your own thin layer.  This is controversial, but it has never failed to pay off for me and takes almost no time (almost != none).  I almost never start with an inherited wrapper; instead, I recommend you use a delegation model.  If it turns out you need to provide most of the methods of the tool you are wrapping, fine, go ahead and inherit.  I realize that I have just recommended a practice identified as an odor in code, but there is a big difference between wrapping someone else’s code (which you can’t change) and your own (which you can).  A wrapping layer provides a means for doing any number of things, from logging to special-case management to hiding functionality that creates problems for you to fixing bugs in the implementation.
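A minimal delegating wrapper might look like this.  ThirdPartyCache stands in for whatever vendor API you are wrapping; it and AppCache are hypothetical names, and the null-key special case is an invented example of the kind of quirk a wrapper can absorb:

```java
// Sketch of a thin delegation wrapper. ThirdPartyCache stands in for
// whatever vendor API you are wrapping; in real code you would delegate
// to the vendor's class rather than define the interface yourself.
interface ThirdPartyCache {
    void put(String key, String value);
    String get(String key);
}

class AppCache {
    private final ThirdPartyCache delegate;

    AppCache(ThirdPartyCache delegate) {
        this.delegate = delegate;
    }

    // One place for logging, special-case management, or bug workarounds.
    void put(String key, String value) {
        if (key == null) {
            return;  // e.g. hide a vendor quirk: null keys blow up downstream
        }
        delegate.put(key, value);
    }

    String get(String key) {
        return (key == null) ? null : delegate.get(key);
    }
}
```

Because the rest of the application talks only to AppCache, swapping the vendor, adding logging, or working around a bug is a one-file change.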

 

The other really important thing the wrapping exercise itself will do is make it clear just how complex the interfaces to the code you are using are.  If it looks like you are going to be delegating and wrapping lots of methods and functions, you have a good clean warning signal that you may need to spend some time coming up with an approach to simplifying the complexity that the tool is presenting you.  It’s critical that if you follow this advice you don’t make the wrapping exercise a “mindless” activity.  If it is, you are better off not wrapping.

 

Fewer is better

 

Classes are a good thing, but when an element has a large number of classes that the user needs to work with, there is a high probability that complexity is being spread.  The idea is to hide the complexity so that while you may have ten or a thousand classes and some interaction paths between them, they still have a small and simple interface representing a fraction of the internally used elements.

 

Layers are … good?

 

Layering (for the sake of this discussion) is the practice of creating sections of an application that are isolated from one another via a thin channel of interaction (an API of some kind).  This one is tough for me to recommend because I have seen layers create problems and I have seen layers solve problems.  Layering can act as a barrier that triggers awareness that a piece of complexity has not been hidden well.  It can also be a cause of complexity as data that should be transparent loses its transparency.  This means you need to be careful about creating layers in your application, and it means I need to spend a lot more time discussing this issue in a future article.  For now, try this as a rule of thumb: only layer if you know the layer will do more good than harm, or if the need for the layer has already emerged in the process of writing the code.

 

There are no rules in fight club

 

My fifth grade PE teacher once told me that no sport was ever worth playing if someone couldn’t lose an eye doing it.

 

The point is that we need to be willing to take risks if we are going to do the best job possible.  I’ve seen more complexity leaked because people assumed that the generally accepted norm was correct than for any other reason.  Don’t be afraid to break the rules if it makes sense.  Write your code in Smalltalk or Lisp if it looks like the best way to solve your problem.  Use Perl or Python.  Write to a file rather than a database.  Remember that the problem you are facing and the solution you will provide are real and not an academic exercise.  Choosing a non-status-quo solution implies that you have taken the time to understand the repercussions of that decision and that you will not be leaking complexity (installation complexity, code complexity, licensing complexity), but otherwise don’t stand on custom.  Properly containing the problem is the most important aspect.  On the flip side, don’t be an iconoclast just to be an iconoclast.  The decision must always be about the problem and not just an arbitrary expression of personality.

 

Commanding complexity

 

The fight against complexity requires that our full set of resources be marshaled.  Each person on a team needs to spot it and “managers” need to learn how to help people avoid it.

 

The most potent guards against complexity are frequently the least experienced team members.  They will naturally ask questions like “why do I have to do this?”  If the code does not speak to them in a language that makes sense, and if they are constantly breaking “rules” then you need to look for what complexity is hampering them and remove it.  The key to enabling this is having a learning environment where people are encouraged to bring up their questions.  Let them know how important this role is.

 

The more senior team members need to be encouraged to discuss complexity, and they need to become fearless in their efforts to fix it.  If you have discussed it and people think an idea will benefit the code base, find a way to get it done.  Developing that habit will pay off over time.  Remember, the code will last longer than you think.

 

Managers need to learn to understand the tendencies of the team members.  Every team will have Natalies and Bobs and Karls and they will all introduce complexity in their own ways.  Some will import complexity, some will write complexity and some will expand upon complexity.  The challenge is to learn to spot these tendencies and then to educate them about complexity and how they can work to avoid it. 

 

Expanding gas

 

When I started this article I mentioned three laws of expansion.  Apparently there is a fourth: my writing will expand to fill the available space.  It’s a good thing the internet has a lot of room in it. :-)

 

Despite its size, I hope this article has helped to get you thinking about the complexity in your applications in a new way.  If it has then the expenditure in work has done more than create heat, it has changed the system state.


Aug. 26, 2005 - Are you a twin I didn't know?

Posted by Anonymous
The better twin anyways :)



I spend a lot of time thinking on most of the stuff you blog about, and I really resonate with this article.



I wrote a while back on Good and Bad code at:

http://essiene.blogspot.com/2005/02/ramblings-on-good-code-non-complex.html

and

http://essiene.blogspot.com/2005/02/code-ramblings-on-coding-stlye.html





And more recently I added a few thoughts on Solution Layering (not exactly in the same way as code layering, but the basics are the same)



http://essiene.blogspot.com/2005/06/solution-layering.html



I really love your writing style (like I said... you seem like a just discovered better twin, maybe from a parallel universe :-) )



Cheers,



Essien Ita Essien

Nov. 19, 2005 - Care to elaborate?

Posted by Krasna Halopti
"In this case, the problem can be solved easily via #define, templates, aspects, or code generation."



I don't quite see how, in the general case; please elaborate.  It seems to me that the problem is one of avoiding the computation of message strings which will then not be logged because of logger filtering.  This is, in general, more than just string concatenation, of course.  To avoid the waste, there needs to be some way of deferring the computations until we know that the logging is definitely going to happen.  In order to wrap this complexity within the logging system, it needs to effectively know how to defer evaluation of any arbitrary expression (which might be used in constructing the message) until after the filtering happens.  How can these expressions be specified easily to the logging system?  I don't think they can, except via some method which adds just as much complexity.  With the if-test, at least, it need be used only where the useless computation is a problem.  The code should contain the unconditional statements until profiling shows that the computation is problematic from a performance perspective.



In terms of your suggested means of avoiding the complexity, I'd say:



  • #define - not available at runtime.
  • templates - not sure what you mean.
  • aspects - fine for method entry/exit logging, but it's harder to do inside a function, isn't it?
  • code generation - not available at runtime (or if it is, it may lead to performance problems of its own).


Nov. 19, 2005 - short answer

Posted by codecraft
The if-test is not the problem.  The problem is having to write it yourself.  With #define, for example, you can include the if-test in the generated code.  The user writes something like LOG(logger, error, stuff) and the code generated by the #define looks something like: if (logger.isEnabled(error)) { logger.log(error, stuff); }  This is exactly the same as writing the code yourself, but the system generates it for you.  The if-test still takes place, but the coder doesn't write it.  Of course #define does not exist in Java, and generics don't in-line the way templates do, so you are left with code generation as your likely solution.  If you do no code generation, it probably isn't worth introducing it just for this purpose.  If you DO code generation, adding this small element is probably a good thing.  My purpose with this example, though, was to show how even very simple things spread complexity; in some languages the cost of avoiding it may be too high, in others it is not.



Kevin Barnes

Code Craft is the place for my thoughts, rants, ideas and occasional jokes on what it means to write code, why some people are better at it than others, and how we think about software in general.

Copyright (C) 2005, Kevin Barnes. All rights reserved.