Very few things are certain in life. Here are a few things that are:
- A gas will expand to fill the available space.
- Work will expand to consume all resources and time.
- The most complex problem in a piece of software will expand to affect the entirety of the code.
The first two of these are well known, and most people will recognize them. The last is a truth that is a bit harder to appreciate. Understanding it is, in my opinion, one of the keys to becoming a master programmer. It is also similar to the first two in that it can be avoided only at the cost of work. Learning how to overcome this tendency of complexity to expand is the point of this discussion and the reason for the title: let the hard stay hard.
To understand exactly what I mean by complexity expanding to affect the entirety of the code, let me start with two very simple examples and then move on to two larger (and perhaps controversial) ones. The goal of looking at these is not to provide answers or fixes to the specific problems, but to observe the strange tendency of code to expose complexity despite our best efforts to hide it.
Log rolling for lumberjacks
This is just about the simplest example of complexity creep I can think of, and it's present in a very large number of systems. It looks like this (in some arbitrary language):
if (logger.isDebugEnabled()) {
    logger.debug("some text " + someObject + " some other stuff");
}
If this code were hiding complexity properly it would look like this:
logger.debug("some text " + someObject + " some other stuff");
The complexity in this case is performance. Because the argument string is constructed before the function is invoked, a lot of debug logging can be very costly even when debug output is disabled. To get around this problem we have diligently wrapped the call to the debug statement in a test. This is classic complexity spread: the problem is now the responsibility of every developer who writes a debug-level logging statement.
In this case, the problem can be solved easily via #define, templates, aspects, or code generation. Depending on the available language features this may seem like a lot of work, but solving the problem once makes it go away forever. Otherwise the complexity bleeds across many thousands of lines. If you can't take the time to hide the performance complexity, maybe performance isn't such a big deal for you, and you can just ignore the performance impact and leave the wrapping exercise for later.
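In a language with lambdas, one way to hide the guard once and for all is to accept the message lazily. The sketch below is mine, not any real logging library's API; the class and method names are invented, and the output sink is injected only to keep the example self-contained.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical wrapper: the isDebugEnabled test lives here, once,
// instead of at every call site in the application.
class LazyLogger {
    private final boolean debugEnabled;
    private final Consumer<String> sink; // where messages go

    LazyLogger(boolean debugEnabled, Consumer<String> sink) {
        this.debugEnabled = debugEnabled;
        this.sink = sink;
    }

    // The string is built (the supplier invoked) only when debug is on,
    // so a disabled debug statement costs a single boolean check.
    void debug(Supplier<String> message) {
        if (debugEnabled) {
            sink.accept(message.get());
        }
    }
}
```

A call site then shrinks back to one line, `logger.debug(() -> "some text " + someObject + " some other stuff");`, with no guard in sight.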
Who owns this property?
Another surprisingly common case of complexity spread can be seen in code that looks like this:
int targetValue = DEFAULT_INT_VALUE_CONSTANT;
String someValue = properties.getProperty(NAME_CONSTANT);
if (someValue != null) {
    try {
        targetValue = stringToInt(someValue);
    } catch (Exception e) { // perhaps multiple types
        // do nothing, or perhaps log an error/warning
    }
}
This code is pretty simple, and I've probably seen something like it written thousands of times over the years. Sometimes the exception isn't even caught, and (of course) the levels above also have no idea what to do with the exceptions that are thrown. There are maybe three or four legitimate possible constructions of this code, but the question is: why is it not written like this?
int targetValue = properties.getIntProperty(NAME_CONSTANT, DEFAULT_INT_VALUE_CONSTANT);
If you have some kind of Service Locator pattern or are using code generation, you might be able to make the whole thing declarative as well, but that's presumptive. At the very least, the single line above should be the most a user of properties is required to write. Frankly, it amazes me that the writers of properties classes don't universally include such methods, but that's not an excuse for not wrapping/extending the class and providing the functionality for yourself.
How is this an example of complexity spread? The complexity was introduced by making certain constants or values externally definable. The Properties class was the tool to do it, but we still spread much of the complexity throughout the application by forcing people to deal with the error conditions (and type-coercion issues) each time the class is used. In practice there are only three things you can really do when a property is incorrectly defined: 1) use a default and continue, 2) use a default, display an error somewhere, and continue, or 3) abort. Even if you want all three available depending on circumstances, you can hide this complexity fully without much work.
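The "use a default and continue" policy, for instance, takes only a few lines to bury in one place. This is a sketch under assumed names (SafeProperties and getIntProperty are mine, wrapping the standard java.util.Properties):

```java
import java.util.Properties;

// Hypothetical wrapper: parsing and the "use a default and continue"
// policy live here, in exactly one place.
class SafeProperties {
    private final Properties props;

    SafeProperties(Properties props) {
        this.props = props;
    }

    int getIntProperty(String name, int defaultValue) {
        String raw = props.getProperty(name);
        if (raw == null) {
            return defaultValue; // missing: fall back
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // malformed: fall back (or log a warning here, once)
        }
    }
}
```

The other two policies (warn-and-continue, abort) could be added as sibling methods or an optional parameter without touching any call site.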
XML based systems are frequently littered with variants of this problem.
In both of the above cases, the complexity was of a very limited kind. As a result there were also very simple solutions. Despite this, very few people actually bother to do the work; so, like an expanding gas, the complexity has filled the space of their code.
The next few examples are much bigger and hence more controversial.
I know what I know, but I can't remember where I put it
Memory management is an intrinsic complexity of writing software. Programs require memory. That memory requires management. Most (but not all) early programming languages consciously chose to expose much of this complexity to users due to performance considerations. C++ continued to expose this complexity, probably for the same reasons. This one choice has exposed millions of innocent lines of code to complexity they never wanted to know about.
Anyone who has written a substantial body of C++ code and also a substantial body of code in a garbage collected language can write the rest of this section themselves without my assistance (the holdouts can feel free to write me whatever rants they like). C and C++ have an entire industry dedicated to nothing more than solving this one problem. There are books, tools, patterns and approaches. This piece of complexity spread is one of the most insidious and notorious around and is probably one of only two or three reasons for the rise of Java. Otherwise Java was (at that time) inferior in many other respects.
Just to be clear as to why this is a case of complexity spread: managing memory is properly a concern of the language. Writers of code in these languages are instead forced to think about it everywhere throughout their code rather than having that complexity managed in just one place. Hence, it is complexity bleeding out to affect the entire approach and substance of the code (the leaking is more like a broken pipe in this case, but the principle is the same).
Garbage collection comes with a price tag, but it is one most programs can and should pay willingly. Not to dwell on solutions, but C++ can be garbage collected and Smart Pointers are even better than garbage collection in the cases where they make sense (which is not everywhere).
This final example reeks of controversy, but regardless of your take on the value of J2EE and its various elements, there is little doubt that it is a good example of complexity spread. Even the most ardent admirer of EJBs will admit that there are quite a few things people commonly do wrong when developing with J2EE, which is synonymous with an acceptance that it has spread complexity to the user. Because the details of this problem set are well known to people who have lived in this environment and completely foreign to others (say, Python developers), I'll pick just one detailed nit as an example.
Let us consider creating an arbitrary service using EJBs (EJBs are, in fact, services and not more generalized objects in the proper sense, so I won't confuse people by misnaming them). To create this EJB (service) I will need to create a grand total of 3 files (plus 2 if we want local interfaces) and create entries in two XML files. Of course I will also have to use the EJB, which requires that I handle a host of potential exceptions, talk to JNDI, and perhaps deal with some caching issues.
If I do nothing (beyond what J2EE provides and requires) I will be writing (and maintaining) a substantial body of code and declaration to create my services, and in addition I will be writing on the order of 20 lines of code every time I invoke one. Now that's complexity spread.
J2EE advocates will rightly point out that with some up front work you can make this go away. I agree. At this very moment I am creating EJBs with nothing more than a couple of XDoclet tags in the comments of my service classes and using them with nothing more than a method invocation on an object (provided via either a single service locator call or by dependency injection).
Of course much of this is work that I was forced to do myself or work that others were forced to do (and which I borrowed). These are not J2EE components but a set of wrapping layers. The point is not that you can't hide complexity; the point is that you (or someone) needs to do work to make it happen. I have seen lots of implementations that failed to do that work. In fact, for many years it was hard to find anyone who was doing it completely.
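The shape of that wrapping work is worth seeing. A service-locator layer, reduced to its essence, looks something like the sketch below; this is my own minimal illustration, not a J2EE API, and in a real system the `locate` call would hide the JNDI lookup, narrowing cast, and checked-exception handling instead of a simple map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical service locator: the lookup, casting, and failure handling
// that every caller would otherwise repeat are buried here once.
class ServiceLocator {
    private static final Map<Class<?>, Supplier<?>> factories = new HashMap<>();

    static <T> void register(Class<T> type, Supplier<T> factory) {
        factories.put(type, factory);
    }

    // Callers get a typed service from a single call, with no JNDI in sight.
    static <T> T locate(Class<T> type) {
        Supplier<?> factory = factories.get(type);
        if (factory == null) {
            throw new IllegalStateException("unknown service: " + type.getName());
        }
        return type.cast(factory.get());
    }
}
```

With this layer in place, using a service is one line (`MyService s = ServiceLocator.locate(MyService.class);`) rather than twenty.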
There are still other pieces of complexity to deal with when working with EJBs. There are a number of rules one must remember when creating the service objects. There are requirements on the parameters that are passed that are not compile time verifiable (without work). And of course you must architect in a services layer (which might not be a natural part of your architecture otherwise).
To save myself a rant or two please note that I have not said that J2EE is bad. I have just said some bad things about J2EE. The overall utility of J2EE depends entirely on how it is used and in what particular circumstances.
Let the hard stay hard
This has been a pretty long run up to the main point of the discussion, but here it is:
Let the hard problems BE hard
so that the simple problems will BE simple.
When making a rug, the weaver spends extra time on the edges because he understands that they will fray. He does not add extra weaves to the body of the rug to solve the problem.
When constructing a building, the builder does not strengthen the entire roof to enable the placement of the air conditioner; he strengthens only the load-bearing section (or wisely moves the compressor off the roof entirely).
As we have seen, spending too little time on complexity tends to result in being forced to spend even more time on the same complexity (due to spreading). Solving the problem once, truly and completely, pays off for the lazy programmer.
The smell of complexity spreading
Since complexity spread is such a problem, it's important to learn how to spot it, or to use Fowler's terminology, how to smell it. As there does not appear to be any religious significance to Fowler's smell nomenclature, I shall make up my own here and reference only a few of the well-known ones he has laid out.
The biggest single smell indicating that a system has been infected by creeping complexity is the Needle-in-a-Haystack odor. When you can look at a piece of code and determine that its behavior is logical and consistent, but it takes you a lot of looking to find out what that behavior is, you've encountered the Needle-in-a-Haystack smell. Long Functions and Lots of Branching are smells that contribute to Needle-in-a-Haystack, and are also signs of spreading complexity.
Stranger Intimacy is another sign that a class wants you to know too much. If lots of classes seem to be using a single class that is not central to their main purpose, then they are exhibiting Stranger Intimacy, and creeping complexity is probably taking place. Stranger Intimacy frequently goes along with the equally damning Random Handshake, wherein a class seems to require random small interactions with unrelated classes to be kept stable (cache invalidation comes to mind as an example).
The Parade of Fools is another bad sign. The Parade of Fools occurs when a class requires a host of helper classes in order to be used properly. These helpers are generally stateless collections of functions. In general, I consider such helper classes to be a smell in and of themselves.
When you start to see any or all of these problems in your code, stop what you are doing and save yourself a week of effort for each hour you spend identifying and hiding complexity.
Stopping the bleeding before it starts
These are, of course, all after-the-fact assessments. It would be nice to know in advance where complexity lies so that it can be avoided. In discussing suggestions on how to avoid complexity creep, one thing to bear in mind is that trying to avoid creep can itself cause it if you do not understand what kind of complexity you are facing.
Fixing problems you don't have is a 100% guarantee of increased complexity. A hard-nosed attitude of "you aren't going to need it" is a good idea, but only if seasoned with a strong dash of willingness to admit mistakes, and only if you are willing to refactor as problems begin to appear. This attitude should also not cause you to ignore what you really do know out of dogged attachment to a rule of thumb.
To learn how to let hard problems BE hard, let's re-examine the earlier four problems and see what went wrong in each one.
If you think about these examples from the tool-maker perspective you can easily understand some of the reasons that the tools do not hide complexity as completely as we would like. In each case the tool maker faced limitations owing to his position as use-ee rather than use-er.
In the logging example, the logging object cannot work outside of itself to provide meta-programming elements (although in languages with #define or certain kinds of templating this is less true). As a result, the best its authors can do is offer the user the tools to enable that meta-layer.
In the properties class, the tool maker did not want to presume too much about how the tool would get used. Does the user want to error, do they want to abort, or do they want to continue? Although this explains some shortcomings, there are still issues. I will discuss these later.
In the C++ memory management case, the designer was forced to choose between two evils (the evils of garbage collection and the evils of manual memory management) and chose the solution that enabled maximum flexibility for the user (at the cost of exposed complexity).
In the J2EE case, there are literally millions of potential use cases. One can imagine the difficulty in trying to find a universal solution. By exposing so much of the internals the committee enabled the broadest possible use.
This then brings us to the first rule for tool makers (all of us):
Keep it simple, stupid
Do not sacrifice the 95% case for the sake of the 5% case. Both memory management and J2EE are clearly failing in this specific area. Because there are some very challenging use cases they have made life very hard for the 95% of the times when the use cases dont apply. If C++ had enabled a simple garbage collected object type in addition to supporting more self-managed object types an entire industry would now be nearly out of business.
This rule should not be used as an excuse not to provide needed elements: if a slightly more complex path can still provide the additional 5% without impacting the 95%, it should be made available. Most garbage-collected languages fail here. Because you don't usually want to manage memory, they assume you never will. Mechanisms that let users manage object lifecycles (forcing the required complexity on them in just those cases) can coexist with garbage collection; sadly, most garbage-collected languages have not provided them.
Fail like you mean it
Provide ways to handle expected failures transparently. This is the problem with the properties system: it simply did exactly what it was asked to do and threw the exceptions and failures back to the user to deal with. Its authors could easily have provided a set of obtainer methods to cover the various data types (handling the type-coercion issues), and they could have provided an optional throw-on-failure parameter to either allow or suppress exceptions. There are half a dozen solutions, each of which is better than the one provided.
In a sense, exceptions have made things too easy on tool designers. In far too many tools, exceptions are thrown for scenarios that are very close to normal. Of course you need to be able to let the user know about the failure if necessary, but you also want him to be able to provide a default answer that will suffice in most cases.
For example, why not provide a setErrorHandler method on service-ish classes to handle those really hard-core errors where the system has died horribly? The passed handler can be used if provided; otherwise the exception can be thrown as a runtime exception. If such a handler is really an obvious mandatory element, then maybe just require it at construction time so the compiler enforces it.
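A minimal sketch of that idea, with all names invented (ConfigService is a stand-in for any service-ish class):

```java
import java.util.function.Consumer;

// Hypothetical service: callers may install an error handler; if they
// don't, failures surface as runtime exceptions instead.
class ConfigService {
    private Consumer<Exception> errorHandler; // null means "just throw"

    void setErrorHandler(Consumer<Exception> handler) {
        this.errorHandler = handler;
    }

    String load(String name) {
        try {
            if (name == null) {
                throw new IllegalArgumentException("no name given");
            }
            return "value-for-" + name; // stand-in for real work
        } catch (Exception e) {
            if (errorHandler != null) {
                errorHandler.accept(e); // the one place errors are routed
                return null;
            }
            throw new RuntimeException(e);
        }
    }
}
```

Callers who care about failures install a handler once; everyone else writes straight-line code with no try/catch at each call site.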
Providing methods with default-accepting signatures is also a good idea (as in the properties case).
Ten is better than one
Make multiple small independent tools. When a problem is so big and has so many use cases that it is simply not possible to identify a 95% case, it's time to consider the problem as multiple problems and solve the 95% case for each. This is definitely the J2EE problem. The slicer-dicer-puree-and-make-julienne-fries Ronco mentality does not work with software.
These separate tools should naturally be just that: separate. If they are deeply engaged and intertwined, they will have a knowledge-binding that makes each of them overly complex. This is not to say you cannot have binary dependency, just not notional dependency. JNDI, for example, is a deep notional dependency for the J2EE user. That's unfortunate, since it has nothing to do with the problem the user is solving. If you want to use JNDI, then hide its existence and use from the person using your tool (to the degree possible).
JCS (Java Caching System, part of the Jakarta Commons) is a good example of having hidden notional dependency well. In JCS if you use distributed caching you will be using JGroups for communication. If you want that feature you will need to know a few things about JGroups (those minimally required to have multiple machines communicate) but only if you want that high-end feature. Others need know nothing at all.
Actually, open source naturally tends to produce small independent solutions, and for me this is probably its biggest selling point. When bringing in a tool from an open source project, the odds are much better that you will not be saddled with unused complexity.
Throw the design away and get a good night's sleep
When you've come up with a tool design and you ask yourself, "Self, what would it be like to work with this tool?" and your self answers, "It would suck worse than a twenty-pound tick on a hound-dog's ear!" it's definitely time to throw it all away and get a good night's sleep. Also, maybe you should see a doctor about that split-personality problem.
These then comprise some very basic ideas that tool makers should be aware of. Tool users have a related set of concepts that will help them avoid, hide and manage complexity.
Never solve other people's problems
I'll write extensively on this topic another time, but the short version is this: avoid tools that solve other people's problems. If you are writing a blue-fish management program, and it needs to solve the problem of blue-fish-escaping, and you find a tool that solves the problem of blue-fish-escaping and also solves the problems of fish-hyperventilation and fish-telepathy and fish-that-grow-legs, you might feel you've gotten a great deal. After all, look at all these other problems you won't have to worry about.
This is wrong-think, also known as manager-think.
The problem is that no matter how well the tool is written, there is a very good chance that little pieces of the fish-telepathy problem will have bled out and affected the blue-fish-escaping problem. Likewise, the other features have a high probability of having bled. Try to find a smaller tool, one that focuses on the blue-fish-escaping problem. If a tool solves just your problem (even if incompletely), it is probably only a small amount of work to ensure that the complexity has been well isolated. Do not rationalize that maybe-perhaps-some-day we might have fish that grow legs. It is very unlikely that the rest of your design accounts for that problem (it shouldn't, actually), so having this one piece work with leg-growing fish will not help you. It is somebody else's problem.
If the only tools you can find are much bigger than the problem you have to solve, write your own (much smaller) solution and maybe give it away so that others can benefit from it.
Answer := power(refactor, aleph0);
You read that right: refactor continuously. If you're thinking this is not a way to stop complexity from bleeding before it starts, you are right. I'm just repeating it here because it is the most important thing you can do. You simply cannot expect yourself or your team to have the psychic powers to predict all the issues they will face before they face them.
Wrap yourself a present
Wrap third-party tools in your own thin layer. This is controversial, but it has never failed to pay off for me and takes almost no time (almost != none). I almost never start with an inheritance-based wrapper; instead I recommend you use a delegation model. If it turns out you need to provide most of the methods of the tool you are wrapping, fine, go ahead and inherit. I realize that I have just recommended a practice identified as an odor in code, but there is a big difference between wrapping someone else's code (which you can't change) and your own (which you can). A wrapping layer provides a means for doing any number of things, from logging, to special-case management, to hiding functionality that creates problems for you, to fixing bugs in the implementation.
The other really important thing the wrapping exercise itself will do is make it clear just how complex the interfaces of the code you are using are. If it looks like you are going to be delegating and wrapping lots of methods and functions, you have a good clean warning signal that you may need to spend some time coming up with an approach to simplifying the complexity the tool is presenting you. It's critical that if you follow this advice you don't make the wrapping exercise a "mindless" activity. If it is, you are better off not wrapping.
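The delegation model looks something like this in practice. Everything here is invented for illustration: ThirdPartyCache stands in for a vendor class we cannot change, and AppCache is our thin layer over it.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a third-party class we cannot modify.
class ThirdPartyCache {
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) { data.put(key, value); }

    String fetch(String key) { return data.get(key); } // may return null
}

// Our thin delegating wrapper: the single place to add logging,
// hide awkward methods, or fix quirks in the tool's behavior.
class AppCache {
    private final ThirdPartyCache delegate = new ThirdPartyCache();

    void put(String key, String value) {
        delegate.put(key, value);
    }

    // Special-case management hidden here, once: callers never see null.
    String get(String key, String fallback) {
        String value = delegate.fetch(key);
        return value != null ? value : fallback;
    }
}
```

If the delegate's quirks change in a new release, only this one class needs to know.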
Fewer is better
Classes are a good thing, but when an element has a large number of classes that the user needs to work with there is a high probability that complexity is being spread. The idea is to hide the complexity so that while you may have ten or a thousand classes and some interaction paths between them, they still have a small and simple interface representing a fraction of the internally used elements.
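The classic way to keep the public surface small is a facade. The sketch below uses invented names; three internal classes do the work, but users touch only one.

```java
// Internal machinery: users never need to know these exist.
class Parser {
    String parse(String raw) { return raw.trim(); }
}

class Validator {
    boolean valid(String text) { return !text.isEmpty(); }
}

class Formatter {
    String format(String text) { return "[" + text + "]"; }
}

// The one class users work with: a small interface hiding the
// internal classes and the interaction paths between them.
class DocumentProcessor {
    private final Parser parser = new Parser();
    private final Validator validator = new Validator();
    private final Formatter formatter = new Formatter();

    String process(String raw) {
        String parsed = parser.parse(raw);
        if (!validator.valid(parsed)) {
            throw new IllegalArgumentException("empty input");
        }
        return formatter.format(parsed);
    }
}
```

Internally there could be a thousand classes; the user's view stays one method wide.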
Layering (for the sake of this discussion) is the practice of creating sections of an application that are isolated from one another via a thin channel of interaction (an API of some kind). This one is tough for me to recommend because I have seen layers create problems and I have seen layers solve problems. Layering can act as a barrier that triggers awareness that a piece of complexity has not been hidden well. It can also be a cause of complexity as data that should be transparent loses its transparency. This means you need to be careful about creating layers in your application, and it means I need to spend a lot more time discussing this issue in a future article. For now, try this as a rule of thumb: only layer if you know the layer will do more good than harm, or if the need for the layer has already emerged in the process of writing the code.
There are no rules in fight club
My fifth grade PE teacher once told me that no sport was ever worth playing if someone couldn't lose an eye doing it.
The point is that we need to be willing to take risks if we are going to do the best job possible. I've seen more complexity leaked because people assumed the generally accepted norm was correct than for any other reason. Don't be afraid to break the rules if it makes sense. Write your code in Smalltalk or Lisp if it looks like the best way to solve your problem. Use Perl or Python. Write to a file rather than a database. Remember that the problem you are facing and the solution you will provide are real, not an academic exercise. Choosing a non-status-quo solution implies that you have taken the time to understand the repercussions of that decision and that you will not be leaking complexity (installation complexity, code complexity, licensing complexity), but otherwise don't stand on custom. Properly containing the problem is the most important aspect. On the flip side, don't be an iconoclast just to be an iconoclast. The decision must always be about the problem and not just an arbitrary expression of personality.
The fight against complexity requires that our full set of resources be marshaled. Each person on a team needs to spot it and managers need to learn how to help people avoid it.
The most potent guards against complexity are frequently the least experienced team members. They will naturally ask questions like "why do I have to do this?" If the code does not speak to them in a language that makes sense, and if they are constantly breaking rules, then you need to look for what complexity is hampering them and remove it. The key to enabling this is having a learning environment where people are encouraged to bring up their questions. Let them know how important this role is.
The more senior team members need to be encouraged to discuss complexity, and they need to become fearless in their efforts to fix it. If you have discussed it and people think an idea will benefit the code base, find a way to get it done. Developing that habit will pay off over time. Remember, the code will last longer than you think.
Managers need to learn to understand the tendencies of the team members. Every team will have Natalies and Bobs and Karls and they will all introduce complexity in their own ways. Some will import complexity, some will write complexity and some will expand upon complexity. The challenge is to learn to spot these tendencies and then to educate them about complexity and how they can work to avoid it.
When I started this article I mentioned three laws of expansion. Apparently there is a fourth: my writing will expand to fill the available space. It's a good thing the internet has a lot of room in it. :)
Despite its size, I hope this article has helped to get you thinking about the complexity in your applications in a new way. If it has then the expenditure in work has done more than create heat, it has changed the system state.