Pop quiz: Should software be rigorously tested before being delivered to your clients?

If your answer is no, then it's time you found another profession.

That's a blunt statement; however delivering partially tested, or worse untested, software to clients is an unprofessional thing to do. The only professional thing to do is to ensure that the software is tested as thoroughly as possible before release to the client. Software testing can be the most challenging part of the software development life cycle; but it can also be the most rewarding, as the customer gets a quality piece of software. Software testing can drive people insane because there are so many views on the best way to approach it. There are also many different interpretations of the purpose of the various types of software testing.

So what is the purpose of software testing?

Software testing is a risk management strategy that aims to ensure that the software being delivered is fit for purpose and is of sufficient quality that the customer is prepared to use it on a day-to-day basis. As with all risk mitigation strategies, determining the scope and duration of the testing effort is important.

So why does software testing drive people insane?

The main reason that software testing drives people insane is that there are so many definitions of it. Depending on who you ask, you will get a different answer. Ask a developer whether they've tested their software, and the answer will almost always be yes. But what does that actually mean?

  • Can the tests be repeated?
  • Are the tests comprehensive?
  • Are the tests correct?
  • Are there unit tests?
  • Are all the unit tests executed, and do they complete successfully?
  • Are there valid integration and/or acceptance tests?

I have worked with many software engineers and the only consistent thing about them, from a testing perspective, is that there is no consistency. The level at which testing is dealt with depends on the field in which the development is occurring. There is obviously a greater need for testing and oversight in safety critical systems such as medical equipment and aviation. Conversely, the testing requirements for a website are different because it is not a safety critical system.

Is unit testing sufficient?

This is a very important question to ask within an organisation. There is no doubt that unit testing is beneficial to the organisation; but by definition unit testing tests only the individual software units. It does not test the integrated whole. I have seen software that was rigorously unit tested fail dismally because there was insufficient testing once the individual units were combined into the software to be delivered to the client. The culprit tended to be the software that combined the units. Each unit was designed to do one thing, and do it well. It was tested for the envisioned scenarios and passed each one with flying colours. So why did the software fail? Many times it was because the software was using the units in an unexpected manner. Similarly, the software wasn't always checking the results of the operations using the units, especially in the case of boundary conditions.
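As a minimal sketch of that failure mode, consider two small units that each behave as documented, and glue code that never checks a result at a boundary. All of the names here (find_index, price_at, lookup_price) are hypothetical and invented purely for illustration; they are not taken from any project mentioned in this article.

```python
def find_index(items, target):
    """Unit A: return the position of target, or -1 when it is absent."""
    for position, item in enumerate(items):
        if item == target:
            return position
    return -1


def price_at(prices, index):
    """Unit B: return the price stored at the given index."""
    return prices[index]


def lookup_price_unchecked(names, prices, name):
    """Glue code that never checks Unit A's result.

    When the name is missing, -1 is passed straight to Unit B, which
    happily returns the LAST price instead of reporting a problem.
    """
    return price_at(prices, find_index(names, name))


def lookup_price(names, prices, name):
    """Glue code that checks the boundary case before using the result."""
    index = find_index(names, name)
    if index < 0:
        raise KeyError(name)
    return price_at(prices, index)
```

Both units would pass their own unit tests, yet lookup_price_unchecked(["apple", "pear"], [3, 5], "plum") quietly returns 5, the price of a pear. Only a test that exercises the combined code with a missing name exposes the defect.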

Unit testing is very important as a risk minimisation strategy for software development. Unit tests remove one level of risk from the equation - they provide proof that the individual units will operate correctly for the identified scenarios. There are a few caveats to this:

  1. The unit tests must actually test the code;
  2. The unit tests must test not only the "happy path"; but also errors and boundary conditions;
  3. The unit tests must be synchronised with the unit under test;
  4. The unit tests must be complete. I once had someone try to tell me that a constructor-only test was sufficient. It is if all you can do with an object is construct it; but if not then the unit has not been tested sufficiently;
  5. A large percentage of the unit's lines of code must be exercised. By large percentage I mean > 80%. 100% code coverage is possible in many cases; but is (usually) extremely expensive to implement. Sound engineering judgement must be used to define the stop point (engineering judgement is used to weigh up the risks and benefits associated with a course of action; it is a risk mitigation strategy, and is often influenced by the engineer's view of engineering ethics; see also the ACM code of ethics);
  6. The unit tests should be run every time a change is made to the source code. Using a properly configured continuous integration tool helps this enormously.
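To make the second and fourth caveats concrete, here is a small sketch using Python's standard unittest module. The percentage function is a hypothetical unit made up for this example; the point is only that the tests cover the happy path, the boundaries and the error cases rather than a single "it constructs" check.

```python
import unittest


def percentage(part: int, whole: int) -> float:
    """Hypothetical unit under test: part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    if part < 0 or part > whole:
        raise ValueError("part must be between 0 and whole")
    return 100.0 * part / whole


class PercentageTest(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(percentage(1, 4), 25.0)

    def test_boundaries(self):
        # The smallest and largest legal values of part.
        self.assertEqual(percentage(0, 4), 0.0)
        self.assertEqual(percentage(4, 4), 100.0)

    def test_errors(self):
        # Zero whole, negative part, and part larger than whole.
        for part, whole in [(1, 0), (-1, 4), (5, 4)]:
            with self.subTest(part=part, whole=whole):
                with self.assertRaises(ValueError):
                    percentage(part, whole)
```

Saved in a file, these tests can be run with python -m unittest, which is also the sort of command a continuous integration tool would be configured to run on every change.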

Is integration testing sufficient?

This too is a very important question to ask. Integration testing occurs once the software has been assembled and is supposedly in a form fit for the customer to use. The software is tested from the user's point of view; however this focus often leads to critical sections of the code base not being tested. Imagine having the following conversation:

Developer A: Why have you written so many wrappers around other people's classes?

Developer B: So that I am isolated from their changes and can work on my classes. My unit tests pass so I'm happy.

Developer A: That seems to be a reasonable short term strategy, but what happens when it comes time to integrate the software so that it uses the real versions of everyone else's code?

Developer B: That's an integration issue and not my problem.

The conversation above actually occurred on a project being delivered on both Windows and Solaris. The wrappers in question were a reasonable representation of the Windows functionality but not a good representation of the Solaris versions. The unit had been primarily tested on Windows and not Solaris. Integration of the unit using those wrappers was extremely difficult as a result.

To some extent that conversation seems entirely reasonable. The problem of course is that the pain is being deferred until integration. There is a bigger issue here though - the last statement in the conversation is based on the assumption that the wrapper classes are a relatively good simulation of the classes being wrapped. If the wrappers are a good representation of the classes being wrapped, then this assumption works (up to a point). If they aren't, then all that exists are unit tests that pass, and software that cannot be integrated because it does not work with the real versions of the classes.
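One way to test that assumption, sketched here under invented names, is to run the same set of checks against both the wrapper and the real class. RealQueue, QueueWrapper and contract_violations are all hypothetical; they stand in for whatever classes are being wrapped on a given project.

```python
class RealQueue:
    """Hypothetical 'real' class owned by another developer."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty queue")
        return self._items.pop(0)


class QueueWrapper:
    """Hypothetical wrapper written in isolation from RealQueue."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # Diverges from RealQueue: returns None instead of raising.
        return self._items.pop(0) if self._items else None


def contract_violations(queue_factory):
    """Run identical checks against any implementation; list the failures."""
    failures = []
    queue = queue_factory()
    queue.push(1)
    queue.push(2)
    if queue.pop() != 1:
        failures.append("pop must return items in FIFO order")
    queue.pop()  # Drain the queue so the next pop hits the empty case.
    try:
        queue.pop()
    except IndexError:
        pass
    else:
        failures.append("pop on an empty queue must raise IndexError")
    return failures
```

Here contract_violations(RealQueue) returns an empty list, while contract_violations(QueueWrapper) reports the empty-queue difference. Any code unit tested only against the wrapper has never seen the exception that the real class raises.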

How often should code be integrated? The eXtreme Programming practices state that code should be "integrated often". The longer that modified code remains unintegrated with the main body of source code, the greater the risk that it will be incompatible, and the more work is required to integrate it. I don't believe there should be hard and fast rules dictating the frequency at which code is integrated. I prefer to integrate code as soon as I have some working code that does not break the current system. The completion of a single refactoring is another good time to integrate.

When should we start testing on multiple platforms?

That's simple - immediately. The longer you leave testing on other supported platforms, the greater the chance that your code is specific to the development platform. Unfortunately there are always subtle and not so subtle differences between platforms that need to be addressed. The implementation of various threading libraries is a case in point. Windows uses kernel threads, whereas Solaris has a couple of different options.

One project I worked on was to be delivered on Windows and Solaris. It used co-operative Solaris threads, which require the programmer to periodically call thread library functions to allow context switching. Done right, I've found that this doesn't impact performance greatly; however tight processing loops do require some thought before implementation begins. The software for the project was being developed in C++ and there was an operating system abstraction layer that was supposed to provide an OS agnostic environment to work in.

At one point, a threaded application was written entirely on Windows, then ported to Solaris once it was complete. The application had massive performance problems on Solaris because it assumed that the Solaris implementation of the abstraction layer behaved similarly to the Windows implementation. The tight processing loops essentially turned the application into a single threaded application on Solaris. A significant amount of rework was required in order to resolve the problem.

The necessity for testing the software on multiple platforms was evident from the start; however most developers were too blinded by the ease of development under Windows to consider that the Solaris model was substantially different. It was both a developer and management failure that testing on Solaris was left until later in the project. Testing on both Windows and Solaris whilst the software was being developed would have identified the performance problems earlier, whilst the software was still being designed and implemented.
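The effect of a tight loop under co-operative scheduling can be illustrated with a toy model. The sketch below uses Python generators and a round-robin loop; it is not the Solaris thread library or the project's abstraction layer, and the names run_cooperatively and worker are invented for the illustration. Each worker only gives up control when it reaches a yield.

```python
from collections import deque


def run_cooperatively(tasks):
    """Round-robin over generator tasks; a task runs until it yields."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # Run the task up to its next yield point.
            ready.append(task)  # It yielded, so queue it for another turn.
        except StopIteration:
            pass                # The task has finished its work.


def worker(name, steps, log, yield_every):
    """Do `steps` units of work, yielding control every `yield_every` steps."""
    for step in range(1, steps + 1):
        log.append(f"{name}{step}")
        if step % yield_every == 0:
            yield  # Co-operative yield point: let other tasks run.
```

When both workers yield after every step, the log interleaves as A1, B1, A2, B2, A3, B3. If worker A is given yield_every=3, so that it behaves like a tight processing loop, the log becomes A1, A2, A3, B1, B2, B3: worker B gets no turn until A has finished, which is the single threaded behaviour described above. On a pre-emptive platform the same code would appear to work, which is why the problem stays hidden until the other platform is tested.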

Testing on all supported platforms should start immediately through the use of automated unit tests. Assuming that all the supported platforms behave in the same manner is foolhardy to say the least. Failing to start testing on all platforms as soon as it is known that multiple platforms are involved is unprofessional.

Should we be proactive or reactive?

The answer to that question is a no-brainer. Of course we should be proactively seeking to remove defects via testing. We should strive to test all our code before it gets delivered; however we must also use engineering judgement to determine when to stop. I don't believe that we should have 100% code coverage, except in safety critical systems. The reason is that the cost of doing so increases exponentially the closer we get to 100%. Regardless of your organisation's view on the percentage of code covered, it is the responsibility of the software engineer to proactively test for defects. At a minimum, all new code must have unit tests, and all code that is changed must have unit tests covering the changes. Unit tests must be created to replicate any defects that are found, and then used to ensure that the defect does not return once it has been fixed.
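A defect-replicating test might look like the sketch below. The mean function and the defect it describes are hypothetical, invented for the example: assume an earlier version divided by the length of the list unconditionally and crashed on an empty list. The test is written to fail against that version and pass once the fix is in place, and it stays in the suite afterwards.

```python
def mean(values):
    """Return the arithmetic mean of values; an empty input yields 0.0.

    Hypothetical history: an earlier version had no empty-input check
    and raised ZeroDivisionError for [].
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_mean_of_empty_list_regression():
    # Replicates the (hypothetical) reported defect: mean([]) crashed.
    assert mean([]) == 0.0


def test_mean_happy_path():
    # Guards the existing behaviour so the fix does not break it.
    assert mean([2, 4, 6]) == 4.0
```

Naming the test after the defect, or after its identifier in a defect tracker if one is used, makes it obvious why the test exists when someone is tempted to delete it later.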

Some organisations feel that they should wait for their customers to tell them there is a problem before doing anything about it. I don't subscribe to that line of thinking. If you wait for the customer to report problems with your system, then you have a customer relations problem you need to deal with. Your customers should only find defects in your software under exceptional circumstances. I once had a vendor tell me that they didn't use unit tests but preferred to use assertions to notify them of problems. Software assertions are a good backup strategy for detecting unanticipated conditions; however relying on them as your primary error detection mechanism shows that insufficient effort went into anticipating problems. Besides, if your software is telling you of unanticipated errors, then it is quite likely that your customer already knows there is a problem.

When should we stop testing?

The simple answer to that question is never. Unfortunately, that answer is somewhat naive. To be fair though, the question itself is the problem, not the answer. Some more appropriate questions are:

  • When should we stop unit testing?
  • When should we stop integration testing?
  • Have we done sufficient testing?
  • Should we perform user acceptance testing?

The first question is relatively easy to answer. We should not stop unit testing until we stop modifying the software. Failure to unit test software whilst changes are being made is foolhardy at best, unprofessional at worst. The remaining questions are more difficult to answer. "It depends" is the most likely answer to these questions, because the exit criteria are specific to the project being undertaken. These criteria should be defined in the project's test plan. Safety critical applications have stricter requirements than, say, a website. In each case, the Project Manager must work with their client and their team to define the testing requirements and the exit criteria that determine when sufficient testing has been completed. Unless these exit criteria are defined, misunderstandings about the level of testing being performed will cause friction among the project's stakeholders and will drive people insane trying to meet expectations.

Conclusion

Testing is a vital part of the software development life cycle (SDLC). Without it, poor quality software is delivered. With it, the likelihood of delivering high quality software is increased; but not guaranteed. Software testing is not a no-brainer that can be fixed by choosing to implement one or more of the numerous testing techniques. Using multiple techniques will help increase the quality of the software; however without a common understanding of what is required of the testing, the various participants will pull in multiple directions, increasing the frustration of everyone.

Most software engineers want to do the right thing by their customer. I doubt that many (if any) wilfully decide to deliver poor software to their clients. Yet many of us end up doing just that. Often it comes down to the language we use to specify what we want the software to do, and how we're going to determine that the requirements have been met. Natural language (in this case English) is poorly suited to describing technical details unambiguously. As a result, most software engineers have differing opinions of what should be considered sufficiently tested software. This is what drives many of us "insane" when it comes to developing, testing and deploying systems.

Can this state of affairs be reversed? I like to think so; however it will require education and patience. I am not a testing expert and I don't know everything about software testing; however I have seen what doesn't work, and I've seen some things that do work. A common consensus is needed. I think great strides have been made towards this end, but there is still a long way to go.

