Debugging and Defensive Programming

Jeffrey S. Dutky
5 May 2004


In response to a (possibly disingenuous) article on slashdot I posted a (possibly ranting) response that detailed some of my own (possibly archaic) debugging practices. In fact, I have been trying, for quite a few years, to describe my ideas on debugging and come up with some kind of organized theory on the subject. This article is the closest I have come, and I thought I should try to expand on the topic a little bit.

New, Novel, and Wrong

First, I'll just say that the "new and revolutionary" debugging technique the poster was interested in is pure, useless crap. First, the article is obviously an attempt to get a bit of free advertising by an empolyee of the company linked to as "relative debugging." Second, the technique ("relative debugging") assumes that you already have a known good implementation of the program you want to debug! Finally, it is not clear how this technique will work: assuming that the two implementations of the program are significantly different (or that the platforms on which they are running are significantly different), there is no way that the separate runs of the programs could be compared against each other.

If the implementations are different by accident, just throw away the one with bugs and use the "known good" implementation. If the implementations are different on purpose, then you will be unable to compare the separate runs of the programs because they actually do different things. If the platforms on which the program are compiled are significantly different, then the compiled implementation will be incomparable. On all counts, the article submission and the content it references is worthless.

Now that that's out of the way, I can get on with the topic that actually interests me: debugging and defensive programming.

Debugging: Origins and Motives

At its heart debugging tries to answer one simple question: What the heck just happened with my program?!?

The program was run and it failed. The programmer wants to know what the program actually did (the actual order of instructions and values of program variables) leading up to the failure, in the hope of identifying what went wrong and how to fix it. The failure is an indication that the programmer doesn't really know what the program is doing.

At first, you might be suprised that a programmer would not know what their own program was doing. After all, the programmer wrote the program and should be intimately familiar with its operation. This is only partially true: First, the programmer may not fully understand the language or algorithms used to write the program. Languages are complex, algorithms obscure, most programmers are only familiar with a common subset of the languages they use and passingly familiar with complex algorithms or specialized domain logic, so it is not uncommon for a programmer to write code whose detailed operation they don't fully understand. Second, the source code is only half the story, the actual operation of a program is dependant on both the instructions that are executed and on the values of the program variables during the run. Even a programmer who is intimately familiar with the language and logic of a given program may be unable to follow all the intricacies of execution without mechanical aid.

In the first case, where the programmer has an imperfect understanding of their own code, the debugger is a crutch, allowing the inexperienced programmer to watch (allegedly) actual execution of their code. Interactive debugging tools, in this use, serve as an instructional tool, allowing the programmer to passively observe what he should be able to actively visualize. Many programmers become dependant on this crutch, never developing the vital skill and insight required to understand the code they write on a purely intellectual level. In the second case, where the program is so complex in its operation that no programmer, regardless of skill or experience, could visualize the actual operation of the program over anything but the most trivial input, the debugger is a prosthesis, a tool, giving the programmer sight with which he was not born.

Still, in both cases, the debugger does nothing more than divulge what paths the program takes through the code and what values are assigned to program variables along the way. In exchange for this service the debugger requires the programmer's undivided attention for a significant period of time. The execution speed of the program during the debug run is orders of magnitude slower than it would otherwise be. As a result, using the debugger on realistic runs requires a large amount of planning and effort.

If we were to step back from the problem and consider the motivating questions -- what paths did the program take and what values were assigned to program variables along the way -- ignoring the mirage of interactive debugging, we could find alternative ways of getting the answers we need without sacrificing execution speed or programmer effort. The results of this re-evaluation are defensive programming, automated testing and log based debugging.

Principles of Defensive Programming

I have found that a modicum of forethought can greatly simplify the process of debugging programs running in production environments. By generating log messages at essential points in the code you can leave enough information in the log file to actually diagnose failures and malfunctions without resorting to secondary debugging tools. By either allowing the debug log to be generated only when requested by the user, or when a failure or malfunction is detected, you can even prevent the logging mechanism from affecting the program's performance.

First, we should consider what kinds of information to log and under what circumstances to generate the log messages:

  1. errors shoud always be logged, along with as much information as you can collect about why the error occured or circumstances leading up to the error. I call this "level 0" debug logging.

  2. when recording the least detailed debug log, we would like to know when program flow changes. At this level we should record whenever a branch is taken (if statements, switch/case statements, and loops) or a subroutine is called. We may want to include the values of the branch control variables or the parameters to and return value from the subroutine. I call this "level 1" debug logging.

  3. at the intermediate detail level, we would like to know whenever a variable value is modified. In general I cluster these log messages near a flow control log message, to reduce the clutter effect in my code. I call this "level 2" debug logging.

  4. at the highest level of detail, we should record any other information that may be of interest to the programmer: database query strings, data records written to files, etc. I call this "level 3" debug logging.

Next, we need to know where to put these logging statements. As a general rule I try to cluster the log messages at the start and end of basic blocks. A basic block is run of instructions that don't cause a change of control flow or a subroutine call. This dictates that log statements should be placed at the entry to each conditionally executed block: then and else blocks of if statements, each case and default block in a switch, and at the start of every loop body.

Unfortunately, basic blocks tend to be pretty small in high level language code, so the log statements make up a fair fraction of my total code. This may sound ominous or inconvenient to a novice programmer, but when you consider that debugging tends to take twice as long as simply writing the code, the payoff in reduced debugging time is well worth the extra typing.

Another place to put logging statements is wherever the program accepts input from the outside world, whether from command line paramters, user input, data files or network/IPC ports. As soon as you have this input data in hand, you should send a copy to the log, so you can be certain of what inputs are being used to generate incorrect results later on.

Finally, I like to put log statements before any call to a subroutine (listing the input values being passed to the subroutine call) and just after the subroutine's return (listing the result values).

Debugging Only When Needed

Now that we have instrumented our code with adequate log statements, we will want to be able to turn the log generation on and off, or regulate the level of detail in the log messages.

As I write each log statement I assign it a detail class (0=errors, 1=flow control, 2=variable values, 3=everything else). I also add a command line parameter (or environment varialbe, database entry, etc.) to the main program that will tell the program which classes of message to write to the log (and which to suppress).

In order to make the inactive debug log statements as fast as possible, I use a very simple test expression for the log detail level: I just check if the detail level specified on the command line is less than the detail level of the current log message. More complex tests might be usefull (enabling logging for only a section of the program or only for certain subroutines, for example) but the overhead used to test these conditions in a program that is not generating log messges seems prohibitive. I have also found it more difficult to think about how to classify debug messages when I have to consider more than one axis of classification.

Another method that I have considered, but never used, is to store log messages in a memory buffer until an error is detected, then dump the memory buffer to the log file on disk (if no errors are detected, the log message buffer is discarded at the end of the run). This would allow the program to run with full logging enabled, but only incur the cost of writing to the log file when something goes wrong. This method is only useful when you are sure that no serious flaws still exist in your code (flaws that might corrupt memory or cause an unexpected and uncontrolled termination of the program). While that may sound like a serious limitation, there are a large number of production codes that fit those requirements.

Supporting Code

The amount of code needed to support defensive programming is fairly small and straightforward. We need to provide a means of setting the debug detail level, a means of selecting the debug output file and a means of writing a log message to that file.

Here is an example implementation:

debug.h

#ifndef DEBUG_H              /* check if this file has been previously included */
#define DEBUG_H              /* if not, define symbol to prevent future inclusion */
#include <stdio.h>

#ifndef DEBUG_C            /* if we were NOT included by our own C file */
extern int G_debug_level;

extern FILE G_debug_file;
#endif

void set_debug_level(int p_dblevel);

#define DEBUG(DBLEVEL, MKSTR, STR) \
{\
    int dblevel;\
    char *str;\
    \
    dblevel = DBLEVEL;\
    str = STR;\
    if(G_debug_file != NULL && dblevel <= G_debug_level)\
    {\
        MKSTR;\
        fprintf(G_debug_file, "DEBUG(%d) -- %s\n", dblevel, str);\
        fflush(G_debug_file);\
    }\
}

#endif

debug.c

#define DEBUG_C            /* define symbol to handshake with our H file */
#include "debug.h"

int G_debug_level = 0;

FILE *G_debug_file = stderr;

void set_debug_level(int p_dblevel)
{
    G_debug_level = p_dblevel;
    return;
}

void set_debug_output(FILE *p_dbfile)
{
    if(p_dbfile != NULL)
        G_debug_file = p_dbfile;
    return;
}

The debug log facility must be implemented as a macro for two reasons. First, we need to avoid the overhead of a function call when debugging is disabled. Second, we need to be able to execute arbitrary code in order to build the debug log message.

Automated testing needs much more involved code support, and the support is dependant upon the type of algorithms and data-structures your program uses. In my experience, however, there are few common patterns:

Tagged Structures
When you are working with dynamically allocated memory there is a high liklihood of accidentially using an unitialized pointer, reusing a pointer that has been deallocated, or misusing an allocated pointer (using a pointer to a structure of one type as if it were a structure of some other type). One way of mitigating these problems is to tag your data structures. The tag field will be initialized to some unique value when the structure is allocated, and cleared when the structure is freed. For each structure your program defines you would have code that looks something like this:

mystruct.h

#ifndef MYSTRUCT_H
#define MHSTRUCT_H

typedef struct
{
    long tag;
    ...            /* other fields of MyStruct */
} MyStruct;

MyStruct *create_MyStruct(...);

void delete_MyStruct(MyStruct *p_mystruct);

...                /* other functions related to MyStruct */

#endif

mystruct.c

#include <stdlib.h>
#include "mystruct.h"

#define MYSTRUCT_TAG 0x4d795374L   /* equivalent to the string "MySt" */

MyStruct *create_MyStruct(...)
{
    MyStruct *mystruct;
    
    mystruct = (MyStruct*)malloc(sizeof(MyStruct));
    if(mystruct != NULL)
    {
        ...             /* initialize fields of MyStruct */
        mystruct->tag = MYSTRUCT_TAG;
    }
    
    return mystruct;
}

void delete_MyStruct(MyStruct *p_mystruct)
{
    if(p_mystruct == NULL || p_mystruct->tag != MYSTRUCT_TAG)
        return;
    
    ...                 /* delete any allocated members of MyStruct */
    p_mystruct->tag = 0;
    free(p_mystsruct);
	
    return;
}

Every function that operates on a MyStruct structure checks to see that the pointer is not NULL and that the tag field is set to the value MYSTRUCT_TAG. If not, the function returns an error.

This will prevent any function from accidentally operating on values of the wrong type and cause a buggy program to abort as soon as an uninitialized, previously freed or impropertly typed pointer is used. The test itself is fairly fast (two integer comparisons) and should not be too much of a performance drag on realistic programs.

Validity Checking
Validity checking is only usefull when you have complex data structures with well-defined invariants such as an AVL-tree or a linear linked list. With these kinds of structures we can write fairly straight-forward code to verify that the invariants are satisfied and not worry that our validity checking code is also buggy.

As with tagging, validity checking may require some addition to or modification of the data structures being validated. The most common modification I have seen is the addition of a touched field to linked data structures. The touched field is initialized to some constant value (e.g. zero) when the structure is built. The validity checking function sets the value of the touched field as it traverses the structure. If the validity checker encounters a strucuture whose touched field has already been set, then a loop exists in the structure (often, such structures forbid loops) and the structure is invalid.

Other common modifications include element counts, tree/list threading (links to the root and parent node in branch and leaf nodes, or to the head, tail and neighbor nodes in list nodes), and refernce counts for link-limited tree structures. These values are maintained as the structure is built and validated against actual traversal of the structure by the validity checking routine.

Validity checking is much less usefull when the amount of effort needed to check invariants is roughly equal to the effort required to write the program itself. In this case we would have to worry about serious and subtle flaws in the validity checking code, which would just increase the testing and debugging burden.

Still, for an important class of codes, validity checking is an acceptable option, and it allows us to construct probablistic, automated testing frameworks. We can get very high levels of quality assurance from an automated testing framework, since we actually have a chance of covering either a majority of all test cases, or, at least, getting wide enough coverage of the problem space to statistically trust the results.

Buffered Logging
If we are concerned about the overhead of writing to the log frequently, or if we can't afford the storage required for a full log file on each run of the program (the logs generated at the highest level of detail can be hundreds of megabytes in size), we may want to consider buffered logging.

The log messages can be stored in a circular memory buffer until an error condition (or any interesting condition) is detected, at which point the buffer contents are dumped to the log file on disk. The memory buffer will be of a fixed size and will overwrite the oldest messages in the buffer when it becomes full, which means that we will only get a partial trace of execution. With a large enough buffer, however, we should be able to see sufficient execution state preceeding the error condition to be able to diagnose any problem.

Buffered logging will allow the program to run at full speed under normal circumstances, but still give us full debug logging when we need it. We will also be able to regulate the size of the resulting log file, by limiting the size of the memory buffer.

The main disadvantage of buffered logging is that it is useless if the bug under investigation corrupts the programs memory or causes the program to abort unexpectedly. Nevertheless, for stable production codes buffered logging offers an ideal compromise between ease of maintenance, execution speed and storage efficiency.

Applicability and Limits

I have used this method to successfully debug systems from the initial unit test stage all the way through deployment and maintenance. Where I have used this method in a program I have only rarely needed to resort to an interactive debugging tool. The rare instances where I have had to resort to an interactive debugging tool, however, are instructive:

  1. On one system, where I was forced to use an antiquated compiler, I encountered a compiler bug that occasionally corrupted the return stack when a function taking a variable number of parameters (such as printf) was called. The log showed what appeared to be an infinite loop that started in the middle of a straight-line piece of code, crossed a function call and ended somewhere in the body of the function call.

    In fact, even using the interactive debugger I was unable to figure out what was happening. A colleague alerted me to an errata from the compiler manufacturer which mentioned the bug in fairly vague terms. I used the debugger, stepping through the machine code listing and monitored the stack position that contained the return address for the affected function call, to verify that the compiler had emitted bad code. I then had to write a work around for a perfectly legal call to sprintf, thus avoiding the compiler bug.

  2. In another system I uncovered a race condition between my program and a remote system. The debug log only revealed that a request made by my program was returning a duplicate value. Examination of the protocol specification revealed that the request/response exchange didn't have full handshaking, allowing duplicate messages to be returned if requests were issued too rapidly.

    Again, the interactive debugger was of little help. When I tried to duplicate the bug in the interactive debugger, it disappeared. The debugger was introducing just enough delay between requests to allow the other program to process the next transaction. Only carefull analysis of our specifications revealed the true problem.

  3. The main failure of this method, and the only circumstance under which interactive debugging actually shines, is in the earliest stages of unit testing, when the code is likely to be very unstable: wild pointers, unterminated strings, and simple typos tend to wreak havoc on a system that is dependant on basic stable operation of the host program.

    Even under these common circumstances, however, I have found that I can often identify and solve the problem just by examining the log and the source code. Wild pointers and unterminated strings have a recognizable signature in the log: both appear as garbage characters in the output. Simple typos tend to be more noticable when your attention is concentrated on a section of code known to be misbehaving. In many cases, while I was stumped by the problem, a colleague was able to spot the error while looking over my shoulder.

Overall, I still maintain that a log file and defensive programming is at least as good as using an interactive debugger at finding and fixing most bugs. I just don't want to claim that defensive programming can find or fix all classes of bugs: some bugs are inherently difficult and the only way to find or fix them is to apply a fair amount of brain-power to the problem. Compiler flaws and race conditions fall into the category of inherently hard problems and we cannot expect any tool or technique to help very much with these kinds of bugs.

Automated Testing and Debugging

Automated testing is another technique I have used to good effect on some projects. In general, automated testing is only useful when you can write both a subroutine to verify proper behavior of a program, and a subroutine to generate known good and known bad inputs to that program.

An automated testing framework would call the program under test with random inputs that are known to be either good or bad. The output of the program would be verified with the verification subroutine, ensuring that known bad input generates the desired error indication and known good input generates correct output. Several thousand (or hundred-thousand) iterations would be run, hopefully covering a large number of possible test cases. For some classes of programs (those with easily verified outputs and easily constructed good and bad inputs) this technique can deliver very high assurance or correctness.

For many programs, however, it is prohibitively difficult to generate known good and bad inputs on a random basis or to easily verify that the program's output is correct. For most programs of interest (those that do anything useful) it is likely to be as much work to write the test harness (input generator and output verifier) as it was to write the program itself. In this circumstance we would never be certain that the test harness was working correctly, so we could never have any assurance of the tests' validity.

Combining automated testing with defensive programming allows us to relax the requirements for a comprehensive test harness. Instead we will build a program that detects possible error conditions and then triggers a buffered logging system to dump the most recent log messages to the log file.

By relaxing the verification requirement to dection of possible errors we allow some non-error conditions to be incorrectly identified as errors. This could make the verification routine easier to write (we may also allow some error conditions to go undetected, but we obviously prefer false positives over false negatives). We can accept false error indications during the testing phase because we will have humans on hand to examine the output and make the final decisions.

Stages of the Debugging Process

After using automated testing techniques on a couple of projects I began to notice a pattern to my debugging sessions. In each case, certain kinds of bugs would only be found during certain phases of the debugging process: some bugs were common only in the early phases of debugging while other would only be found later on. Some awareness of the stages of debugging and what kinds of bugs are found in which stages is helpfull in designing a defensive programming strategy. It may also be usefull to know what kind of errors survive until very late in the program life-cycle, since this will influence what sort of work will be required of maintainance programmers.

The first phase of debugging is simply uncovering syntax errors and typos. Traditionally this phase was conducted at the programmer's desk and was largely manual. We now have a wide variety of tools to aid the programmer in this task, including language sensitive editors, syntax checkers and a wealth of errors and warnings from the compiler itself. Defensive programming doesn't help very much with syntax errors, but it can alert you to typos that alias existing program variables. Such typos can go unnoticed for long periods of time, only causing minor, and seemingly random, malfunctions.

The second phase of debugging is characterized by bugs that cause the program to abort very early in its run, usually due to errors while reading input data of some kind. Some of these bugs will actually be in the input itself, not in the program code that reads the input: my experience suggests that it is equally likely to make a mistake creating an input file by hand as it is to make mistakes writing the input subroutines in the program itself.

The third phase of debugging is dominated by memory allocation bugs (that is, if your program uses dynamic allocation and linked data structures). Bugs in this phase manifest as either outright crashes (segmentation faults) after a period of aparently successful processing, or as garbage output caused by corrupted string literals (for instance, output from a printf statement where the garbage character occur within the text of the literal format string). Other manifestations include crashes inside heap allocation routines (malloc/free) and crashes in standard library routines.

The fourth phase of debugging is the most difficult. After all of the fatal input and memory allocation bugs have been worked out, you are left with the task of verifying that the program is actually doing what you intended it to do. Early in the third phase, you will be fixing gross and obvious errors in the program logic, but this will rapidly give way to a whole host of subtler errors that may take a great deal of time and effort, not to mention a fair amount of inspiration, to identify, characterize and fix.

Lessons to Learn

One lesson to take away from this pattern is that you should always suspect the test input data and test harness, as well as the code you are testing. Too many times I have been led on a wild goose chase because I was certain that the bug must have been in the code I was testing, not in the input data or the test harness.

Input data failures tend to be simple and stupid: systematically omitting a field or two in a long record-structured file, using the wrong kind of newline characters, miscalculating a record count in a header or footer, etc. Defensive programming will help with some of this (if you wrote a good input verifier, it should alert you to badly formatted input with an unambiguous and descriptive failure message), but you must actively resist the temptation to assume that your input is correct.

Test harness failures tend to reside in either the input generator routine (which should cause failures of the input verifier in the program under test) or from the output validation routine. The output validation errors are the most pernicious, since output validation is likely to be nearly as complex as the program that is being validated. The only advice I can offer here is to make output validation as simple as possible, even if it means using less efficient algorithms than you might prefer. In an automated test environment you can afford to let the test harness run for hours or days, if necessary, so efficiency is not essential.

Conclusions

Defensive programming techniques (design for test, automated testing, etc.) can have a large payoff over the life of a program. Early in the design and coding process, these techniques actively prevent bugs by concentrating the programmers' attention on details of the program that are most prone to error. During the unit testing and debugging phases, the techniques accelerate the debugging process by allowing the program to run at or close to full speed during the test runs (running in an interactive debugger is 5-6 orders of magnitude slower than normal execution speed. The overhead from logging is only 2-3 orders of magnitude). Once the program is delivered, the same features used during debugging are available to operations, support and maintenance staff to monitor the program in production and diagnose failures.

Defensive programming provides at least as much insight into the operation of a program as do interactive/on-line debugging tools. Since defensive programming tends to provide immediate feedback to the basic questions that motivate the use of interactive/on-line debugging tools the tester or support engineer does not need to develop a custom testing or debugging strategy, which speeds up the diagnostic process.


Last Modified 21 December 2006

Home