Section 9.1
Introduction to Correctness and Robustness
A PROGRAM IS correct
if it accomplishes the task that it was designed to perform. It is
robust if it can handle illegal inputs and
other unexpected situations in a reasonable way. For example, consider
a program that is designed to read some numbers from the user and
then print the same numbers in sorted order. The program is
correct if it works for any set of input numbers. It is robust if
it can also deal with non-numeric input by, for example, printing
an error message and ignoring the bad input. A non-robust program
might crash or give nonsensical output in the same circumstance.
Every program should be correct. (A sorting program that doesn't
sort correctly is pretty useless.) It's not the case that
every program needs to be completely robust. It depends on who will
use it and how it will be used. For example, a small utility program that you
write for your own use doesn't have to be at all robust.
The question of correctness is actually more subtle than
it might appear. A programmer works from a specification
of what the program is supposed to do. The programmer's work is
correct if the program meets its specification. But does that
mean that the program itself is correct? What if the specification
is incorrect or incomplete? A correct program should be
a correct implementation of a complete and correct specification.
The question is whether the specification correctly expresses the
intention and desires of the people for whom the program is being
written. This is a question that lies largely outside the domain
of computer science.
Most computer users have personal experience with programs
that don't work or that crash. In many cases, such problems
are just annoyances, but even on a personal computer there can be
more serious consequences, such as lost work or lost money.
When computers are given more important tasks, the consequences
of failure can be proportionately more serious.
Just a few years ago, the failure of two multi-million-dollar
space missions to Mars was prominent in the news. Both failures were
probably due to software problems, but in both cases the problem
was not with an incorrect program as such. In September 1999,
the Mars Climate Orbiter
burned up in the Martian atmosphere because data that was
expressed in English units of measurement (such
as feet and pounds) was entered into a computer program
that was designed to use metric units (such as centimeters and grams).
A few months later, the Mars Polar Lander probably crashed because its software
turned off its landing engines too soon. The program was supposed to
detect the bump when the spacecraft landed and turn off the
engines then. It has been determined that deployment of the
landing gear might have jarred the spacecraft enough to activate
the program, causing it to turn off the engines when the spacecraft
was still in the air. The unpowered spacecraft would then have
fallen to the Martian surface. A more robust system would have
checked the altitude before turning off the engines!
There are many equally dramatic stories of problems caused by
incorrect or poorly written software. Let's look at a few incidents
recounted in the book Computer Ethics by Tom
Forester and Perry Morrison. (This book covers various ethical
issues in computing. It, or something like it, is essential
reading for any student of computer science.)
In 1985 and 1986, one person was killed and several were injured
by excess radiation while undergoing radiation treatments from a
mis-programmed computerized radiation machine. In another case,
over a ten-year period
ending in 1992, almost 1,000 cancer patients
received radiation dosages that were 30% less than prescribed because
of a programming error.
In 1985, a computer at the Bank of New York started
destroying records of on-going security transactions because of an
error in a program. It took less than 24 hours to fix the program,
but by that time, the bank was out $5,000,000 in overnight interest
payments on funds that it had to borrow to cover the problem.
The programming of the inertial guidance system of the F-16 fighter plane
would have turned the plane upside-down when it crossed the equator,
if the problem had not been discovered in simulation. The Mariner
18 space probe was lost because of an error in one line of a program.
The Gemini V space capsule missed its scheduled landing target by
a hundred miles, because a programmer forgot to take into account the
rotation of the Earth.
In 1990, AT&T's long-distance telephone service was disrupted
throughout the United States when a newly loaded computer program proved
to contain a bug.
These are just a few examples. Software problems are all too common.
As programmers, we need to understand why that is true and what can
be done about it.
Part of the problem, according to the inventors of Java, can be
traced to programming languages themselves. Java was designed to
provide some protection against certain types of errors.
How can a language feature help prevent errors? Let's look
at a few examples.
Early programming languages did not require variables to be
declared. In such languages, when a variable name is used in a
program, the variable is created automatically. You might
consider this more convenient than having to declare every
variable explicitly. But there is an unfortunate consequence:
An inadvertent spelling error might introduce an extra variable
that you had no intention of creating. This type of error
was responsible, according to one famous story, for yet another
lost spacecraft. In the FORTRAN programming language,
the command "DO 20 I = 1,5" is the first statement of
a loop. Now, spaces are insignificant in FORTRAN,
so this is equivalent to "DO20I=1,5". On the other
hand, the command "DO20I=1.5", with a period instead of
a comma, is an assignment statement that assigns the value 1.5
to the variable DO20I. Supposedly, the inadvertent
substitution of a period for a comma in a statement of this type
caused a rocket to blow up on take-off. Because FORTRAN doesn't
require variables to be declared, the compiler would be happy to accept
the statement "DO20I=1.5." It would just create a new variable
named DO20I. If FORTRAN required variables to be
declared, the compiler would have complained that the variable DO20I
was undeclared.
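In Java, the same slip would be caught before the program ever ran. Here is a small sketch, with made-up class and variable names, of what happens with a misspelled variable:

   public class DeclarationDemo {
      public static void main(String[] args) {
         int total = 0;    // every variable must be declared
         // Uncommenting the next line would not quietly create a new
         // variable; it would be a compile-time error, "cannot find symbol":
         //    totl = 42;
         System.out.println(total);
      }
   }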
While most programming languages today do require variables to
be declared, there are other features in common programming languages
that can cause problems. Java has eliminated some of these features.
Some people complain that this makes Java less efficient and less
powerful. While there is some justice in this criticism, the increase in
security and robustness is probably worth the cost in most
circumstances. The best defense against some
types of errors is to design a programming language in which
the errors are impossible. In other cases, where the error can't
be completely eliminated, the language can be designed so that
when the error does occur, it will automatically be detected.
This will at least prevent the error from causing further harm,
and it will alert the programmer that there is a bug that needs
fixing. Let's look at a few cases where the
designers of Java have taken these approaches.
An array is created with a certain number of locations, numbered
from zero up to some specified maximum index. It is an error to
try to use an array location that is outside of the specified range.
In Java, any attempt to do so is detected automatically by the
system. In some other languages, such as C and C++, it's up to the
programmer to make sure that the index is within the legal range.
Suppose that an array, A, has three locations, A[0],
A[1], and A[2]. Then A[3], A[4],
and so on refer to memory locations beyond the end of the array.
In Java, an attempt to store data in A[3] will be
detected. The program will be terminated (unless the error is
"caught", as discussed in Section 9.3).
In C or C++, the computer will just go ahead and store the
data in memory that is not part of the array. Since there is no
telling what that memory location is being used for, the
result will be unpredictable. The consequences could be much
more serious than a terminated program. (See, for example, the
discussion of buffer overflow errors later in this section.)
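To make this concrete, here is a minimal sketch (the class name is made up for this example) of what happens in Java:

   public class ArrayBoundsDemo {
      public static void main(String[] args) {
         double[] A = new double[3];   // legal locations: A[0], A[1], A[2]
         A[0] = 1;
         A[3] = 4;   // outside the array; Java detects this and throws an
                     // ArrayIndexOutOfBoundsException, terminating the
                     // program instead of silently corrupting memory
      }
   }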
Pointers are a notorious source of programming errors.
In Java, a variable of object type holds either a pointer to
an object or the special value null. Any attempt to
use a null value as if it were a pointer to an actual
object will be detected by the system. In some other languages,
again, it's up to the programmer to avoid such null pointer errors.
On my old Macintosh computer, a null pointer was actually
implemented as if it were a pointer to memory location zero.
A program could use a null pointer to change values stored in
memory near location zero. Unfortunately, the Macintosh stored
important system data in those locations. Changing that
data could cause the whole system to crash, a consequence more
severe than a single failed program.
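Here is a two-line sketch, assuming the lines appear inside some method, of the kind of error that Java detects:

   String str = null;        // str holds the special value null
   int len = str.length();   // attempt to use null as if it were a
                             // pointer to an actual object; Java
                             // throws a NullPointerException here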
Another type of pointer error occurs when a pointer value
is pointing to an object of the wrong type or to a segment of
memory that does not even hold a valid object at all. These types
of errors are impossible in Java, which does not allow programmers
to manipulate pointers directly. In other languages, it is possible
to set a pointer to point, essentially, to any location in memory.
If this is done incorrectly, then using the pointer can have
unpredictable results.
Another type of error that cannot occur in Java is a
memory leak. In Java, once there are no longer any pointers
that refer to an object, that object is "garbage collected" so
that the memory that it occupied can be reused. In other
languages, it is the programmer's responsibility to return
unused memory to the system. If the programmer fails to do
this, unused memory can build up, leaving less memory for programs
and data. There is a story that many common programs for
Windows computers have so many memory leaks that the computer
will run out of memory after a few days of use and will have to be
restarted.
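As a small illustration, assuming this fragment appears inside some method, here is how an object becomes garbage in Java:

   StringBuilder temp = new StringBuilder("scratch data");
   temp = null;   // the StringBuilder now has no pointers referring
                  // to it, so the garbage collector can reclaim its
                  // memory; the programmer never frees it explicitly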
Many programs have been found to suffer from
buffer overflow errors. Buffer
overflow errors often make the news because they are responsible for
many network security problems. When one computer receives data from
another computer over a network, that data is stored in a buffer.
The buffer is just a segment of memory that has been allocated
by a program to hold data that it expects to receive. A buffer
overflow occurs when more data is received than will fit in the
buffer. The question is, what happens then? If the error is detected
by the program or by the networking software, then the only thing
that has happened is a failed network data transmission. The real
problem occurs when the software does not properly detect buffer
overflows. In that case, the software continues to store data
in memory even after the buffer is filled, and the extra data goes
into some part of memory that was not allocated by the program as
part of the buffer. That memory might be in use for some other purpose.
It might contain important data.
It might even contain part of the program itself. This is where
the real security issues come in. Suppose that a buffer overflow
causes part of a program to be replaced with extra data received
over a network. When the computer goes to execute the part of
the program that was replaced, it's actually executing data that
was received from another computer. That data could be anything.
It could be a program that crashes the computer or takes it over.
A malicious programmer who finds a convenient buffer overflow error in
networking software can try to exploit that error to trick other
computers into executing his programs.
For software written completely in Java, buffer overflow errors
are impossible. The language simply does not provide any way
to store data into memory that has not been properly allocated.
To do that, you would need a pointer that points to unallocated
memory or you would have to refer to an array location that lies
outside the range allocated for the array. As explained above,
neither of these is possible in Java. (However, there
could conceivably still be errors in Java's standard classes,
since some of the methods in these classes are actually written in the
C programming language rather than in Java.)
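As a sketch of why this is so, consider reading network data in Java. (The class and method names here are made up for illustration.) The read method can never store bytes past the end of the array it is given; bad offset or length arguments cause an IndexOutOfBoundsException instead of silently overwriting other memory:

   import java.io.IOException;
   import java.io.InputStream;

   public class ReceiveSketch {
      // Read incoming data into a fixed-size buffer.  No matter how
      // much data the other computer sends, read() stores at most
      // buffer.length bytes, so the "extra" data can never spill into
      // memory outside the array.
      public static int receive(InputStream in, byte[] buffer)
            throws IOException {
         return in.read(buffer, 0, buffer.length);
      }
   }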
It's clear that language design can help prevent errors or
detect them when they occur. Doing so involves restricting what
a programmer is allowed to do. Or it requires tests, such
as checking whether a pointer is null, that take some
extra processing time. Some programmers feel that the sacrifice of
power and efficiency is too high a price to pay for the extra
security. In some applications, this is true. However, there are
many situations where safety and security are primary considerations.
Java is designed for such situations.
There is one area where the designers of Java chose not
to detect errors automatically: numerical computations.
In Java, a value of type int is represented as
a 32-bit binary number. With 32 bits, it's possible
to represent a little over four billion different values.
The values of type int range from
-2147483648 to 2147483647. What happens when the result
of a computation lies outside this range? For example,
what is 2147483647 + 1? And what is 2000000000 * 2?
The mathematically correct result in each case cannot be
represented as a value of type int. These are
examples of integer overflow.
In most cases, integer overflow should be considered an
error. However, Java does not automatically detect such
errors. For example, it will compute the value of 2147483647 + 1
to be the negative number -2147483648. (What happens is that
any extra bits beyond the 32nd bit in the correct answer are
discarded. Values greater than 2147483647 will "wrap around"
to negative values. Mathematically speaking, the result is
always "correct modulo 2^32".)
For example, consider the 3N+1 program, which was first
introduced in Section 3.2. Starting
from a positive integer N, the program computes a certain
sequence of integers:
   while ( N != 1 ) {
      if ( N % 2 == 0 )  // If N is even...
         N = N / 2;
      else
         N = 3 * N + 1;
      System.out.println(N);
   }
But there is a problem here: If N is too large,
then the value of 3*N+1 will not be mathematically
correct because of integer overflow. The problem arises
whenever 3*N+1 > 2147483647, that is,
when N > 2147483646/3 = 715827882. For a completely correct
program, we should check for this possibility before
computing 3*N+1:
   while ( N != 1 ) {
      if ( N % 2 == 0 )  // If N is even...
         N = N / 2;
      else {
         if (N > 2147483646/3) {
            System.out.println("Sorry, but the value of N has become");
            System.out.println("too large for your computer!");
            break;
         }
         N = 3 * N + 1;
      }
      System.out.println(N);
   }
The problem here is not that the original algorithm for
computing 3N+1 sequences was wrong. The problem is
that it just can't be correctly implemented using 32-bit integers.
Many programs ignore this type of problem. But integer overflow
errors have been responsible for their share of serious computer
failures, and a completely robust program should take the
possibility of integer overflow into account. (The infamous
"Y2K" bug was, in fact, just this sort of error.)
For numbers of type double, there are even more
problems. There are still overflow errors, which occur when
the result of a computation is outside the range of values that
can be represented as a value of type double. This
range extends up to about 1.7 x 10^308. Numbers
beyond this range do not "wrap around" to negative values.
Instead, they are represented by special values that have no
numerical equivalent. The values Double.POSITIVE_INFINITY
and Double.NEGATIVE_INFINITY represent numbers outside
the range of legal values. For example, 20 * 1e308 is computed to be
Double.POSITIVE_INFINITY. Another special value
of type double, Double.NaN, represents
an illegal or undefined result. ("NaN" stands for
"Not a Number".) For example, the result of dividing by zero
or taking the square root of a negative number is Double.NaN.
You can test whether
a number x is this special not-a-number value by calling
the boolean-valued function Double.isNaN(x).
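These special values are easy to produce. A short sketch, assuming the lines appear inside some main() routine:

   System.out.println( 20 * 1e308 );      // prints Infinity
   System.out.println( Math.sqrt(-1) );   // prints NaN
   System.out.println( 0.0 / 0.0 );       // prints NaN
   System.out.println( Double.isNaN( Math.sqrt(-1) ) );  // prints true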
For real numbers, there is the added complication that
most real numbers can only be represented approximately on
a computer. A real number can have an infinite number of
digits after the decimal point. A value of type double
is only accurate to about 15 digits. The real number 1/3, for
example, is the repeating decimal 0.333333333333..., and there is
no way to represent it exactly using a finite number of digits.
Computations with real numbers generally involve a loss of
accuracy. In fact, if care is not exercised, the result of
a large number of such computations might be completely wrong!
There is a whole field of computer science, known as
numerical analysis, which is
devoted to studying algorithms that manipulate real numbers.
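A tiny experiment, again assuming the lines appear inside some main() routine, makes the loss of accuracy visible:

   double sum = 0;
   for (int i = 0; i < 10; i++)
      sum = sum + 0.1;     // 0.1 has no exact binary representation
   System.out.println(sum);          // prints 0.9999999999999999
   System.out.println(sum == 1.0);   // prints false!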
Not all possible errors are detected automatically in Java.
Furthermore, even when an error is detected automatically,
the system's default response is to report the error and terminate the
program. This is hardly robust behavior! So, a programmer still needs
to learn techniques for avoiding and dealing with errors. These are
the topics of the rest of this chapter.