Program correctness


Assertion-based methods of axiomatic semantics attach assertions to the arcs of a program's flow of control.

Partial correctness - when the loop (program) stops, the goal is achieved
Total correctness - the loop (program) is partially correct and also stops

The number of errors in a symbolic text is proportional to the number of symbols in that text. The larger the text, the more errors it contains.

A partially correct program is a program that, when it stops, will have achieved the postcondition, given the precondition.
A totally correct program is a program that is partially correct and does, in fact, stop.
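The distinction can be made concrete with a minimal Python sketch. The function `divide` is a hypothetical example, not from the original: it computes integer division by repeated subtraction. The loop invariant guarantees that, if the loop stops, the postcondition holds (partial correctness). The precondition d > 0 makes r shrink on every iteration, so the loop also stops (total correctness).

```python
def divide(n: int, d: int) -> tuple[int, int]:
    """Integer division by repeated subtraction (hypothetical example).

    Precondition:  n >= 0 and d > 0
    Postcondition: n == q * d + r and 0 <= r < d
    """
    assert n >= 0 and d > 0, "precondition violated"
    q, r = 0, n
    # Loop invariant: n == q * d + r and r >= 0
    while r >= d:
        r -= d   # r strictly decreases because d > 0, so the loop stops
        q += 1
    # On exit r < d; together with the invariant this is the postcondition.
    assert n == q * d + r and 0 <= r < d, "postcondition violated"
    return q, r


print(divide(17, 5))  # (3, 2)
```

With d = 0 the loop body would never change r, so the call would run forever; that is why the precondition is checked before the loop starts.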

Checking preconditions too often in code can degrade the efficiency of the system.

Not checking preconditions can cause subtle problems to arise.

Such excessive checking might happen, though, in large software projects where programmers are held responsible for their code to the extent that they need to be able to show that, if a problem arises, it is not their fault.
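Both sides of this trade-off can be sketched in Python. The functions `is_sorted` and `contains` and the `check_pre` flag are hypothetical; the search itself uses the standard `bisect` module. Binary search needs only O(log n) comparisons, but checking its precondition (the list is sorted) takes O(n), so checking it on every call defeats the purpose of the search. Skipping the check, however, lets an unsorted list produce a wrong answer silently.

```python
from bisect import bisect_left


def is_sorted(a: list[int]) -> bool:
    # O(n): compares every adjacent pair of elements.
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))


def contains(a: list[int], key: int, check_pre: bool = False) -> bool:
    """Binary search (hypothetical example).

    Precondition: a is sorted in nondecreasing order.
    """
    if check_pre:
        # Checking the precondition costs O(n) time, more than the search itself.
        assert is_sorted(a), "precondition violated: a is not sorted"
    i = bisect_left(a, key)  # O(log n) comparisons
    return i < len(a) and a[i] == key


print(contains([1, 3, 5, 7], 5))  # True
print(contains([3, 1, 2], 3))     # False, although 3 is in the list
```

Calling `contains([3, 1, 2], 3, check_pre=True)` would instead stop with an `AssertionError`, exposing the violated precondition at the cost of a full scan of the list.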

There are three primary ways of showing correctness.

enumeration: assignment, input, output, if-then-else
induction: loops
abstraction: procedures, functions, modules, objects

Correctness via enumeration shows that every possibility is correct. Examples:

input statements
output statements
assignment statements
conditional statements (e.g., if/then/else)
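For an if-then-else, enumeration means checking each branch under the condition that selects it; since the conditions together cover every input, every possibility is correct. A minimal Python sketch, with `max2` and `satisfies_post` as hypothetical names; the final loop additionally enumerates all input pairs from a small, arbitrarily chosen domain.

```python
def max2(a: int, b: int) -> int:
    # Postcondition: result >= a, result >= b, and result is a or b.
    if a >= b:
        # Case 1: a >= b, so returning a satisfies the postcondition.
        return a
    else:
        # Case 2: a < b, so returning b satisfies the postcondition.
        return b


def satisfies_post(a: int, b: int, r: int) -> bool:
    return r >= a and r >= b and r in (a, b)


# The two cases above cover all inputs. For a small domain the inputs
# themselves can also be enumerated mechanically:
for a in range(-3, 4):
    for b in range(-3, 4):
        assert satisfies_post(a, b, max2(a, b))
```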

Correctness via induction shows that all of the cases are correct, given that the base case is correct and that each step case is correct whenever the cases preceding it, including the base case, are correct. Examples:

repetition (e.g., loops)
recursively-defined structures (i.e., structural induction)
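A minimal Python sketch of correctness by induction for a loop (the function `sum_to` is a hypothetical example). The assertions state the loop invariant s == i(i+1)/2: it holds before the loop starts (base case), and each iteration preserves it (step case), so it still holds when the loop ends.

```python
def sum_to(n: int) -> int:
    """Return 1 + 2 + ... + n, for n >= 0 (hypothetical example)."""
    assert n >= 0, "precondition violated"
    i, s = 0, 0
    # Base case: the invariant holds before the loop, since i == s == 0.
    assert s == i * (i + 1) // 2
    while i < n:
        i += 1
        s += i
        # Step case: if s == (i - 1) * i // 2 held before this iteration,
        # adding i gives s == i * (i + 1) // 2.
        assert s == i * (i + 1) // 2
    # On exit i == n, so the invariant gives s == n * (n + 1) // 2.
    return s


print(sum_to(10))  # 55
```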

Correctness via abstraction is a more encompassing method of showing correctness, via preconditions, postconditions, etc. Examples:

procedures/functions/parameters
input/output interfaces
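With abstraction, a function is described by its precondition and postcondition, and callers reason only from that contract. A minimal Python sketch; `int_sqrt` and `is_perfect_square` are hypothetical names (Python's standard library already provides `math.isqrt`; the function is written out here only to show a contract).

```python
def int_sqrt(n: int) -> int:
    """Integer square root (hypothetical example).

    Precondition:  n >= 0
    Postcondition: r * r <= n < (r + 1) * (r + 1)
    """
    assert n >= 0, "precondition violated"
    r = 0
    # Invariant: r * r <= n
    while (r + 1) * (r + 1) <= n:
        r += 1
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


def is_perfect_square(n: int) -> bool:
    # Reasons only from int_sqrt's contract, not from how it computes r:
    # if n == k * k, the postcondition forces r == k, so r * r == n.
    # The n >= 0 test ensures int_sqrt's precondition holds.
    return n >= 0 and int_sqrt(n) ** 2 == n
```

If `int_sqrt` were later replaced by a faster method that meets the same contract, `is_perfect_square` and its correctness argument would not need to change.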

In complicated systems, abstraction is the only viable means of assuring ourselves that a program is correct.

If programs could be proven correct, no testing would be required.

    { pre: a[1..n] is sorted, n >= 0 }
    a[0] := key;
    i := n;
    { inv: key < a[i+1..n] }
    while a[i] > key do
    begin
      i := i - 1;
    end;
    { post: a[1..i] <= key < a[i+1..n] }
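The sentinel search above can be sketched in Python, with its assertions turned into executable `assert` statements. This is an illustrative translation (the name `sentinel_search` is hypothetical): Python lists start at index 0, so index 0 is reserved for the sentinel just as a[0] is in the pseudocode, and the function overwrites it. The assertions scan the list and so cost extra time; they are there only to show the conditions being checked.

```python
def sentinel_search(a: list[int], key: int) -> int:
    """Return i such that a[1..i] <= key < a[i+1..n] (hypothetical example).

    a[1..n] holds the sorted data; a[0] is overwritten with the sentinel.
    """
    n = len(a) - 1
    # pre: a[1..n] is sorted, n >= 0
    assert n >= 0 and all(a[j] <= a[j + 1] for j in range(1, n))
    a[0] = key  # sentinel: the loop stops at i == 0 at the latest
    i = n
    # inv: key < a[i+1..n]
    assert all(key < x for x in a[i + 1:])
    while a[i] > key:
        i -= 1
        assert all(key < x for x in a[i + 1:])
    # post: a[1..i] <= key < a[i+1..n]
    assert all(x <= key for x in a[1:i + 1]) and all(key < x for x in a[i + 1:])
    return i


print(sentinel_search([0, 1, 3, 5, 7], 4))  # 2
```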

In practice, testing is required.

The industry average is about 50 errors per 1000 lines of code.

Good news: If you can prove a part of a program correct, you do not need to test it.

Bad news: It can be very hard, or even impossible, to prove certain programs correct.

How can you test your program to make sure (verify) that it works according to its specification? You cannot: testing alone cannot verify that a program works according to its specification.

Program testing can be used to show the presence of bugs, but never to show their absence!

Edsger Dijkstra, Structured Programming

Dijkstra has said that if debugging is the process of removing bugs from a program, then programming must be the process of putting them in.

As I have now said many times and written in many places: program testing can be quite effective for showing the presence of bugs, but is hopelessly inadequate for showing their absence.

Edsger Dijkstra, A Discipline of Programming

Here is the reasoning adapted from Dijkstra. Consider showing the correctness of a method for multiplying two numbers. The method is correct if, given two numbers, a correct result is always supplied. Let each number be represented by 32 bits. Then there are

2³² * 2³² = 2⁶⁴

possible pairs of numbers to multiply.

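A smaller example shows the method. If each number is represented by 4 bits, each has 2⁴ = 16 possible values, so there are

2⁴ * 2⁴ = 2⁴⁺⁴ = 2⁸ = 16 * 16 = 256

possible pairs of numbers.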
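For 4-bit numbers, exhaustive checking is still feasible. A minimal Python sketch: the multiplier under test, `multiply` (shift-and-add), and the dictionary `table` are hypothetical stand-ins, and the table is built with Python's own `*`, which plays the role of the trusted table of correct results.

```python
BITS = 4


def multiply(x: int, y: int) -> int:
    # Hypothetical method under test: shift-and-add multiplication.
    result = 0
    while y > 0:
        if y & 1:          # lowest bit of y set: add the shifted x
            result += x
        x <<= 1
        y >>= 1
    return result


# The table of correct results, indexed by the pair (x, y).
table = {(x, y): x * y for x in range(2 ** BITS) for y in range(2 ** BITS)}

checked = 0
for (x, y), expected in table.items():
    assert multiply(x, y) == expected
    checked += 1

print(checked)  # 256 = 2**4 * 2**4 pairs
```

Setting `BITS = 32` would, in principle, check every 32-bit pair, but both the table and the running time grow as 2^(2 * BITS), which is the point of the estimate that follows.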

Let us also suppose that we have a table by which to check the results (this would be a big table). If we can check about 2³⁰ possibilities per second (a billion possibilities a second), it would take about 2³⁴ seconds to check all possible combinations. There are about 2²⁵ seconds in a year, so it would take 2⁹ years (about 500 years).
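The arithmetic of this estimate can be checked directly in Python. This is a sketch; the 365-day year is an assumption added for comparison. Using 2²⁵ seconds per year gives exactly 2⁹ = 512 years, while a 365-day year (31,536,000 seconds) gives about 545 years, so "roughly 500 years" holds either way.

```python
import math

pairs = 2 ** 32 * 2 ** 32               # 2**64 pairs of 32-bit numbers
rate = 2 ** 30                          # checks per second, about a billion
seconds = pairs // rate                 # 2**34 seconds
seconds_per_year = 365 * 24 * 60 * 60   # 31,536,000, assuming a 365-day year

print(seconds == 2 ** 34)                     # True
print(round(math.log2(seconds_per_year), 2))  # 24.91, so about 2**25
print(seconds // 2 ** 25)                     # 512 years, using 2**25 seconds per year
print(round(seconds / seconds_per_year))      # 545 years, using a 365-day year
```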

Now let each number be represented by 64 bits. Then there are

2⁶⁴ * 2⁶⁴ = 2¹²⁸

possible pairs of numbers to multiply.

At the same rate, the time is now about 2¹²⁸ / 2³⁰ / 2²⁵ = 2⁷³ = 2⁹ * 2⁶⁴ years, that is,

500 * 2⁶⁴ ≈ 500 * 18,000,000,000,000,000,000 years = 9,000,000,000,000,000,000,000 years = 9,000 billion billion years
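Python's arbitrary-precision integers make it easy to check the 64-bit estimate exactly. This sketch assumes the same rate (2³⁰ checks per second) and the same 2²⁵ seconds per year used above.

```python
pairs = 2 ** 64 * 2 ** 64          # 2**128 pairs of 64-bit numbers
rate = 2 ** 30                     # checks per second, as above
seconds_per_year = 2 ** 25         # the approximation used above
years = pairs // rate // seconds_per_year

print(years == 2 ** 73 == 2 ** 9 * 2 ** 64)  # True
print(f"{years:,}")                # the exact number of years
print(f"{years:.2e}")              # 9.44e+21, about 9,000 billion billion years
```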

The hardware will fail before we can check even a tiny fraction of the possibilities. The same is true of software. The bottom line is that we buy computers that will only ever use a tiny fraction of all possible computations.

But we expect every one of them to be correct. Obviously we cannot check correctness by exhaustive enumeration.

As a result, testing can never show the correctness of a program, but it can show the presence of errors. There are, however, mathematical ways to prove certain programs correct.

Programs, to a large extent, can be verified to be correct using various program correctness techniques.

The field is called software verification.

In a beginning programming course, one need only know that such techniques exist and that programs cannot be shown or proven correct by testing.

Who uses formal verification of software?

Formal software verification is very costly, but it is used when the cost of failure is very high.

NASA - space missions
Chip manufacturers (Intel, AMD, etc.) - the cost of failed processors (e.g., the Pentium FDIV bug) is very high.
Amazon - web and cloud services

Program testing is only useful for showing that a program appears to be good enough to use for some purpose.

A student can encounter such issues when a program works for them with some test data, but then does not work on another computer and/or with other test data.

At the release of the Windows 2000 operating system, a "leaked" memo from Microsoft cited more than 63,000 bugs and other known problems with the software, which, about three years behind schedule, was released anyway on February 17, 2000.

assertions
bounds checking (i.e., range checking)

Distributed, concurrent, and/or parallel software systems are exponentially more difficult to show correct than are sequential software systems.

Testing is much harder because execution need not follow the same sequence in any two runs of the program.
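A small simulation suggests why. The sketch below is in Python; the thread names, the `interleavings` and `run` functions, and the read/write step model are hypothetical. It enumerates every order in which the steps of two threads can interleave. Even when each thread only increments a shared variable with a non-atomic read followed by a write, some orders lose an update, and the number of possible orders, C(2n, n) for two threads of n steps each, grows exponentially with n.

```python
from itertools import combinations
from math import comb


def interleavings(p: list[str], q: list[str]):
    """Yield every merge of p and q that keeps each sequence in its own order."""
    n = len(p) + len(q)
    for p_slots in combinations(range(n), len(p)):
        chosen = set(p_slots)
        it_p, it_q = iter(p), iter(q)
        yield [next(it_p) if k in chosen else next(it_q) for k in range(n)]


def run(schedule: list[str]) -> int:
    # Each thread executes x = x + 1 as two steps: read x, then write x.
    x = 0
    local = {}
    for step in schedule:
        thread, action = step.split(":")
        if action == "read":
            local[thread] = x
        else:  # "write"
            x = local[thread] + 1
    return x


A = ["A:read", "A:write"]
B = ["B:read", "B:write"]
results = {run(s) for s in interleavings(A, B)}
print(sorted(results))  # [1, 2]: some schedules lose an update

# The number of schedules for two threads of n steps each.
for steps in (2, 10, 20):
    print(steps, comb(2 * steps, steps))
```

A test run exercises only one of these schedules, and which one it gets can change from run to run, so a passing test says little about the others.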