Bad Software: What To Do When Software Fails.
[ Main | List
of Articles | Bad Software | UCC 2B | Law of Software
Quality | Digital Signatures | Bookstore | Court Cases
| Links | Press
Releases | About Us | What's
New ]
| Cem Kaner, Ph.D.,
J.D. |
c/o Dept of Computer
Sciences, Florida Tech |
| Law Office of Cem
Kaner |
150 West University
Blvd. |
| kaner@kaner.com |
Melbourne, FL 32901-6988 |
Software Negligence and Testing Coverage1
Copyright (c) Cem Kaner 1996. All rights reserved.
Several months ago, a respected member of the software quality
community posed the following argument to me:
|
A program fails in the field, and someone dies. This leads
to a trial. When the QA manager takes the stand, the plaintiff's
lawyer brings out three facts:
- The failure occurred when the program reached a specific
line of code.
- That line had never been tested, and that's why the bug had
not been found before the product was released.
- A coverage monitor was available. This is a
tool that allows the tester to determine which lines of code
have not yet been tested. Had the QA manager used the tool, and
tested to a level of complete coverage (all lines tested), this
bug would have been found, and the victim would be alive today.
Therefore, the lawyer has proved that the company was negligent
and the victim's family will win the lawsuit.
|
The question is, what's wrong with this argument? Anything?
After all, the company had a well-understood tool readily available
and if they had only used it, someone would not have died. How
could the company not be liable for negligence?
OVERVIEW
This presentation explores the legal concept of negligence
and the technical concept of coverage. The article advances
several related propositions:
Coverage
- The phrase, complete coverage, is misleading. This
"completeness" is measured only relative to a specific
population of possible test cases, such as lines of code, branches,
n-length sub-paths, predicates, etc. Even if you achieve complete
coverage for a given population of tests (such as, all lines
of code tested), you have not done complete, or even adequate,
testing.
- We can and should expand the list of populations of possible
test cases. We can measure coverage against each of these populations.
The decision as to whether to try for 1%, 10%, 50% or 100% coverage
against any given population is non-obvious. It involves tradeoffs
based on thoughtful judgment.
Negligence2
- Negligence liability attaches when injury or loss is caused
by a failure to satisfy a duty that was imposed by law, as a
matter of public policy. So the question of whether or not it
is negligent to fail to test every line of code turns
into a question of whether the company had a public duty to test
every line of code.
- The nature of the duty involved depends on the role of the
company in the development and release of the program.
- A company that publishes a software product has a duty to
take reasonable measures to ensure that its products are safe.
- A company that sells software development services might
have a duty to its client to deliver a program that is reasonably
competently coded and tested.
- A company that acts as an independent test laboratory might
owe a duty of competent testing and reporting to the client or
to the public.
- In any of these three cases, it is not obvious whether failure
to achieve 100% line coverage is negligent. The plaintiff will
have to prove that the tradeoff made by the software company
was unreasonable.
WHAT IS NEGLIGENCE?
I'll start by considering the situation of the software developer/publisher.
This provides the room to explore the coverage issues that are
the focus of this paper. The situations of the service providers
are of independent interest to the testing community and so will
also be considered below.
Under negligence law, software development companies must not
release products that pose an unreasonable risk of personal
injury or property damage.2 An injured
customer can sue your company for negligence if your company did
not take reasonable measures to ensure that the product
was safe.
Reasonable measures are those measures that a reasonable,
cautious company would take to protect the safety of its customers.
How do we determine whether a company has taken reasonable measures?
One traditional approach in law involves a simple cost-benefit
analysis. This was expressed as a formula by Judge Learned Hand
in the classic case of United States v. Carroll Towing Co.3:
Let B be the burden (expense) of preventing
a potential accident.
Let L be the severity of the loss if the
accident occurs.4
Let P be the probability of the accident.
Then failure to attempt to prevent a potential accident is
unreasonable if B< P
x L.
For example, suppose that a software error will cause a total
of $1,000,000 in damage to your customers. If you could prevent
this by spending less than $1,000,000, but don't, you are negligent.
If prevention would cost more than $1,000,000, and you don't spend
the money, you are not negligent.
In retrospect, after an accident has occurred, now that we
know that there is an error and what it is, it will almost always
look cheaper to have fixed the bug and prevented the accident.
But if the company didn't know about this bug when it released
the program, our calculations should include the cost of finding
the bug. What would it have cost to make the testing process
thorough enough that you would have found this bug during testing?
For example, if a bug in line 7000 crashes the program, B
would not be the cost of adding one test case that miraculously
checks this line (plus the cost of fixing the line). B
would be:
- the cost of strengthening the testing so that line 7000's
bug is found in the normal course of testing, or
- the cost of changing the design and programming practices
in a way that would have prevented this bug (and others like
it) in the first place.
Coming back to the coverage question, it seems clear that you
can prevent the crash-on-line-7000 bug by making sure that you
at least execute every line in the program. This is line coverage.
Line coverage measures the number / percentage of lines of
code that have been executed. But some lines contain branches-the
line tests a variable and does different things depending on the
variable's value. To achieve complete branch coverage, you
check each line, and each branch on multi-branch lines. To achieve
complete path coverage, you must test every path through
the program, an impossible task.5
The argument made at the start of this article would have us
estimate B as the cost of achieving complete line
coverage. Is that the right estimate of what it would cost
a reasonable software company to find this bug? I don't think
so.
Line coverage is just one narrow type of coverage of the program.
Yes, complete line coverage would catch a syntax error on line
7000 that crashes the program, but what about all the other bugs
that wouldn't show up under this simple testing? Suppose that
it would cost an extra $50,000 to achieve complete line coverage.
If you had an extra $50,000 to spend on testing, is line coverage
what you would spend it on? Probably not.
Most traditional coverage measures look at the simplest building
blocks of the program (lines of code) and the flow of control
from one line to the next. These are easy and obvious measures
to create, but they can miss important bugs.
A great risk of a measurement strategy is that it is too tempting
to pick a few convenient measures and then ignore anything else
that is more subtle or harder to measure. When people talk of
complete coverage or 100% coverage, they are using
terribly misleading language. Many bugs will not be detected even
if there is complete line coverage, complete branch coverage,
or even if there were complete path coverage.
If you spend all of your extra money trying to achieve complete
line coverage, you are spending none of your extra money looking
for the many bugs that won't show up in the simple tests that
can let you achieve line coverage quickly. Here are some examples:
- A key characteristic of object-oriented programming is that
each object can deal with any type of data (integer, real, string,
etc.) that you pass to it. Suppose that you pass data to an object
that it wasn't designed to accept. The program might crash or
corrupt memory when it tries to deal with it.. Note that you
won't run into this problem by checking every line of code, because
the failure is that the program doesn't expect this situation,
therefore it supplies no relevant lines for you to test.
There is an identifiable population of tests that can reveal
this type of problem. If you pass every type of data to every
object in your product, you will find every error that involves
an object that doesn't properly handle a type of data that is
passed to it. You can count the number of possible tests involved
here, and you can track the number you've actually run. Therefore,
we can make a coverage measure here.
- A Windows program might fail when printing.6
You achieve complete coverage of printer compatibility tests
(across printers) if you use the set of all Windows-supported
printers, using all Windows printer drivers available for each
of these printers. These drivers are part of the operating system,
not part of your program, but your program can fail or cause
a system failure when working with them. The critical test case
is not whether a particular line of code is tested, but whether
it is tested in conjunction with a specific driver.
- Suppose that you test a desktop publishing program. One effective
way to find bugs and usability failures is to use the program
to create interesting documents. This approach is particularly
effective if you use a stack of existing documents and try to
make exact copies of them with your program. To create your stack,
perhaps you'll use all the sample files and examples that come
with PageMaker, Quark, FrameMaker, and one or two other desktop
publishers. In this case, you achieve complete coverage if you
recreate all of the samples provided by all available desktop
publishers.
The Appendix to this article lists 101 measures of testing
coverage. Line coverage is just one of many. There are too many
possible tests for you to achieve complete coverage for every
type of coverage in the list.
I hope that the list helps you make priority decisions consciously
and communicate them explicitly. The tradeoffs will differ across
applications-in one case you might set an objective of 85% for
line coverage,7 100% for data coverage,
but only 5% for printer / driver compatibility coverage. For a
different program whose primary benefit is beautiful output, you
would assign printer coverage a much higher weight.
If you had an extra $50,000 to spend, would you focus your
efforts on increasing line coverage or increasing some of the
others? Surely, the answer should depend on the nature of your
application, the types of risks involved in your application,
and the probable effectiveness of the different types of tests.
The most desirable strategy will be the one that is most likely
to find the most bugs, or to find the most serious bugs.
The legal (negligence) test for the coverage tradeoffs that
you make is reasonability. No matter what tradeoffs you
make, and no matter how much money you spend on testing, you will
miss some bugs.8 Whether or not those
bugs are products of negligence in the testing process depends
on your reasonability, not on your luck in selecting just the
right tests.
Your task is to prioritize among tests in the way that a reasonably
careful company would-and to me that means to select the test
strategy that you rationally believe is the most likely to find
the most bugs or the most serious bugs.
There is no magic talisman in coverage that you can use blindly
and be free of negligence liability. Being reasonable in your
efforts to safeguard your customer requires careful thought and
analysis. Achieving complete (line, branch, whatever) coverage
will not insulate you. The plaintiff's attorney will just ask
you why you spent all that money on line coverage, at the expense
of, say, interrupt coverage. Try to assign your weights sensibly,
in a way that you can explain and justify.
The same reasoning applies to customer satisfaction in general.
If your approach will control the risks, you've done your job.
But if you can identify gaps that leave an unreasonable degree
of risk to customer safety or satisfaction, there is no reasonable
alternative to addressing those risks.
As a final note, I hope that you'll take a moment to appreciate
the richness, multidimensionality, and complexity of what we do
as testers. Sometimes we hear that only programmers should be
testers, or that all testing should be driven from a knowledge
of the workings of the code. This list highlights the degree to
which that view is mistaken. Programming skills and code knowledge
are essential for glass box testing tasks, but as we explore the
full range of black box testing approaches, we find that we also
need skills and knowledge in:
- the application itself (subject matter experts)
- safety and hazard analysis
- usability, task analysis, human error (human factors analysis)
- hardware (modems, printers, etc.)
- customer relations.
A person who has these skills but who can't program may be
an invaluable member of a black box testing team.
SOFTWARE MALPRACTICE
Malpractice? Can you be sued for malpractice? I've heard alarmist
talk about software malpractice - the main point of this section
is to say, "Calm down."
A person commits malpractice if she provides professional services
that don't meet the level that would be provided by a reasonable
member of the profession in this community. For example, if you
hire a lawyer to draft a contract, and the lawyer makes mistakes
that any reasonable lawyer would never make, then you can sue
the lawyer for malpractice.
Many people in the software community sell services rather
than products. We sell our programming or testing skills on a
contract basis, doing work that is defined by our customer. What
if you do a terrible job? In any contract for services, the service
provider is under a duty to perform the services in a reasonable
and workmanlike manner.9 If you hand
back badly written code that has never been tested, you might
face claims for breach of contract, misrepresentation, negligence,
or malpractice:
- You're probably liable for breach of contract if you promised
working code and you delivered garbage.
- You're probably liable for misrepresentation if you said
that you knew how to write this type of application, but you've
never done one before.
- If the only harm caused by your product is economic (it cost
your customer money), then there's an excellent chance that the
courts will refuse to allow a negligence-based suit to go forward.
If you did a lousy job of programming and testing, your customer
should sue you for breach of contract. 10
If the suit does go forward as a negligence suit, then your customer
may have to prove that no reasonable person would have done as
bad a job as you. This population of "reasonable persons
who sell software services" includes hobbyists, recent university
graduates, high school students, and a variety of other odd characters.
Proving negligence can be a big challenge.
- The malpractice case is much clearer and much more dangerous.
The care you took in providing your services is compared to the
level of care that is to be expected from a professional software
service provider. If your work doesn't live up to that standard,
and your customer loses money as a result, then it doesn't matter
whether a hobbyist would have done the job more sloppily than
you. You acted as a professional. You provided sub-professional
services. You lose.
There's just a small problem in this malpractice case - what
professional standards should your work be compared to?
It's relatively easy to compare lawyers' work to professional
standards because there are professional standards. I belong
to the California Bar. Can't be a lawyer in California without
being a member of the State Bar. To be admitted to the Bar, I
had to take a three-day exam on basic legal knowledge that every
practicing lawyer is expected to have. And I had to take another
tough exam on professional responsibility - my professional and
ethical obligations to my clients and to the public. These exams
illustrate a standard of knowledge that is accepted by the entire
legal community. As an attorney, I am required by my profession's
Code of Professional Responsibility to competently represent my
clients. If I fail to follow this Code, I can be hauled before
a State Bar Court, prosecuted for violating the profession's code
of ethics, and severely punished.
We have nothing comparable in the software community. Some
groups have attempted to certify programmers but none has won
general (let alone universal) acceptance. Similarly, there are
no widely accepted standards for software testers. There is also
no enforceable code of ethics.
Someone who advertises their services in terms of an industry
certification is inviting an evaluation against a higher standard.
My letterhead advertises that I am an ASQC-Certified Quality Engineer
(ASQC-CQE). This implicitly invites any (technical, rather than
legal) client of mine to judge me against a standard of knowledge
that one would expect from an ASQC-CQE, which is a more precisely
defined standard than that of a "reasonable person who does
quality-related work." It therefore makes a claim of malpractice
easier to argue (against me) in my case than in the general case.
You can choose to expose yourself to that risk, or you can choose
not to.
NEGLIGENT CERTIFICATION
An independent software testing company can get itself into
all sorts of interesting trouble. Under the right circumstances,
the lab can get itself into the same trouble as any other software
service provider (see the discussions of product development negligence
and malpractice above). In addition, let me mention two other
well known court cases:
- Hanberry v. Hearst Corporation: A consumer sued Good
Housekeeping magazine for negligent endorsement of a defectively
designed shoe.
- Hempstead v. General Fire Extinguisher Corporation:
A worker injured by the explosion of a fire extinguisher sued
Underwriters' Laboratories for negligence in inspecting, testing,
and approving the design of the extinguisher.
In both cases, the public was told that this was a product
that had been evaluated and approved. The organization that had
allegedly evaluated and approved the product was sued.
If the public is told that, because it was you who tested
the product, the public should believe that a product that
you tested is reliable and safe, then that product had better
be reliable and safe. Otherwise, cranky and injured customers
will come to your door too, saying that they bought the product
because they trusted your recommendation.
APPENDIX: THE MANY TYPES OF TESTING COVERAGE
This appendix lists 101 coverage measures. Coverage
measures the amount of testing done of a certain type. Because
testing is done to find bugs, coverage is also a measure of your
effort to detect a certain class of potential errors. For example,
100% line coverage doesn't just mean that you've executed every
line of code; it also means that you've tested for every bug that
can be revealed by simple execution of a line of code.
Please note that this list is far from complete. For example,
it doesn't adequately cover safety issues. Nor does it convey
the richness of the tests and test strategies that you can derive
from customer complaints and surveys and from tests involving
controlled customer observation. And you will add measures as
you analyze the application that you're testing.
- Line coverage. Test every line of code (Or
Statement coverage: test every statement).
- Branch coverage. Test every line, and every
branch on multi-branch lines.
- N-length sub-path coverage. Test every sub-path
through the program of length N. For example, in a 10,000 line
program, test every possible 10-line sequence of execution.
- Path coverage. Test every path through the
program, from entry to exit. The number of paths is impossibly
large to test.
- Multicondition or predicate coverage. Force
every logical operand to take every possible value. Two different
conditions within the same test may result in the same branch,
and so branch coverage would only require the testing of one
of them.
- Trigger every assertion check in the program.
Use impossible data if necessary.
- Loop coverage. "Detect bugs that exhibit
themselves only when a loop is executed more than once."
- Every module, object, component, tool, subsystem, etc.
This seems obvious until you realize that many programs rely
on off-the-shelf components. The programming staff doesn't have
the source code to these components, so measuring line coverage
is impossible. At a minimum (which is what is measured here),
you need a list of all these components and test cases that exercise
each one at least once.
- Fuzzy decision coverage. If the program makes
heuristically-based or similarity-based decisions, and uses comparison
rules or data sets that evolve over time, check every rule several
times over the course of training.
- Relational coverage. "Checks whether the
subsystem has been exercised in a way that tends to detect off-by-one
errors" such as errors caused by using < instead of <=.
This coverage includes:
- Every boundary on every input variable.
- Every boundary on every output variable.
- Every boundary on every variable used in intermediate
calculations.
- Data coverage. At least one test case for each
data item / variable / field in the program.
- Constraints among variables: Let X and Y be
two variables in the program. X and Y constrain each other if
the value of one restricts the values the other can take. For
example, if X
is a transaction date and Y
is the transaction's confirmation date, Y can't occur before X.
- Each appearance of a variable. Suppose that
you can enter a value for X
on three different data entry screens, the value of X is displayed
on another two screens, and it is printed in five reports. Change
X at
each data entry screen and check the effect everywhere else X appears.
- Every type of data sent to every object. A
key characteristic of object-oriented programming is that each
object can handle any type of data (integer, real, string, etc.)
that you pass to it. So, pass every conceivable type of data
to every object.
- Handling of every potential data conflict.
For example, in an appointment calendaring program, what happens
if the user tries to schedule two appointments at the same date
and time?
- Handling of every error state. Put the program
into the error state, check for effects on the stack, available
memory, handling of keyboard input. Failure to handle user errors
well is an important problem, partially because about 90% of
industrial accidents are blamed on human error or risk-taking.
Under the legal doctrine of foreseeable misuse, the manufacturer
is liable in negligence if it fails to protect the customer from
the consequences of a reasonably foreseeable misuse of the product.
- Every complexity / maintainability metric against every
module, object, subsystem, etc. There are many such measures.
Jones lists 20 of them. People sometimes ask whether any of these
statistics are grounded in a theory of measurement or have practical
value. But it is clear that, in practice, some organizations
find them an effective tool for highlighting code that needs
further investigation and might need redesign.
- Conformity of every module, subsystem, etc. against
every corporate coding standard. Several companies believe
that it is useful to measure characteristics of the code, such
as total lines per module, ratio of lines of comments to lines
of code, frequency of occurrence of certain types of statements,
etc. A module that doesn't fall within the "normal"
range might be summarily rejected (bad idea) or re-examined to
see if there's a better way to design this part of the program.
- Table-driven code. The table is a list of addresses
or pointers or names of modules. In a traditional CASE statement,
the program branches to one of several places depending on the
value of an expression. In the table-driven equivalent, the program
would branch to the place specified in, say, location 23 of the
table. The table is probably in a separate data file that can
vary from day to day or from installation to installation. By
modifying the table, you can radically change the control flow
of the program without recompiling or relinking the code. Some
programs drive a great deal of their control flow this way, using
several tables. Coverage measures? Some examples:
- check that every expression selects the correct table element
- check that the program correctly jumps or calls through every
table element
- check that every address or pointer that is available to
be loaded into these tables is valid (no jumps to impossible
places in memory, or to a routine whose starting address has
changed)
- check the validity of every table that is loaded at any customer
site.
- Every interrupt. An interrupt is a special
signal that causes the computer to stop the program in progress
and branch to an interrupt handling routine. Later, the program
restarts from where it was interrupted. Interrupts might be triggered
by hardware events (I/O or signals from the clock that a specified
interval has elapsed) or software (such as error traps). Generate
every type of interrupt in every way possible to trigger that
interrupt.
- Every interrupt at every task, module, object, or even
every line. The interrupt handling routine might change
state variables, load data, use or shut down a peripheral device,
or affect memory in ways that could be visible to the rest of
the program. The interrupt can happen at any time-between any
two lines, or when any module is being executed. The program
may fail if the interrupt is handled at a specific time. (Example:
what if the program branches to handle an interrupt while it's
in the middle of writing to the disk drive?)
The number of test cases here is huge, but that doesn't mean
you don't have to think about this type of testing. This is path
testing through the eyes of the processor (which asks, "What
instruction do I execute next?" and doesn't care whether
the instruction comes from the mainline code or from an interrupt
handler) rather than path testing through the eyes of the reader
of the mainline code. Especially in programs that have global
state variables, interrupts at unexpected times can lead to very
odd results.
- Every anticipated or potential race. Imagine
two events, A
and B.
Both will occur, but the program is designed under the assumption
that A
will always precede B.
This sets up a race between A
and B
-if B
ever precedes A,
the program will probably fail. To achieve race coverage, you
must identify every potential race condition and then find ways,
using random data or systematic test case selection, to attempt
to drive B
to precede A
in each case.
Races can be subtle. Suppose that you can enter a value for a
data item on two different data entry screens. User 1 begins
to edit a record, through the first screen. In the process, the
program locks the record in Table 1. User 2 opens the second
screen, which calls up a record in a different table, Table 2.
The program is written to automatically update the corresponding
record in the Table 1 when User 2 finishes data entry. Now, suppose
that User 2 finishes before User 1. Table 2 has been updated,
but the attempt to synchronize Table 1 and Table 2 fails. What
happens at the time of failure, or later if the corresponding
records in Table 1 and 2 stay out of synch?
- Every time-slice setting. In some systems,
you can control the grain of switching between tasks or processes.
The size of the time quantum that you choose can make race bugs,
time-outs, interrupt-related problems, and other time-related
problems more or less likely. Of course, coverage is an difficult
problem here because you aren't just varying time-slice settings
through every possible value. You also have to decide which tests
to run under each setting. Given a planned set of test cases
per setting, the coverage measure looks at the number of settings
you've covered.
- Varied levels of background activity. In a
multiprocessing system, tie up the processor with competing,
irrelevant background tasks. Look for effects on races and interrupt
handling. Similar to time-slices, your coverage analysis must
specify
- categories of levels of background activity (figure out something
that makes sense) and
- all timing-sensitive testing opportunities (races, interrupts,
etc.).
- Each processor type and speed. Which processor
chips do you test under? What tests do you run under each processor?
You are looking for:
- speed effects, like the ones you look for with background
activity testing, and
- consequences of processors' different memory management rules,
and
- floating point operations, and
- any processor-version-dependent problems that you can learn
about.
- Every opportunity for file / record / field locking.
- Every dependency on the locked (or unlocked) state
of a file, record or field.
- Every opportunity for contention for devices or resources.
- Performance of every module / task / object.
Test the performance of a module then retest it during the next
cycle of testing. If the performance has changed significantly,
you are either looking at the effect of a performance-significant
redesign or at a symptom of a new bug.
- Free memory / available resources / available stack
space at every line or on entry into and exit out of every module
or object.
- Execute every line (branch, etc.) under the debug version
of the operating system. This shows illegal or problematic
calls to the operating system.
- Vary the location of every file. What happens
if you install or move one of the program's component, control,
initialization or data files to a different directory or drive
or to another computer on the network?
- Check the release disks for the presence of every file.
It's amazing how often a file vanishes. If you ship the product
on different media, check for all files on all media.
- Every embedded string in the program. Use a
utility to locate embedded strings. Then find a way to make the
program display each string.
Operation of every function / feature / data handling operation
under:
- Every program preference setting.
- Every character set, code page setting, or country
code setting.
- The presence of every memory resident utility (inits,
TSRs).
- Each operating system version.
- Each distinct level of multi-user operation.
- Each network type and version.
- Each level of available RAM.
- Each type / setting of virtual memory management.
- Compatibility with every previous version of the program.
- Ability to read every type of data available in every
readable input file format. If a file format is subject
to subtle variations (e.g. CGM) or has several sub-types (e.g.
TIFF) or versions (e.g. dBASE), test each one.
- Write every type of data to every available output
file format. Again, beware of subtle variations in file
formats-if you're writing a CGM file, full coverage would require
you to test your program's output's readability by every
one of the main programs that read CGM files.
- Every typeface supplied with the product. Check
all characters in all sizes and styles. If your program adds
typefaces to a collection of fonts that are available to several
other programs, check compatibility with the other programs (nonstandard
typefaces will crash some programs).
- Every type of typeface compatible with the program.
For example, you might test the program with (many different)
TrueType and Postscript typefaces, and fixed-sized bitmap fonts.
- Every piece of clip art in the product. Test
each with this program. Test each with other programs that should
be able to read this type of art.
- Every sound / animation provided with the product.
Play them all under different device (e.g. sound) drivers
/ devices. Check compatibility with other programs that should
be able to play this clip-content.
- Every supplied (or constructible) script to
drive other machines / software (e.g. macros) / BBS's and information
services (communications scripts).
- All commands available in a supplied communications
protocol.
- Recognized characteristics. For example, every
speaker's voice characteristics (for voice recognition software)
or writer's handwriting characteristics (handwriting recognition
software) or every typeface (OCR software).
- Every type of keyboard and keyboard driver.
- Every type of pointing device and driver at every resolution
level and ballistic setting.
- Every output feature with every sound card and associated
drivers.
- Every output feature with every type of printer and
associated drivers at every resolution level.
- Every output feature with every type of video card
and associated drivers at every resolution level.
- Every output feature with every type of terminal and
associated protocols.
- Every output feature with every type of video monitor
and monitor-specific drivers at every resolution level.
- Every color shade displayed or printed to every color
output device (video card / monitor / printer / etc.) and associated
drivers at every resolution level. And check the conversion
to grey scale or black and white.
- Every color shade readable or scannable from each type
of color input device at every resolution level.
- Every possible feature interaction between video card
type and resolution, pointing device type and resolution, printer
type and resolution, and memory level. This may seem
excessively complex, but I've seen crash bugs that occur only
under the pairing of specific printer and video drivers at a
high resolution setting. Other crashes required pairing of a
specific mouse and printer driver, pairing of mouse and video
driver, and a combination of mouse driver plus video driver plus
ballistic setting.
- Every type of CD-ROM drive, connected to every type
of port (serial / parallel / SCSI) and associated drivers.
- Every type of writable disk drive / port / associated
driver. Don't forget the fun you can have with removable
drives or disks.
- Compatibility with every type of disk compression software.
Check error handling for every type of disk error, such as full
disk.
- Every voltage level from analog input devices.
- Every voltage level to analog output devices.
- Every type of modem and associated drivers.
- Every FAX command (send and receive operations) for
every type of FAX card under every protocol and driver.
- Every type of connection of the computer to the telephone
line (direct, via PBX, etc.; digital vs. analog connection and
signaling); test every phone control command under every telephone
control driver.
- Tolerance of every type of telephone line noise and
regional variation (including variations that are out of spec)
in telephone signaling (intensity, frequency, timing, other characteristics
of ring / busy / etc. tones).
- Every variation in telephone dialing plans.
- Every possible keyboard combination. Sometimes
you'll find trap doors that the programmer used as hotkeys to
call up debugging tools; these hotkeys may crash a debuggerless
program. Other times, you'll discover an Easter Egg (an undocumented,
probably unauthorized, and possibly embarrassing feature). The
broader coverage measure is every possible keyboard combination
at every error message and every data entry point. You'll
often find different bugs when checking different keys in response
to different error messages.
- Recovery from every potential type of equipment failure.
Full coverage includes each type of equipment, each driver, and
each error state. For example, test the program's ability to
recover from full disk errors on writable disks. Include floppies,
hard drives, cartridge drives, optical drives, etc. Include the
various connections to the drive, such as IDE, SCSI, MFM, parallel
port, and serial connections, because these will probably involve
different drivers.
- Function equivalence. For each mathematical
function, check the output against a known good implementation
of the function in a different program. Complete coverage involves
equivalence testing of all testable functions across all possible
input values.
- Zero handling. For each mathematical function,
test when every input value, intermediate variable, or output
variable is zero or near-zero. Look for severe rounding errors
or divide-by-zero errors.
- Accuracy of every graph, across the full range
of graphable values. Include values that force shifts in the
scale.
- Accuracy of every report. Look at the correctness
of every value, the formatting of every page, and the correctness
of the selection of records used in each report.
- Accuracy of every message.
- Accuracy of every screen.
- Accuracy of every word and illustration in the manual.
- Accuracy of every fact or statement in every data file
provided with the product.
- Accuracy of every word and illustration in the on-line
help.
- Every jump, search term, or other means of navigation
through the on-line help.
- Check for every type of virus / worm that could ship
with the program.
- Every possible kind of security violation of the program,
or of the system while using the program.
- Check for copyright permissions for every statement,
picture, sound clip, or other creation provided with the program.
- Verification of the program against every program requirement
and published specification.
- Verification of the program against user scenarios.
Use the program to do real tasks that are challenging
and well-specified. For example, create key reports, pictures,
page layouts, or other documents events to match ones that have
been featured by competitive programs as interesting output or
applications.
- Verification against every regulation (IRS, SEC, FDA,
etc.) that applies to the data or procedures of the program.
Usability tests of:
- Every feature / function of the program.
- Every part of the manual.
- Every error message.
- Every on-line help topic.
- Every graph or report provided by the program.
Localizability / localization tests:
- Every string. Check program's ability to display
and use this string if it is modified by changing the length,
using high or low ASCII characters, different capitalization
rules, etc.
- Compatibility with text handling algorithms under other
languages (sorting, spell checking, hyphenating, etc.)
- Every date, number and measure in the program.
- Hardware and drivers, operating system versions, and
memory-resident programs that are popular in other countries.
- Every input format, import format, output format, or
export format that would be commonly used in programs that are
popular in other countries.
- Cross-cultural appraisal of the meaning and propriety
of every string and graphic shipped with the program.
Footnotes
- Portions of this paper were originally published in C. Kaner,
"Software Negligence & Testing Coverage", Software
QA Quarterly, Vol. 2, #2, p. 18, 1995.
-
- Note the restriction on negligence suits. Most lawsuits over
defective software are for breach of contract or fraud, partially
because they don't involve personal injury or property damage.
-
- Federal Reporter, Second Series, volume 159, page
169 (United States Court of Appeals, 2nd Circuit, 1947); for
a more recent discussion see W. Landes and R. Posner, The
Economic Structure of Tort Law, Harvard University Press,
1987.
-
- See C. Kaner "Quality Cost Analysis: Benefits and Risks"
Software QA, Vol. 3, No. 1, p. 23, 1996. The amount, PxL,
is an estimate of the External Failure Costs associated with
this potential accident. Note that there are two different estimates
of an external failure cost. You can estimate the amount it will
cost your company if the failure occurs, or you can estimate
how much it will cost your customers. Most cost-of-quality analyses
will look at your company's costs. In negligence cases, judges
and juries look at customers' losses.
-
- G. Myers, The Art of Software Testing, Wiley, 1979.
-
- I've been asked by software quality workers who are not familiar
with mass-market issues why anyone would want to spend much effort
on printer testing. Here is some relevant information. The technical
support costs associated with printer incompatibilities are significant
for several mass-market software companies. For example, at a
presentation on "Wizards" at the OpCon West 96 Customer
Service & Support Conference, March 18, 1996, Keith Sturdivant
reported that print/merge calls on Microsoft Word had been averaging
over 50 support-minutes per caller until Microsoft developed
a special print troubleshooting Wizard. In my experience at other
companies, this is not a surprisingly high number. In "Benchmark
Report: Technical Support Cost Ratios," Softletter,
Vol. 10, #10, p. 1, September 21, 1993, Jeffrey Tarter reported
an average tech support call cost of $3 per minute. At this rate,
the 50 minute printer call cost is $150 - for a product that
is bundled free with a computer or sold at retail for as little
as $99. The cost and aggravation to the customers is also very
high.
-
- It can be very expensive and difficult to achieve
100% line or branch coverage. Grady reports that values of 80-85%
are reasonably achievable. R.B. Grady, Practical Software
Metrics for Project Management and Process Improvement, PTR
Prentice Hall (Hewlett-Packard Professional Books), 1992, p.
171.
-
- See Chapter 2 of C. Kaner, J. Falk, and H.Q. Nguyen, Testing
Computer Software (2nd. Ed.), Van Nostrand Reinhold, 1993.
-
- This is clearly discussed in R.T. Nimmer The Law of Computer
Technology: Rights, Licenses, Liabilities (2nd
Ed.) Warren, Gorham & Lamont, 1992 (supplemented 1994). See
Chapter 9 generally, especially section 9.16.
-
- This is a complex issue for lawyers, that probably shouldn't
be a complex issue. The new Article 2B of the Uniform Commercial
Code will clean up most or all of it by making this a clearly
defined contract issue. For more on the drafting of Article 2B,
see C. Kaner, "Uniform Commercial Code Article 2B: A New
Law of Software Quality," in press, Software QA,
Vol. 3, No. 2, 1996. To read the latest draft of Article 2B,
check the Uniform Commercial Code Article 2B Revision Home
Page, at http://www.law.uh.edu/ucc2b.
- The American Society for Quality Control recently developed
a Certification for Software Quality Engineers (CSQE). I contributed
to this effort by working in the meeting that developed the final
"body of knowledge" to be used in the certification
exam. As an ASQC-Certified Quality Engineer, I generally support
the ASQC's certification efforts, and I think that there is genuine
personal value to be gained from the study and effort required
for a CSQE.
- I don't want the following comments to be taken as an attack
on the CSQE designation or process. They are not an attack. They
are a statement of limitations, made by someone who was part
of the process, and I am making them to help avoid the confusion
that could result in someone trying to identify the CSQE Body
of Knowledge as a legally enforceable community standard.
- The CSQE Body of Knowledge questionnaire was given to a select
group of people in the software community. It went to some ASQC
members (I never received one, even though I am moderately active
in ASQC, a member of ASQC's software division, and reasonably
active in the software testing community. Nor did Hung Nguyen,
even though he co-authored Testing Computer Software,
is a Senior Member of the ASQC, and is very active in the San
Francisco ASQC chapter.) My understanding is that it was sent
to a random sample of ASQC members. The questionnaire also went
to several SPIN members and to attendees at one or two testing-related
conferences. It missed several societies. For example, it never
went to the Association for Computing Machinery membership, even
though this is the largest computer-related professional society
in the US. It never went to the Human Factors & Ergonomics
Society, even though that organization includes many members
who are directly involved in software quality (designing, testing
and evaluating software user interfaces for safety, usability,
etc.) I know several software testers who are not active in ASQC.
- Survey results were contained in the working document, The
Profession of Software Quality Engineering: Results from a Survey
of Tasks and Knowledge Areas for the Software Quality Engineer
- A Job Analysis Conducted on Behalf of the American Society
for Quality Control by Scott Wesley, Ph.D. and Michael Rosenfeld,
Ph.D. of Educational Testing Services. This document notes than
only 18.1% of the recipients of the survey responded, compared
to 36%, 38%, and 48% response rates for surveys for other ASQC
designations.
- The survey published several types of demographic information
about respondents. It didn't analyze the responses in terms of
software market segment, but my sense from looking at the other
data was that relatively few came from the mass-market software
industry. It also appeared that relatively few people from the
mass-market software industry contributed to the development
of the questionnaire. I have limited expertise in some other
areas, so I am less confident in identifying other software markets
as under-represented.
- My sense is that software development and testing are not
homogenous. I believe that we approach problems differently for
mass-market consumer software than for life-critical diagnostic
systems or for life-critical embedded systems. The differences
are not merely in thoroughness or degree to which they are systematic.
The approaches are qualitatively different. For example, some
groups in Microsoft have what looks like a mature development
and testing process, but it is not an SEI-mature process. The
ASQC-Certified Quality Engineer approach recognizes the diversity
of the Quality Control community across industries. The ASQC-Certified
Software Quality Engineer approach does not appear to recognize
this diversity. The ASQC-CSQE Body of Knowledge appears to treat
the field as homogenous.
- The ASQC-CQE Body of Knowledge requires background knowledge
in the history of the quality control movement. The ASQC-CSQE
Body of Knowledge appears much less scholarly to me. Couple this
with a lack of university training available for software quality,
and I question the degree to which we can expect as a matter
of law (for malpractice purposes) that the typical software quality
worker would have a thoughtful, historical insight into the techniques
and approaches that she uses.
In sum, I believe that the CSQE Body of Knowledge represents
an interesting body of knowledge to study. I believe that it
would be good for an experienced member of the community to know
the material covered by the CSQE. And I believe that a CSQE designation
tells me a fair bit about the level of knowledge and commitment
of a job candidate or coworker. But I do not believe that it
is based on a representative study of the software quality community's
practices and I do not believe that it can or should serve as
a standard that is useful as evidence of community norms.
As noted in the main body, however, if someone says they are
a CSQE in their advertising, they are representing themselves
as knowledgeable of the CSQE Body of Knowledge and this invites
comparison of that person's efforts to those one would
expect from a CSQE.
- California Appellate Reports, 2nd Series, Vol.
276, p. 680 (California Court of Appeal, 1969).
- Federal Supplement, Vol. 269, p. 109 (United States District
Court for the District of Delaware, applying Virginia law, 1969).
- N. Leveson, Safeware: System Safety and Computers,
Addison-Wesley, 1995.
- See G. Myers, The Art of Software Testing, Wiley,
1979, and Chapter 2 of C. Kaner, J. Falk, and H.Q. Nguyen, Testing
Computer Software (2nd. Ed.), Van Nostrand Reinhold, 1993.
- G. Myers, The Art of Software Testing, Wiley, 1979
(multicondition coverage) and B. Beizer, Software Testing
Techniques (2nd Ed.), Van Nostrand Reinhold, 1990.
- B. Marick, The Craft of Software Testing, Prentice
Hall, 1995, p. 146.
- B. Marick, The Craft of Software Testing, Prentice
Hall, 1995, p. 147.
- Boundaries are classically described in numeric terms, but
any change-point in a program can be a boundary. If the program
works one way on one side of the change-point and differently
on the other side, what does it matter whether the change-point
is a number, a state variable, an amount of disk space or available
memory, or a change in a document from one typeface to another,
etc.? See C. Kaner, J. Falk, and H.Q. Nguyen, Testing Computer
Software (2nd. Ed.), Van Nostrand Reinhold, 1993, p. 399-401.
- B.S. Dhillon, Human Reliability With Human Factors,
Pergamon Press, 1986, p. 153
- This doctrine is cleanly explained in S. Brown (Ed.) The
Product Liability Handbook: Prevention, Risk, Consequence, and
Forensics of Product Failure, Van Nostrand Reinhold, 1991,
pp. 18-19.
- C. Jones, Applied Software Measurement, McGraw-Hill,
1991, p. 238-341.
- B. Beizer, Software Testing Techniques (2nd Ed.),
Van Nostrand Reinhold, 1990, provides a sympathetic introduction
to these measures. R.L. Glass, Building Quality Software,
Prentice Hall, 1992, and R.B. Grady, D.L. Caswell, Software
Metrics: Establishing a Company-Wide Program, Prentice Hall,
1987, provide valuable perspective.
- For example, C. Kaner, J. Falk, and H.Q. Nguyen, Testing
Computer Software (2nd Ed.), Van Nostrand Reinhold, 1993,
pp. 47-48; also R.L. Glass, Building Quality Software,
Prentice Hall, 1992, "Software metrics to date have not
produced any software quality results which are useful in practice"
p. 303.
- R.B. Grady, D.L. Caswell, Software Metrics: Establishing
a Company-Wide Program, Prentice Hall, 1987 and R.B. Grady,
Practical Software Metrics for Project Management and Process
Improvement, PTR Prentice Hall (Hewlett-Packard Professional
Books), 1992, p. 87-90.
- Here as in many other areas, see Appendix 1 of C. Kaner,
J. Falk, and H.Q. Nguyen, Testing Computer Software (2nd
Ed.), Van Nostrand Reinhold, 1993 for a list and discussion of
several hundred types of bugs, including interrupt-related, race-condition-related,
etc.
| Cem Kaner consults on technical and management issues, practices
law, and teaches within the software development community. His
book, Testing Computer Software, received the Award of
Excellence in the Society for Technical Communication's 1993
Northern California Technical Publications Competition. He has
managed every aspect of software development, including software
development projects, software testing groups and user documentation
groups. He has also worked as a programmer, a human factors analyst
/ UI designer, a salesperson, a technical writer, an associate
in an organization development consulting firm, and as an attorney
(typically representing customers and software development service
providers). He has also served pro bono as a Deputy District
Attorney, as an investigator/mediator for Santa Clara County's
Consumer Affairs Department, as an Examiner for the California
Quality Awards. He holds a B.A. (Math, Philosophy, 1974), a J.D.
(1993), and a Ph.D. (Experimental Psychology, 1984) and is Certified
in Quality Engineering by the American Society for Quality Control.
He teaches at UC Berkeley Extension, and by private arrangement,
on software testing and on the law of software quality. |
The articles at this web site are not legal advice. They do
not establish a lawyer/client relationship between me and you.
I took care to ensure that they were well researched at the time
that I wrote them, but the law changes quickly. By the time you
read this material, it may be out of date. Also, the laws of the
different States are not the same. These discussions might not
apply to your circumstances. Please do not take legal action on
the basis of what you read here, without consulting your own attorney.
Questions or problems regarding this web site should be directed
to Cem Kaner, kaner@kaner.com.
Last modified: Tuesday November 11, 1997. Copyright ©
1997, Cem Kaner. All rights reserved.