TESTING AND DEBUGGING OF DISTRIBUTED SOFTWARE
José C. Cunha, Henryk Krawczyk
This paper introduces the topic of testing and debugging of distributed software in this
special issue of the Computers and Artificial Intelligence Journal.
It gives an overview of the problems involved in developing
distributed applications in order to motivate
the need for testing and debugging activities. The main issues in and approaches to
testing and debugging are surveyed, with a focus
on identifying current and future trends. We
conclude by introducing the papers which were selected for this special
issue.
EXECUTION REPLAY AND DEBUGGING OF DISTRIBUTED MULTI-THREADED PARALLEL PROGRAMS
Jacques Chassin de Kergommeaux, Michiel Ronsse, Koen De Bosschere
Clusters of shared-memory symmetric multiprocessors are
increasingly used for high-performance computing. To conveniently exploit
both the parallelism within nodes and the parallelism between
nodes, programming models for communicating threads are being developed.
However, most of these models result in programs exhibiting non-deterministic
behavior. This makes cyclic debugging of programs impossible unless an
efficient execution replay system is provided. This article describes such
an execution replay system for a distributed thread programming model that combines
synchronization primitives for threads sharing the same node with communication
primitives for threads on different nodes.
The execution replay system combines the most efficient trace size
reduction technique for shared memory, based on the use of logical clocks, with
a very efficient compression technique for trace data that originates from the
test functions used in non-blocking communications.
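The idea of ordering-based replay can be illustrated with a minimal sketch. It is not the system described in the abstract and does not implement its trace reduction or compression: it only records the order in which threads acquire one lock, using a per-lock counter as a stand-in for a logical clock, and then enforces that order on a second run. The names `ReplayLock` and `run` are hypothetical.

```python
import threading


class ReplayLock:
    """Lock that records its acquisition order, or enforces a recorded one.

    Hypothetical illustration: `recorded_order` is a list of thread ids
    taken from the `trace` of an earlier (recorded) run.
    """

    def __init__(self, recorded_order=None):
        self._cond = threading.Condition()
        self._held = False
        self._clock = 0  # counts acquisitions; stand-in for a logical clock
        self.trace = []  # thread ids in acquisition order
        self._replay = None if recorded_order is None else list(recorded_order)

    def _may_enter(self, tid):
        if self._held:
            return False
        if self._replay is None:  # record mode: any thread may enter
            return True
        # Replay mode: only the thread recorded at this clock value may enter.
        return self._clock < len(self._replay) and self._replay[self._clock] == tid

    def acquire(self, tid):
        with self._cond:
            self._cond.wait_for(lambda: self._may_enter(tid))
            self._held = True
            self.trace.append(tid)
            self._clock += 1

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()


def run(lock, n_threads=3, iterations=4):
    """Run a small racy workload; return the order of critical sections."""
    shared = []

    def worker(tid):
        for _ in range(iterations):
            lock.acquire(tid)
            shared.append(tid)  # critical section
            lock.release()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    recording = ReplayLock()
    first = run(recording)                       # nondeterministic order
    second = run(ReplayLock(recording.trace))    # same order, enforced
    print(first == second)
```

Replay terminates only if the replayed program performs the same acquisitions per thread as the recorded one, which is the usual assumption behind ordering-based replay.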
TESTABLE ENVIRONMENTS OF DISTRIBUTED OBJECTS
Bogdan Wiszniewski
This paper addresses the issue of remote testing and diagnosis of distributed objects
and their services by introducing the concept of a Testable Environment of
Distributed Objects (TEDO). A project is currently underway to
develop TEDO based on fault-tolerant extensions of the
Common Object Request Broker Architecture (CORBA). A major part of this
effort is the development of a test suite supporting remote object connectivity and
dynamic testbed configuration. Of particular importance are the increased
performance costs incurred by the extended object functionality, as well as
interoperability issues arising from heterogeneity of programming environments
and their related commercial tools.
ADAPTIVE DISTRIBUTED BREAKPOINT DETECTION AND CHECKPOINT SPACE REDUCTION IN MESSAGE PASSING PROGRAMS
Chyi-Ren Dow, Cheng-Min Lin
Breakpoint setting is one of the fundamental mechanisms for debugging programs; however,
the detection of breakpoints in distributed programs is more difficult than
in sequential programs. To identify program errors, a
distributed program must be rolled back to its earliest global state after
breakpoints are detected. Breakpoints are treated as checkpoints in this work, so
techniques for finding the minimum consistent global checkpoint can be applied to
find the earliest consistent global state. Four detection schemes for different
types of breakpoints are developed, including disjunctive, stable conjunctive,
generic conjunctive, and unconditional breakpoints.
In order to reduce the checkpoint space, a typed checkpoint
prevention scheme and a causal garbage collection scheme are also presented.
Results obtained from a variety of experiments demonstrate that the combination
of the prevention and garbage collection techniques can reduce the checkpoint
space to a reasonable size.
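To make the notion of an earliest consistent global checkpoint concrete, the following sketch uses vector clocks, a generic textbook technique and not necessarily the detection schemes of the paper. It assumes each process keeps an ordered list of checkpoints, each tagged with a vector clock whose entry `i` counts events on process `i`. The function names and the example data are hypothetical.

```python
from typing import List, Optional, Sequence

VectorClock = Sequence[int]


def is_consistent(cut: Sequence[VectorClock]) -> bool:
    """A cut (one vector clock per process) is consistent if no process
    has seen more of process i than process i's own checkpoint contains."""
    n = len(cut)
    return all(cut[j][i] <= cut[i][i] for i in range(n) for j in range(n))


def earliest_consistent_cut(
    checkpoints: Sequence[Sequence[VectorClock]],
    start: Sequence[int],
) -> Optional[List[int]]:
    """Return checkpoint indices of the earliest consistent cut at or after
    `start`, or None if the stored checkpoints contain no such cut."""
    idx = list(start)
    n = len(checkpoints)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                # Process j already depends on events of process i that
                # checkpoint idx[i] does not include: advance process i.
                while checkpoints[j][idx[j]][i] > checkpoints[i][idx[i]][i]:
                    idx[i] += 1
                    if idx[i] >= len(checkpoints[i]):
                        return None
                    changed = True
    return idx


# Hypothetical two-process run: P0 sends a message after its second event,
# P1 receives it at its second event and later sends a reply to P0.
CHECKPOINTS = [
    [(1, 0), (2, 0), (3, 3)],  # P0
    [(0, 1), (2, 2), (2, 3)],  # P1
]

if __name__ == "__main__":
    # A breakpoint hit at P1's second checkpoint, P0 still at its first.
    print(earliest_consistent_cut(CHECKPOINTS, [0, 1]))
```

In this example the starting cut is inconsistent because P1 has already received a message that P0's first checkpoint has not yet sent, so P0 is advanced by one checkpoint.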
INCREMENTAL TRACING AND PROCESS ISOLATION FOR DEBUGGING PARALLEL PROGRAMS
Dieter Kranzlmüller
Testing and debugging parallel programs is often difficult and tedious, since concurrently executing tasks may generate a multiplicity of bugs well known from sequential programs. Additional malign effects introduced by communication and synchronization, such as race conditions and deadlocks, further limit or even prohibit the use of purely sequential debuggers. Consequently, many approaches to parallel program debugging have been developed that use several instances of sequential debuggers, enhanced by specialized techniques to overcome various parallel debugging obstacles. One such approach is process isolation, which tries to enable debugging of selected processes with sequential debuggers while preserving the behavior of all the other processes in the system. The corresponding debugging strategy is based on a three-phase record&replay mechanism, which performs incremental tracing to obtain the required debugging data. During the first phase, only the ordering of events is traced to support correct re-execution of nondeterministic parallel programs. During the second phase, the amount of tracing is increased to obtain all data necessary to generate an event graph display that describes the relations between events on concurrent processes. Based on this display of the program's behavior, the user selects the target for process isolation. The data required for sequential re-execution of this target process is then obtained during the third phase, a further re-execution of the program. Afterwards, the user can execute the isolated process over and over again with a sequential debugger, without needing to execute any process other than the isolated one.
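The final step, running one process alone from recorded data, can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool described in the abstract: the target process communicates only through `recv` and `send` callables, and everything it receives during a full run is logged so that it can later be fed back without any peer process. All names here are hypothetical.

```python
from collections import deque


def target_process(recv, send):
    """Process under test: keeps a running sum of received numbers and
    reports it after each message; a None message ends the process."""
    total = 0
    while True:
        msg = recv()
        if msg is None:
            return total
        total += msg
        send(total)


class RecordingInbox:
    """Delivers messages from peers and logs every message delivered."""

    def __init__(self, messages):
        self._pending = deque(messages)
        self.log = []

    def recv(self):
        msg = self._pending.popleft()
        self.log.append(msg)
        return msg


class ReplayInbox:
    """Delivers previously logged messages; no peer process is needed."""

    def __init__(self, log):
        self._log = deque(log)

    def recv(self):
        return self._log.popleft()


def full_run(messages_from_peers):
    """Tracing run: the target executes with its peers (simulated here by
    a fixed message list) and its incoming data is recorded."""
    inbox = RecordingInbox(messages_from_peers)
    sent = []
    result = target_process(inbox.recv, sent.append)
    return result, inbox.log


def isolated_run(log):
    """Isolated run: incoming data comes from the log, outgoing messages
    are discarded because no other process is executing."""
    return target_process(ReplayInbox(log).recv, lambda _msg: None)


if __name__ == "__main__":
    result, log = full_run([3, 4, 5, None])
    print(result == isolated_run(log))
```

Because the isolated run depends only on the log, it can be repeated under a sequential debugger as often as needed.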