Volume 19, 2000, No. 6


TESTING AND DEBUGGING OF DISTRIBUTED SOFTWARE

José C. Cunha, Henryk Krawczyk 

This paper introduces the topic of testing and debugging distributed software for this special issue of the journal Computers and Artificial Intelligence. It gives a global picture of the problems involved in developing distributed applications, in order to motivate the need for testing and debugging. The main issues and approaches in testing and debugging are surveyed, with a focus on identifying current and future trends. We conclude by introducing the papers selected for this special issue.

 

EXECUTION REPLAY AND DEBUGGING OF DISTRIBUTED MULTI-THREADED PARALLEL PROGRAMS

Jacques Chassin de Kergommeaux, Michiel Ronsse, Koen De Bosschere

Clusters of shared-memory symmetric multiprocessors are increasingly used for high-performance computing. To conveniently exploit both the parallelism within nodes and the parallelism between nodes, programming models based on communicating threads are being developed. However, most of these models produce programs that behave non-deterministically. This makes cyclic debugging impossible unless an efficient execution replay system is available. This article describes such an execution replay system for distributed thread programming, which combines synchronization primitives for threads on the same node with communication primitives for threads on different nodes. The replay system combines the most efficient trace-size reduction technique for shared memory, based on logical clocks, with a very efficient compression technique for the trace data produced by the test functions used in non-blocking communication.
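
To make the two trace-reduction ideas more concrete, the sketch below assumes a simplified model in which every synchronization operation is a (thread, object) pair. It stamps each operation with a Lamport clock (max of thread and object clocks, plus one) and logs a stamp only when it is not simply the thread's previous stamp plus one, since those can be recomputed during replay. A second pair of functions run-length encodes the outcomes of repeated non-blocking test calls. All function names are hypothetical, and this is a generic illustration of logical-clock trace reduction and run-length compression, not the authors' exact algorithms. The replay-side scheduling that enforces the recorded order is omitted.

```python
from collections import defaultdict

def lamport_stamps(ops):
    """Stamp each (thread, sync_object) operation, in execution order, with
    a Lamport clock shared by the thread and the object it synchronizes on."""
    thread_clock, object_clock, stamps = defaultdict(int), defaultdict(int), []
    for thread, obj in ops:
        t = max(thread_clock[thread], object_clock[obj]) + 1
        thread_clock[thread] = object_clock[obj] = t
        stamps.append(t)
    return stamps

def compress(ops, stamps):
    """Log, per thread, only stamps that are not previous stamp + 1.
    Result: {thread: [(op_index_within_thread, stamp), ...]}."""
    log, last, count = defaultdict(list), defaultdict(int), defaultdict(int)
    for (thread, _obj), t in zip(ops, stamps):
        if t != last[thread] + 1:
            log[thread].append((count[thread], t))
        last[thread] = t
        count[thread] += 1
    return dict(log)

def reconstruct(ops, log):
    """Rebuild every stamp: use the logged value if present, else previous + 1."""
    lookup = {(th, k): t for th, entries in log.items() for k, t in entries}
    last, count, stamps = defaultdict(int), defaultdict(int), []
    for thread, _obj in ops:
        k = count[thread]
        t = lookup.get((thread, k), last[thread] + 1)
        stamps.append(t)
        last[thread], count[thread] = t, k + 1
    return stamps

def compress_tests(outcomes):
    """Encode test results (False = request still pending, True = complete)
    as the number of failed tests before each success, plus trailing failures."""
    runs, failures = [], 0
    for done in outcomes:
        if done:
            runs.append(failures)
            failures = 0
        else:
            failures += 1
    return runs, failures

def expand_tests(runs, trailing):
    """Inverse of compress_tests."""
    outcomes = []
    for n in runs:
        outcomes.extend([False] * n + [True])
    return outcomes + [False] * trailing

# Usage on a tiny hypothetical trace: two threads, two locks.
ops = [("T1", "L"), ("T1", "L"), ("T2", "L"), ("T2", "M"), ("T1", "M")]
stamps = lamport_stamps(ops)   # [1, 2, 3, 4, 5]
log = compress(ops, stamps)    # {'T2': [(0, 3)], 'T1': [(2, 5)]}
assert reconstruct(ops, log) == stamps
```

Here only two of the five stamps need to be stored. The same intuition, storing only what cannot be recomputed, motivates collapsing long runs of unsuccessful test calls into counts.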

 

TESTABLE ENVIRONMENTS OF DISTRIBUTED OBJECTS

Bogdan Wiszniewski

This paper addresses the remote testing and diagnosis of distributed objects and their services by introducing the concept of a Testable Environment of Distributed Objects (TEDO). A project is under way to develop TEDO on top of fault-tolerant extensions of the Common Object Request Broker Architecture (CORBA). A major part of this effort is the development of a test suite supporting remote object connectivity and dynamic testbed configuration. Of particular importance are the increased performance costs incurred by the extended object functionality, and the interoperability issues arising from the heterogeneity of programming environments and their associated commercial tools.

  

ADAPTIVE DISTRIBUTED BREAKPOINT DETECTION AND CHECKPOINT SPACE REDUCTION IN MESSAGE PASSING PROGRAMS

Chyi-Ren Dow, Cheng-Min Lin 

Setting breakpoints is one of the fundamental mechanisms for debugging programs; however, detecting breakpoints is more difficult in distributed programs than in sequential programs. To identify program errors, the state of a distributed program must be rolled back to the earliest consistent global state after a breakpoint is detected. In this work, breakpoints are treated as checkpoints, so techniques for finding the minimum consistent global checkpoint can be applied to find the earliest consistent global states. Four detection schemes are developed for different types of breakpoints: disjunctive, stable conjunctive, generic conjunctive, and unconditional breakpoints. To reduce the checkpoint space, a typed checkpoint prevention scheme and a causal garbage collection scheme are also presented. Results from a variety of experiments demonstrate that combining the prevention and garbage collection techniques can reduce the checkpoint space to a reasonable size.
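
The rollback step, which treats a breakpoint as a checkpoint and looks for the minimum consistent global checkpoint that contains it, can be sketched with vector timestamps. The sketch assumes every local checkpoint carries a vector timestamp and uses the standard consistency condition: no chosen checkpoint may depend on an event of process i that occurs after the checkpoint chosen for i. The function name and data layout are hypothetical. This is a generic fixed-point search shown for illustration. It is not the authors' detection schemes and does not cover the four breakpoint types or the space-reduction techniques.

```python
def earliest_consistent_cut(checkpoints, proc, index):
    """Minimum consistent global checkpoint containing checkpoint `index`
    of process `proc`.

    checkpoints[i] lists the vector timestamps of process i's local
    checkpoints in order. A global checkpoint (one local checkpoint per
    process) is consistent iff V(c_j)[i] <= V(c_i)[i] for all i, j.
    Returns the chosen checkpoint index per process, or None if no
    consistent global checkpoint contains the given one.
    """
    n = len(checkpoints)
    cut = [0] * n          # start every other process at its first checkpoint
    cut[proc] = index      # the breakpoint itself is fixed
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                # c_j has seen an event of process i that c_i does not include,
                # so process i must move to a later checkpoint.
                while checkpoints[j][cut[j]][i] > checkpoints[i][cut[i]][i]:
                    if i == proc:
                        return None  # would require moving the breakpoint
                    cut[i] += 1
                    if cut[i] == len(checkpoints[i]):
                        return None  # process i has no later checkpoint
                    changed = True
    return cut

# Hypothetical trace: P0 checkpoints, sends m, checkpoints again;
# P1 checkpoints, receives m, checkpoints again.
checkpoints = [
    [[1, 0], [3, 0]],   # P0
    [[0, 1], [2, 3]],   # P1
]
print(earliest_consistent_cut(checkpoints, proc=1, index=1))  # [1, 1]
```

Indices only ever move forward, and only when the current choice is provably inconsistent with some member of the cut, so the result is the earliest consistent state that still includes the breakpoint.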

  

INCREMENTAL TRACING AND PROCESS ISOLATION FOR DEBUGGING PARALLEL PROGRAMS

Dieter Kranzlmüller

Testing and debugging parallel programs is often difficult and tedious, since concurrently executing tasks may exhibit the whole range of bugs well known from sequential programs. Additional harmful effects introduced by communication and synchronization, such as race conditions and deadlocks, further limit or even prevent the use of purely sequential debuggers. Consequently, many approaches to parallel program debugging have been developed that use several instances of sequential debuggers, enhanced by specialized techniques to overcome various obstacles to parallel debugging. One such approach is process isolation, which enables selected processes to be debugged with sequential debuggers while preserving the behavior of all other processes in the system. The corresponding debugging strategy is based on a three-phase record&replay mechanism that performs incremental tracing to obtain the required debugging data. During the first phase, only the ordering of events is traced, to support correct re-execution of nondeterministic parallel programs. During the second phase, the amount of tracing is increased to obtain all data needed to generate an event graph display, which describes the relations between events on concurrent processes. Based on this display of the program's behavior, the user selects the target for process isolation. The data required for sequential re-execution of this target process are then obtained in the third phase, during a further re-execution of the program. Afterwards, the user can re-execute the isolated process repeatedly under a sequential debugger, without executing any process other than the isolated one.
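
A toy model can show why recording the receive order is enough for faithful re-execution, and why recording message contents is enough for isolation. In the sketch below, a seeded shuffle stands in for timing nondeterminism, a dictionary of one message per sender stands in for the other processes, and the hypothetical `target` function is the process to be isolated. Phase 1 records only the order of senders. A controlled replay forces that order and records the payloads that the target receives, and the isolated replay runs the target alone from those payloads. All names are illustrative assumptions, the phase-2 event graph is omitted, and this is not the paper's implementation.

```python
import random
from collections import deque

def target(recv):
    """Hypothetical target process: combines three received values into a
    number whose digits depend on the order in which messages arrive."""
    result = 0
    for _ in range(3):
        _source, value = recv()
        result = result * 10 + value
    return result

def original_run(messages, seed):
    """Phase 1: nondeterministic run that traces only the receive order."""
    arrivals = sorted(messages.items())
    random.Random(seed).shuffle(arrivals)   # stands in for timing effects
    queue, order = deque(arrivals), []
    def recv():
        source, value = queue.popleft()
        order.append(source)                # ordering only, no payloads
        return source, value
    return target(recv), order

def controlled_replay(messages, order):
    """Re-execution with all senders present, forced into the recorded
    order, additionally tracing the payloads the target receives."""
    pending, contents = deque(order), []
    def recv():
        source = pending.popleft()
        message = (source, messages[source])
        contents.append(message)
        return message
    return target(recv), contents

def isolated_replay(contents):
    """Run only the target; receives are served from the recorded trace."""
    return target(deque(contents).popleft)

messages = {"A": 1, "B": 2, "C": 3}          # one message from each sender
result, order = original_run(messages, seed=7)
replayed, contents = controlled_replay(messages, order)
assert replayed == result == isolated_replay(contents)
```

The isolated replay needs neither the `messages` table nor any other sender, which mirrors the goal of debugging one process under a sequential debugger while the rest of the system is replaced by recorded data.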

