Algorithms for testing fault-tolerance of sequenced jobs
eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Previously Published Works

Abstract

We study the problem of testing whether a given set of sequenced jobs can tolerate transient faults. We present efficient algorithms for this problem in several fault models. A fault model describes what types of faults are allowed and specifies assumptions on their frequency. Two types of faults are considered: hidden faults, which can be detected only after a job completes, and exposed faults, which can be detected immediately. First, we give an O(n)-time fault-tolerance testing algorithm, for both exposed and hidden faults, for the model in which the number of faults does not exceed a given parameter k. Then we consider the model in which any two faults are separated in time by a gap of length at least Δ, where Δ is at least twice the maximum job length. For exposed faults, we give an O(n)-time algorithm. For hidden faults, we give an algorithm with running time O(n^2), and we prove that if job lengths are distributed uniformly over an interval [0, p_max], then this algorithm's expected running time is O(n). Our experimental study shows that this linear-time performance extends to other distributions. Finally, we provide evidence that improving the worst-case performance may not be possible, by proving an Ω(n^2) lower bound, in the algebraic computation tree model, on a slight generalization of this problem.
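
To make the bounded-fault model concrete, here is a minimal Python sketch of a linear-time test under a deliberately simplified setting. It is not the paper's algorithm. The abstract does not spell out the job model, so everything below is an assumption made for illustration: jobs run back to back in the given order starting at time 0, each job has a deadline, a hidden fault forces a full rerun of the affected job, and at most k faults occur. Under these assumptions the worst case for any job is that all k faults hit the longest job seen so far, so a running prefix sum and prefix maximum suffice. The function name `tolerates_k_hidden_faults` and its parameters are hypothetical.

```python
from typing import Sequence


def tolerates_k_hidden_faults(
    lengths: Sequence[float], deadlines: Sequence[float], k: int
) -> bool:
    """Check deadlines against the worst case of at most k hidden faults.

    Assumed model (not taken from the paper): jobs execute in order with
    no idle time, and a hidden fault is noticed only when the job ends,
    so each fault wastes one full run of the job it hits.
    """
    if len(lengths) != len(deadlines):
        raise ValueError("lengths and deadlines must have the same size")
    if k < 0:
        raise ValueError("k must be non-negative")

    elapsed = 0.0   # fault-free completion time of the current job
    longest = 0.0   # longest job among those scheduled so far
    for length, deadline in zip(lengths, deadlines):
        elapsed += length
        longest = max(longest, length)
        # Worst case: every one of the k faults reruns the longest job.
        if elapsed + k * longest > deadline:
            return False
    return True


if __name__ == "__main__":
    print(tolerates_k_hidden_faults([2, 3, 1], [4, 10, 12], k=1))
    print(tolerates_k_hidden_faults([2, 3, 1], [4, 10, 12], k=2))
```

With one fault the three jobs finish by times 4, 8, and 9 in the worst case, so every deadline is met; with two faults the first job can finish as late as time 6, which misses its deadline of 4. The gap-based model with separation Δ is not covered by this sketch.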

