Distributed Algorithms

Table of Contents

Fault tolerance

A process may crash – unexpectedly stop executing events.

Assume network is complete – there’s a bidirectional channel between any two processes. So process crashes never make remaining network disconnected. Assume crashing of processes can’t be observed.

Consensus

Binary consensus

Assumptions:

No algorithm for 1-crash consensus always terminates.

b-potent set S of processes: if by only executing events at processes in S, some process in S can decide b

No Las Vegas algorithm for k ≥ N/2 consensus

Bracha-Toueg crash consensus algorithm

Let k < N/2.

Let k < N/2. This is a Las Vegas algorithm that terminates with probability 1.

Failure detection

Failure detector at process tracks which process have (or may have) crashed. Given an upper bound on network latency and heartbeat messages, one can implement a failure detector. For this setting, terminating crash consensus algorithms exist.

Assume time domain with total order.

H(p, t) is set of processes that p suspects to be crashed at time t (“failure detector history”)

Require that failure detectors are complete: from some time onward, each crashed process is suspected by each alive process.

“strongly accurate” failure detector: if only crashed processes are ever suspected

“weakly accurate”: if some process is never suspected by any process

“eventually strongly accurate”: if from some time onward, only crashed processes are suspected

“eventually weakly accurate”: if from some time onward, some process is never suspected