next up previous
Next: Parallel Implementations on Up: Portability versus Efficiency?

Previous: N x N--Puzzle


Parallel Depth-First Search

Depth-first search can be performed in parallel by partitioning the search space into many small, disjoint parts (subtrees) that can be explored concurrently. We have developed a scheme called search-frontier splitting [18] that can be applied to breadth-first, depth-first, and best-first search. It partitions the search space into g subtrees taken from a `search frontier' containing nodes n with the same cost value. Search-frontier splitting has two phases:

  1. Initially, all processors generate (synchronously) the same `search frontier' and store its nodes in their local memories. A `search frontier' is a level in the tree where all nodes have the same cost value. Each node represents an indivisible piece of work for the next phase. Hence, on a p-processor system, the search frontier should be chosen large enough that it contains at least p nodes.

  2. In the asynchronous search phase, each processor selects a disjoint set of frontier nodes and expands them in depth-first fashion. When a processor becomes idle, it requests a work packet (an unprocessed frontier node) from another processor. Work requests are forwarded from one processor to the next until one of them has work to share. When there are no work packets left, the search space is exhausted and the search is terminated. When a processor finds a solution, all others are informed by a broadcast.
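The two phases above can be sketched as follows. This is a simplified single-process simulation, not the actual message-passing implementation: the tree interface (`children`), the shared packet pool standing in for work-request forwarding, and the early return standing in for the solution broadcast are all illustrative assumptions.

```python
from collections import deque

def generate_frontier(root, children, p):
    """Phase 1: expand the tree level by level until the frontier
    holds at least p nodes (one indivisible work packet each)."""
    frontier = [root]
    while 0 < len(frontier) < p:
        frontier = [c for n in frontier for c in children(n)]
    return frontier

def parallel_dfs(root, children, is_goal, p):
    """Phase 2 (simulated): p 'processors' take disjoint frontier
    nodes; an idle processor fetches a packet from the shared pool."""
    pool = deque(generate_frontier(root, children, p))
    while pool:
        packet = pool.popleft()          # idle processor grabs a work packet
        stack = [packet]                 # local depth-first expansion
        while stack:
            node = stack.pop()
            if is_goal(node):
                return node              # stands in for the solution broadcast
            stack.extend(children(node))
    return None                          # no packets left: space exhausted
```

For example, on a depth-3 binary tree of bit strings, `parallel_dfs('', children, is_goal, 4)` first builds the four-node frontier `'00'..'11'` and then searches each packet depth-first.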

Search-frontier splitting is a general scheme. It can be employed in depth-first branch-and-bound (DFBB) and in iterative-deepening depth-first search (IDA*). In DFBB, the search continues until all nodes have been either explored or pruned due to inferiority. Newly established bounds are communicated to all processors to improve the local bounds as soon as possible.
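The DFBB pruning rule can be illustrated with a minimal sequential sketch. A one-element list stands in for the globally broadcast incumbent bound, so that every improvement is immediately visible for pruning; the toy tree and cost function are assumptions for illustration, not the paper's implementation.

```python
def dfbb(root, children, cost, bound):
    """Depth-first branch-and-bound (sequential sketch).
    `bound` is a one-element list simulating the broadcast bound:
    each new incumbent tightens pruning for all remaining work."""
    stack = [root]
    best = None
    while stack:
        node = stack.pop()
        if cost(node) >= bound[0]:
            continue                     # pruned due to inferiority
        kids = children(node)
        if not kids:                     # leaf: new incumbent found
            bound[0] = cost(node)        # 'broadcast' the improved bound
            best = node
        else:
            stack.extend(kids)
    return best, bound[0]
```

On a depth-3 tree whose edges cost 1 or 3 (node = tuple of chosen edge costs), the search finds the cheapest leaf (1, 1, 1) with cost 3, pruning every subtree whose partial cost already reaches the incumbent.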

In the iterative-deepening variant, AIDA* [18], subsequent iterations are started on the previously established frontier node arrays. Work packets change ownership when they are sent to other processors, thereby improving the global load balance over the iterations. Lightly loaded processors that asked for work in the last iteration will be better utilized in the next. As a consequence, the communication overhead decreases during the search process [18].
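The iterative-deepening idea can be sketched as follows: the frontier array persists between iterations, and each iteration re-searches every frontier node under a larger cost bound. This is a sequential sketch under assumed interfaces (`children`, cost function `f`, goal test); the real AIDA* distributes the packets and rebalances their ownership, which is omitted here.

```python
def bounded_dfs(node, children, f, is_goal, bound):
    """Cost-bounded depth-first search; returns (goal_or_None,
    smallest cost value exceeding the current bound)."""
    if f(node) > bound:
        return None, f(node)
    if is_goal(node):
        return node, f(node)
    t = float('inf')
    for c in children(node):
        found, ct = bounded_dfs(c, children, f, is_goal, bound)
        if found is not None:
            return found, ct
        t = min(t, ct)
    return None, t

def aida_star(frontier, children, f, is_goal):
    """Iterative deepening over a fixed frontier array: each
    iteration reuses the same packets with a larger cost bound."""
    bound = min(f(n) for n in frontier)
    while True:
        next_bound = float('inf')
        for packet in frontier:          # packets persist across iterations
            found, t = bounded_dfs(packet, children, f, is_goal, bound)
            if found is not None:
                return found             # stands in for the broadcast
            next_bound = min(next_bound, t)
        if next_bound == float('inf'):
            return None                  # search space exhausted
        bound = next_bound               # start the next iteration
```

With depth as the cost value, a frontier of two bit-string packets is searched with bounds 1, 2, 3, ... until a goal node turns up.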



WWW-Administration
Tue May 16 19:29:30 MET DST 1995