The following iterations start on the frontier nodes.
All processors expand the nodes of their local array up to the current search-threshold.
Since the size of the subtrees emanating from the nodes is not known a priori, dynamic load balancing is required.
Our implementation of AIDA* employs a simple task attraction scheme.
The processing elements are connected in a torus topology (i.e., a mesh with wrap-around links in the rows and columns).
Each processor is thus a member of two rings: the horizontal and the vertical ring.
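As an illustration of this topology, the two ring neighbours of a processor can be computed from its id. This is a hypothetical sketch, assuming a square k x k torus of P = k*k processors numbered row by row; the function name and numbering scheme are assumptions, not taken from the paper.

```python
# Hypothetical sketch: clockwise neighbours on a square k x k torus,
# processors numbered row by row (an assumed numbering, not the paper's).
def ring_neighbors(pid: int, k: int):
    """Return the clockwise neighbours of `pid` on its two rings."""
    row, col = divmod(pid, k)
    horizontal = row * k + (col + 1) % k    # next processor on the row ring
    vertical = ((row + 1) % k) * k + col    # next processor on the column ring
    return horizontal, vertical
```

On a 4 x 4 torus, for example, processor 3 (row 0, column 3) wraps around to processor 0 on its horizontal ring and reaches processor 7 on its vertical ring.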
A processor first expands its local frontier nodes.
When it runs out of work, it sends a work_request clockwise along the horizontal ring of the torus.
The first processor with unexpanded frontier nodes in its array
sends a work packet back to the requester.
If none of the processors on the horizontal ring has work to share,
the request continues its path along the ring
and eventually returns to the requester,
indicating that the current iteration has run out of work on this horizontal ring of the torus.
The requester now sends a work request
along the vertical ring using the same mechanism.
If again no processor responds with a work packet,
an out_of_work message is sent on both rings
and this processor starts the next iteration.
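The request path described above can be sketched as a single-threaded simulation. All names below (`request_work`, the `work` dictionary mapping processor ids to their counts of unexpanded frontier nodes) are assumptions for illustration; the real scheme is asynchronous message passing, not a function call.

```python
# Illustrative sketch of the task attraction scheme on a square k x k
# torus (assumed numbering: processors numbered row by row).
def request_work(requester: int, k: int, work: dict):
    """Walk the requester's horizontal ring, then its vertical ring,
    looking for a donor with unexpanded frontier nodes.  Returns the
    donor's id, or None if both rings are out of work."""
    row, col = divmod(requester, k)
    horizontal = [row * k + (col + i) % k for i in range(1, k)]
    vertical = [((row + i) % k) * k + col for i in range(1, k)]
    for ring in (horizontal, vertical):
        for pid in ring:                # the request travels clockwise
            if work.get(pid, 0) > 0:    # first processor with work answers
                work[pid] -= 1          # it sends one work packet back
                work[requester] = work.get(requester, 0) + 1
                return pid
    # The request returned on both rings: the requester broadcasts
    # out_of_work and starts the next iteration.
    return None
```

Note that the horizontal ring is exhausted completely before the vertical ring is tried, mirroring the two-phase request described above.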
We call this a weak synchronization, as opposed to a hard barrier synchronization.
It keeps all processors working at about the same iteration, while not
requiring too much idle time [Newborn, 1988].
In practice, our weak synchronization works much like a majority consensus approach.
When searching for a first solution, care must be taken that all processors working on shallower iterations finish their search before the solution is returned as optimal.
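This optimality check can be stated compactly. The sketch below assumes hypothetical bookkeeping, an array recording the iteration each processor is currently working on; neither the name nor the mechanism is taken from the paper.

```python
# Hedged sketch: a goal found in iteration `t` may only be reported as
# optimal once no processor is still searching a shallower iteration.
# `current_iteration[p]` is assumed bookkeeping, not the paper's data.
def solution_confirmed(t: int, current_iteration: list) -> bool:
    """True iff no processor is working below iteration `t`."""
    return all(it >= t for it in current_iteration)
```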
Note that any work packet is exclusively owned by a single processor. Whenever a packet is transferred to another processor, it changes ownership. This is done with the expectation that all subtrees grow at about the same rate from one iteration to the next; hence, the load balance improves automatically during the search. In fact, the number of work packets decreases with increasing search time (cf. Fig. 7).