Speedups

  
Figure 3: Speedup on a ring (15-puzzle)

Figure 3 shows speedup graphs for the 15-puzzle run on a unidirectional ring with up to 256 processors. We selected the ring topology for our experiments because its large communication diameter makes it difficult to achieve good speedups. Algorithms that perform well on a relatively small ring are likely to run efficiently on much larger networks with a smaller diameter (torus, hypercube, etc.). As can be seen, this is true for AIDA*, which outperforms PIDA*, especially on the larger networks.

Taking Korf's [6] 100 random 15-puzzle problem instances and dividing them into four classes, we ran the 50 hardest problems on our transputer system. Speedup anomalies were eliminated by normalizing the CPU-time results to the number of node expansions. Each data point is the average of the 25 individual speedups achieved in the corresponding problem class.
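The normalization step can be made concrete. The following is a minimal sketch (the function name and argument layout are our own, not from the paper): the parallel CPU time is rescaled by the ratio of node expansions, so a run that happened to expand more (or fewer) nodes than the sequential search is neither penalized nor credited for it.

```python
def normalized_speedup(t_seq, nodes_seq, t_par, nodes_par):
    """Speedup with search anomalies factored out.

    The parallel CPU time is first rescaled to the sequential node
    count, so differing node-expansion totals do not distort the result.
    """
    t_par_adjusted = t_par * (nodes_seq / nodes_par)
    return t_seq / t_par_adjusted

# A parallel run that expanded twice as many nodes in half the per-node
# time: raw speedup would be 25, normalized speedup is 50.
print(normalized_speedup(100.0, 10**6, 4.0, 2 * 10**6))  # -> 50.0
```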

In both algorithms, PIDA* and AIDA*, work requests are forwarded along the ring until a processor has work to share. The donor then selects an unsearched node, deletes it from its stack (in PIDA*) or frontier-node array (in AIDA*), and sends it in the opposite direction to the requester. When no processor has work to share, the original work request makes a full round through the ring, indicating that no work is available.
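The request-forwarding protocol above can be sketched as follows. This is a simplified sequential model, not the transputer implementation; the function name and the boolean work map are our own illustration.

```python
def find_donor(origin, has_work):
    """Forward a work request along a unidirectional ring.

    has_work[i] tells whether processor i currently has an unsearched
    node to share.  Returns the donor's index, or None when the request
    has made a full round back to the origin without finding work.
    """
    n = len(has_work)
    p = (origin + 1) % n          # requests travel to the next neighbour
    while p != origin:
        if has_work[p]:
            return p              # donor sends a node back to the requester
        p = (p + 1) % n           # otherwise forward the request
    return None                   # full round completed: no work available

print(find_donor(0, [False, False, True, False]))  # -> 2
print(find_donor(2, [False, False, True, False]))  # -> None (only requester has work)
```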

  
Figure 4: Work-load balancing in fixed-packet DFS

On a torus network, the work packets are sent first along the horizontal and then along the vertical rings, see Fig. 4. This spreads the local work-load widely over the whole system: each processor polls a different subset of processors, so the maximum travel length of the work packets does not increase linearly with the system size.
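The bound on the travel length can be counted directly. A sketch under the assumption of unidirectional horizontal and vertical rings (the horizontal-then-vertical routing order is from the text; the code itself is ours):

```python
def travel_hops(src, dst, w, h):
    """Hops for a packet on a w x h torus of unidirectional rings,
    routed first along the horizontal, then along the vertical ring."""
    horizontal = (dst[0] - src[0]) % w
    vertical = (dst[1] - src[1]) % h
    return horizontal + vertical

# On a 16 x 16 torus (256 processors) the worst case is
# (16-1) + (16-1) = 30 hops -- proportional to the square root of the
# system size, not to the number of processors.
worst = max(travel_hops((0, 0), (x, y), 16, 16)
            for x in range(16) for y in range(16))
print(worst)  # -> 30
```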

  
Figure 5: AIDA* speedup on torus topologies

Figure 5 shows the AIDA* performance on torus topologies of various sizes. On this system, an easier-class problem takes only 7 seconds to solve, while the larger problem instances take 43 seconds on average.

The topmost graph in Figure 5 illustrates AIDA*'s performance on the larger 19-puzzle. Thirteen problem instances were generated and run on tori, with an average CPU time of 30 minutes on the 1024-node system.






Volker Schnecke
Fri Dec 16 11:20:32 MET 1994