In the first phase, AIDA* performs a breadth-first search to
expand the same search node frontier on every processor.
The breadth-first search is stopped when a sufficient number of nodes is generated
to give every processor its own fine-grained work packets
for further asynchronous exploration.
In practice, this is not done in a single step,
but an intermediate search frontier is stored on every processor,
so that the more fine-grained work packets can be generated
asynchronously on all processors (for details see [16]).
At the end of the initial work distribution phase,
each processor `owns' work packets for the 15-puzzle,
when being run on 1024 processors.
Figure 6: Time of first phase (15-puzzle)
The time spent in the first phase depends on the node expansion time and the
system size, because enough nodes must be generated to keep every processor
busy thereafter.
In the 15-puzzle, three seconds are spent in the first phase on a 1024-node system
[16], resulting in a 40% overhead
of the total elapsed time for the small class problems
(Fig. 6) on the 1024-node system.
Clearly, AIDA* - and more generally -
our fixed-packet DFS, is ill suited for running small problem instances.
But many other methods
would not work either because they suffer the same initial work
distribution problem.