Among the various approaches for parallel depth-first search (DFS),
the stack-splitting schemes are most popular.
However, as shown in this paper,
dynamical stack-splitting is not suitable for
massively parallel systems with several hundred processors.
Initial work-load imbalances and work packets of dissimilar sizes
cause a high communication overhead.
We compare work-load balancing strategies of two depth-first searches
and propose a scheme that uses fine-grained fixed-sized work packets.
In its iterative-deepening variant (named AIDA*)
the global workload distribution improves from one iteration to the next.
As a consequence, the communication overhead
decreases with increasing search time!