Parallel Simulations of Flow

	Institute of Computer Science
	Parallel Simulations of Flow

The introduction of the computer into engineering techniques has resulted in the growth of a completely new field termed computational fluid dynamics (CFD). This field has led to the development of new mathematical methods for solving the equations of fluid mechanics. These improved methods have permitted advanced simulations of flow phenomena on the computer. As such calculations are very time consuming and inherently parallel the use of multiprocessor systems for this task seems to be a very natural idea.

An explicit finite element scheme based on a two step Taylor-Galerkin algorithm allows the solution of the Euler and Navier-Stokes Equations for a wide variety of flow problems. To obtain useful results for realistic problems one has to use grids with an extremely high density to get a good resolution of the interesting parts of a given flow. Since these details are often limited to small regions of the calculation domain, it is efficient to use unstructured grids to reduce the number of elements and grid points.

One problem arising when implementing parallel algorithms is the lack of standards both on the hardware and software side. As things like processor topology, parallel operating system, programming languages, etc. have a much greater influence on parallel than on sequential algorithms, one has to choose an environment where it is possible to get results which can be generalized to a larger set of other environments.

We think that future supercomputers will be massively parallel systems of the MIMD class with distributed memory and strong communication capabilities. The hardware platforms which are availiable for us - Transputer systems of different sizes - fulfill these demands.

We think the model for parallelism should be message passing integrated into standard programming languages. So we implement our parallel algorithms in C using the Parix programming environment, which supplies a very flexible and comfortable interprocessor communication library. This programming model seems to be very close to a message passing standard which has to be set up in the near future.

The principle idea for the parallelization of a grid based algorithm is a division of this grid into parts which can be calculated separately on different processors. In our case this is possible, because we use a time stepping algorithm where in every time step the calculations for one element only depend on the values of elements which have nodes in common with this element. All inner nodes of a separate part of the grid can be calculated independently from other parts. The border nodes of every part must then be corrected after each time step using interprocessor communication.

To get an efficient program with this kind of parallelization, which is known as data decomposition, we have to choose a logical topology for the processor network, which is suitable for the given problem. Our research has shown that for the two-dimensional grids we are using a 2D-mesh is a good choice.

The main problem to solve is the implementation of a decomposition algorithm which reads in the global data structures, calculates a division of the grid and distributes the corresponding local data sets to the appropriate processor of the parallel system. Such a division often requires interprocessor connections which are not supplied by the basic logical topology. These connections are built dynamically with the so called virtual links of Parix.

The essential part of the whole program is the division algorithm which determinates the quality of the resulting distribution. This algorithm has to take different facts into consideration to achieve efficient parallel calculations. First it must ensure that all processors have nearly equal calculation times, because idle processors slow down the speedup. To achieve this we first distribute the elements as evenly as possible and then minimize the overhead caused by double calculated nodes and resulting communications.

Another point which has to be taken into account is the time consumed by the division algorithm itself which must be considerably less than that of the actual flow calculation. Therefore we cannot calculate an optimal solution for this NP-complete problem, but have to develop a good heuristic.

This task is mostly sequential and as the program has to deal with large global data sets we decided to map this process onto a workstation outside the Transputer system. Since nowadays such a workstation is much faster than a single Transputer, this is no patchedup solution, but the performance of the whole calculation even increases.

As a result of the achieved fast calculations we were able to implement an on-line visualization of the flows. The ability of interaction between the user and the parallel algorithm was achieved by implementing this on-line-graphic as a X-application running on a workstation connected via pipes to the Transputer system.

Summarizing the above one can say, that the use of modern parallel computers fulfill the demands of number crunching problems in the field of computational fluid dynamics and should be a new alternative in the future of supercomputing in CFD.

More information: [ Practical Computer Science | Computer Science Group | University of Osnabrück ]