To speed up training, the back-propagation algorithm has been parallelized in different ways. First, the training set can be partitioned for a batch-learning implementation: the neural network is replicated on every processor of the parallel machine, and each processor works on its own subset of the training set. After each training epoch the local results are broadcast and merged.
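The following is a minimal sketch of this training-set partitioning, written in Python/NumPy purely for illustration; it is not the original Transputer code, and the names (local_gradient, epoch_data_parallel) are assumptions. A single-layer sigmoid network is replicated, each of the p partitions produces a local gradient, and the gradients are merged once per epoch.

    import numpy as np

    def local_gradient(W, X, T):
        # Forward pass of a single-layer sigmoid network on the local subset,
        # followed by the batch gradient of the squared error.
        Y = 1.0 / (1.0 + np.exp(-X @ W))
        delta = (Y - T) * Y * (1.0 - Y)
        return X.T @ delta

    def epoch_data_parallel(W, X, T, p, lr=0.1):
        # Each of the p "processors" works on its own slice of the training
        # set; the local gradients play the role of the results that are
        # broadcast and merged after the epoch.
        grads = [local_gradient(W, Xs, Ts)
                 for Xs, Ts in zip(np.array_split(X, p), np.array_split(T, p))]
        return W - lr * sum(grads)

In the real implementation the p gradients reside on p processors and the merge is a broadcast followed by a reduction; the list comprehension above only simulates that behaviour serially.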
A second approach is the parallel calculation of the matrix products used in the learning algorithm. The neurons of each layer are partitioned into p disjoint sets, and each set is mapped onto one of the p processors. The new activations are distributed after each training pair, so this is an on-line training scheme, which we have implemented in two variants. In the first variant a matrix product is not computed on a single processor; instead, the partial sums are accumulated while they are passed around the processor ring (a sketch is given below). The second variant tries to reduce communication, but this leads to an overhead in both storage and the number of computational operations.
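As an illustration of the first variant, the sketch below (again Python/NumPy, illustrative only and not the original PARIX code; the function name ring_matvec is an assumption) partitions the columns of a weight matrix over p processors and accumulates the matrix-vector product as a subsum that is passed around the processor ring; after one full cycle the new net inputs are complete.

    import numpy as np

    def ring_matvec(W, x, p):
        # Column blocks of W and the matching slices of x are assumed to be
        # distributed over p processors arranged in a ring.
        W_blocks = np.array_split(W, p, axis=1)
        x_blocks = np.array_split(x, p)
        subsum = np.zeros(W.shape[0])
        # The subsum travels around the ring; every processor adds the
        # contribution of its local column block before forwarding it.
        for k in range(p):
            subsum += W_blocks[k] @ x_blocks[k]
        return subsum  # equals W @ x after one full cycle

In the actual implementation the loop body runs concurrently on the p processors and the forwarding is an explicit message on the ring; the serial loop only mimics the order in which the subsums are accumulated.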
The parallel implementations were carried out on PARSYTEC parallel systems: a T800 Multicluster and a PowerXplorer. Both parallelizations run under the runtime environments PARIX and PVM/PARIX.