To reduce the communication time between the processors, we have also implemented the idea of Yoon, Nang and Maeng [8]. For each neuron whose activation is calculated in parallel, both its receptive (incoming) and projective (outgoing) weights are stored on the processor responsible for that neuron. Figure 9 shows the distribution of the neurons and the weight matrices among three processors.
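The storage layout can be illustrated with a small sketch for a single weight matrix. It assumes a contiguous block partition of the neurons over the processors and the convention that `W[j, i]` connects lower-layer neuron `i` to upper-layer neuron `j`; the function names are hypothetical and not taken from [8]. Each processor keeps the rows (receptive weights) of the upper neurons it owns and the columns (projective weights) of the lower neurons it owns, so every weight ends up stored exactly twice.

```python
import numpy as np

def partition(n, p):
    """Hypothetical block partition: neurons 0..n-1 split into p contiguous groups."""
    return np.array_split(np.arange(n), p)

def distribute_layer(W, p):
    """Per-processor storage for one weight matrix W of shape (n_out, n_in).

    W[j, i] is the weight from lower-layer neuron i to upper-layer neuron j.
    Processor r keeps the receptive weights (rows) of the upper neurons it owns
    and the projective weights (columns) of the lower neurons it owns.
    """
    n_out, n_in = W.shape
    upper, lower = partition(n_out, p), partition(n_in, p)
    return [
        {
            "receptive": W[upper[r], :].copy(),   # incoming weights of owned upper neurons
            "projective": W[:, lower[r]].copy(),  # outgoing weights of owned lower neurons
        }
        for r in range(p)
    ]

W = np.arange(12, dtype=float).reshape(4, 3)  # 3 lower neurons -> 4 upper neurons
stores = distribute_layer(W, p=3)
stored = sum(s["receptive"].size + s["projective"].size for s in stores)
print(stored, 2 * W.size)  # every weight is held twice
```

With three processors, as in Figure 9, the 12 weights occupy 24 storage cells in total.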
In each backward step, every processor updates the receptive and projective weights of the neurons it is responsible for. Each weight matrix is therefore stored twice, and every weight update is likewise calculated twice, once for each copy. The advantage of this parallelization is that, once the error calculation is finished, the p-1 communications of the broadcast can be carried out without interruption.
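A minimal simulation of one backward step under this scheme is sketched below. Several details are assumptions rather than statements from the text or from [8]: a single weight matrix between a tanh hidden layer and the output layer, output deltas `delta_o` already produced by the error calculation, a contiguous block partition, and a ring topology in which the broadcast is realised as an all-gather of p-1 forwarding steps. The point it illustrates is that, after the ring steps, each processor computes the hidden deltas of its own neurons from its local projective copy alone, and that both copies of a weight receive the same (duplicated) update.

```python
import numpy as np

def partition(n, p):
    # Hypothetical block partition of neurons 0..n-1 over p processors.
    return np.array_split(np.arange(n), p)

def ring_allgather(local):
    """Simulate a ring all-gather: p-1 steps, each processor forwards one block to its right neighbour."""
    p = len(local)
    known = [{r: local[r]} for r in range(p)]  # blocks held by each processor, keyed by origin rank
    forward = list(range(p))                   # origin of the block each processor sends next
    for _ in range(p - 1):
        received = [forward[(r - 1) % p] for r in range(p)]
        for r in range(p):
            known[r][received[r]] = known[(r - 1) % p][received[r]]
        forward = received
    return known

def parallel_backward(W, h, net_h, delta_o, p, eta=0.1):
    """One backward step for W (n_out x n_hid) with duplicated receptive/projective storage."""
    n_out, n_hid = W.shape
    out_own, hid_own = partition(n_out, p), partition(n_hid, p)
    receptive = [W[out_own[r], :].copy() for r in range(p)]
    projective = [W[:, hid_own[r]].copy() for r in range(p)]

    # After the error calculation each processor holds the deltas of its own
    # output neurons; the p-1 ring steps then run back to back.
    known = ring_allgather([delta_o[out_own[r]] for r in range(p)])

    delta_h = np.empty(n_hid)
    for r in range(p):
        full = np.concatenate([known[r][s] for s in range(p)])
        hid = hid_own[r]
        # Hidden deltas need only the local projective copy (tanh units assumed).
        delta_h[hid] = (1.0 - np.tanh(net_h[hid]) ** 2) * (projective[r].T @ full)
        # Both copies receive the same gradient step: the duplicated calculation.
        receptive[r] -= eta * np.outer(known[r][r], h)
        projective[r] -= eta * np.outer(full, h[hid])
    return delta_h, receptive, projective, out_own, hid_own

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 7))
net_h = rng.normal(size=7)
h = np.tanh(net_h)
delta_o = rng.normal(size=5)
delta_h, rec, proj, out_own, hid_own = parallel_backward(W, h, net_h, delta_o, p=3)

W_new = W - 0.1 * np.outer(delta_o, h)                 # sequential reference update
ref_h = (1.0 - np.tanh(net_h) ** 2) * (W.T @ delta_o)  # sequential hidden deltas
print(np.allclose(delta_h, ref_h))
```

The comparison against the sequential computation checks that the distributed step reproduces the same hidden deltas and that both stored copies of every weight stay identical after the update.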