How to parallelize the given code

Hi,
I have converted the GE algorithm given in the book. Now I want to parallelize it using MPI. Can somebody please guide me how to do that. My serial version is given below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
void GaussianElimination(double **A, double *b,double *y) {
     cout<<"Inside Gaussian 1";
     for(int k=0; k<=n-1; ++k) {
        cout<<"k =" <<k<<endl;
        for(int j= k+1; j <=n-1; ++j){
           cout<<"A["<<k<<"]["<<k<<"] =" <<A[k][k] <<endl;
           A[k][j] = A[k][j]/A[k][k]; 
         }
      
       
        y[k] = b[k]/A[k][k];
        A[k][k]=1;
        cout<<"Reached";
        for(int i=k+1; i<=n-1; i++){
           for(int j=k+1; j<=n-1; j++)
              A[i][j] = A[i][j] -A[i][k] *A[k][j];
           b[i]= b[i] -A[i][k] * y[k];
           A[i][k] = 0;
        }
     }
     cout<<"answers"<<endl;
     for(int i=0; i<=n-1; i++){
        for(int j=0;j<=n-1;j++)
        cout<<A[i][j]<<" ";
        cout<<endl;
     }
}        


Some body please guide me.

Zulfi.
Well, Gaussian elimination works by a succession of row operations (that is why you should make sure you understand the algorithm), so I suggest that you distribute whole rows amongst the processors.

At each step the processor with the pivot row broadcasts its row (MPI_Bcast) to all other processors which do the appropriate row operations on their own rows to eliminate the element in the pivot column.

To try and achieve load balancing, distribute the rows cyclically between processors. e.g if you had 4 processors, then
processor 0 should deal with rows 0, 4, 8, ...
processor 1 should deal with rows 1, 5, 9, ...
processor 2 should deal with rows 2, 6, 10, ...
processor 3 should deal with rows 3, 7, 11, ...
The modulo operation (%) should sort this out for you.

Partial pivoting - which you should do to avoid both algorithm failure and massive round-off problems for ill-conditioned matrices - will require some reduction operation to get the maximum absolute value in a given column (see MPI_Allreduce) and some swapping of rows, potentially rows on different processors (see MPI_Sendrecv).

I suggest that you do the row operations in the elimination stage with the augmented matrix (i.e. the RHS b[] is appended as the last column of matrix A[][]).
Last edited on
Hi,
Is there any written algorithm?

Zulfi.
Hi,
Thanks. I saw that article before but never realized that it talks about parallel GE algorithm.

Zulfi.
Registered users can post here. Sign in or register to post.