How to speedup performance of Guassian Elimination Program using Threads

Hi,
I am trying to run Guass Elimination program using 8, 4, 2, 1 threads but I am not getting good results. i.e 8 thread result is slower than the single thread result.

/gauss2 10 8 2 (Dimension =10, threads = 8): Elapsed time = 0.886 ms.
/gauss2 10 4 2 (Dimension = 10, thread = 4): Elapsed time = 0.461 ms.
/gauss2 10 2 2 (Dimension = 10,thread = 2): Elapsed time = 0.304 ms.
/gauss2 10 1 2 (Dimension = 10, thread = 1): Elapsed time = 0.291 ms.

I am using Pthread and applying locks on my matrices A, B, X at the time of writing.A, B & X are global variables.

Some body please guide me how to improve the performance of threaded version.

I have to use pthreads but I can also use Barriers but I dont have any knowledge about it.


Zulfi.
> I am using Pthread and applying locks on my matrices A, B, X at the time of writing.A, B & X are global variables.

If you just take a single-threaded program and just throw in threads and a bunch of global locks for good measure, then yes, you're always going to be worse off.

You need to have a plan when it comes to threads.
Specifically, you need to understand what parts of each matrix are exclusively accessed by a single thread, and what parts are accessed concurrently by multiple threads.

An example.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

#define SIZE 4
#define NTHREADS 4

struct matParams {
    int top;
    int left;
    int width;
    int height;
    int (*mat)[SIZE];
};

void *worker(void *p) {
    struct matParams *params = (struct matParams*)p;
    for ( int r = 0 ; r < params->height ; r++ ) {
        for ( int c = 0 ; c < params->width ; c++ ) {
            printf("%d ", params->mat[r+params->top][c+params->left] );
        }
        printf("\n");
    }
}

int main() {
    int mat[SIZE][SIZE] = {
        { 0, 0, 1, 1, },
        { 0, 0, 1, 1, },
        { 2, 2, 3, 3, },
        { 2, 2, 3, 3, },
    };

    // define the dimensions of the sub-matrix to
    // work on
    struct matParams params[NTHREADS] = {
        { 0, 0, 2, 2, mat },
        { 0, 2, 2, 2, mat },
        { 2, 0, 2, 2, mat },
        { 2, 2, 2, 2, mat },
    };

    printf("Sanity check single threaded\n");
    for ( int m = 0 ; m < NTHREADS ; m++ ) {
        worker(&params[m]);
    }

    printf("Show time\n");
    pthread_t   ids[NTHREADS];
    for ( int m = 0 ; m < NTHREADS ; m++ ) {
        pthread_create(&ids[m],NULL,worker,&params[m]);
    }

    // wait for all the threads to finish
    for ( int m = 0 ; m < NTHREADS ; m++ ) {
        pthread_join(ids[m],NULL);
    }

    return 0;
}

Each quadrant is exclusively accessed by a single thread, so there is no need for any locks.

But maybe you have seams between the quadrants.
At which point,
- do you consider locks between adjacent quadrants?
- or do you wait for all the threads to finish, then fix things in main?




Hi,
I can recognize the 4 matrices in mat[Size][Size] but I can't understand the struct matParams. I am actually working on Guass Elimination.

Do you have some working mathematical example.

Thanks for your time.

Zulfi.
I've no idea what your algorithm is, or how you've chosen to implement parallelism, whether it's even suitable for being made parallel, or even if it's the best place to start in making a parallel implementation.

https://www.google.com/search?q=gaussian+elimination+parallel+algorithm

> but I can't understand the struct matParams.
It's how I chose to partition a 4x4 matrix into 4 separate 2x2 matrices that could be worked on in parallel.

If you're using pthreads, then you only have a single void* parameter to a thread. So if you want to pass more parameters, the typical way is to use a pointer to a structure.

You can make that structure contain whatever you want.
Hi,

Thanks for your message.

I am able to understand param[THREAD]. It is an array of structure.

I have got a mathematical implementation also. Just I have to verify if Gauss Elimination works in this way or not.

However, I am developing a code based upon what you said:

<Specifically, you need to understand what parts of each matrix are exclusively accessed by a single thread, and what parts are accessed concurrently by multiple threads>

My teacher showed me this. I have to write the code and let you know.

Thanks for your cooperation.

God bless you.
Zulfi.
zak100 wrote:
God bless you

jesus f'ing christ!
Hi,
I have following Guassian Alg:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
for (norm = 0; norm < N - 1; norm++) {
    for (row = norm + 1; row < N; row++) {
      multiplier = A[row][norm] / A[norm][norm];
      for (col = norm; col < N; col++) {
	
         A[row][col] -= A[norm][col] * multiplier;
        
      }
    
      B[row] -= B[norm] * multiplier;
     
    }
  }


I parallelized it using pthreads and lock but then I tried to parallelize it using your comments:

<Specifically, you need to understand what parts of each matrix are exclusively accessed by a single thread, and what parts are accessed concurrently by multiple threads>

I thought that I have two loops and I can run the inner loop in parallel with needing locks. So i put the inner loop in a thread function and passed the values of row & norm as an argument using a void pointer like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
void *gaussEP(void *otherLIV){
   int norm, row, col;  /* Normalization row, and zeroing
			* element row and col */
  float multiplier;
  /* LIV = Loop Initializing Variables */
  
  printf("Computing Serially.\n");
  multiplier = A[(int *)otherLIV[0]][(int *)otherLIV[1]]/A[(int *)otherLIV[1]][(int *) otherLIV[1]];
  for (col = (int *) otherLIV[1]; col < N; col++) {
	
         A[(int *) otherLIV[0]][(int *) otherLIV[1]] -= A[(int *) otherLIV[1]][col] * multiplier;
        
      }
      
      B[(int *)otherLIV[0]] -= B[(int *) otherLIV[1]] * multiplier;
      
    }


but I have three problems:

i) I don't know if the above lf the above logic is correct or not.

Some body please guide me if the above logic is correct or not.

2) I don't know how to put the two outer 'for' loops.

3) I am getting compilation errors;


gauss3.c: In function ‘gaussEP’:
gauss3.c:265:34: warning: dereferencing ‘void *’ pointer
multiplier = A[(int *)*otherLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
^
gauss3.c:265:34: error: void value not ignored as it ought to be
multiplier = A[(int *)*otherLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
~~~~~~~~^~~
gauss3.c:265:55: warning: dereferencing ‘void *’ pointer
multiplier = A[(int *)*otherLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
^
gauss3.c:265:55: error: void value not ignored as it ought to be
multiplier = A[(int *)*otherLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
~~~~~~~~^~~
gauss3.c:265:78: warning: dereferencing ‘void *’ pointer
er = A[(int *)*otherLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
^
gauss3.c:265:78: error: void value not ignored as it ought to be
er = A[(int *)*otherLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
~~~~~~~~^~~
gauss3.c:265:100: warning: dereferencing ‘void *’ pointer
therLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
^
gauss3.c:265:100: error: void value not ignored as it ought to be
therLIV[0]][(int *)*otherLIV[1]]/A[(int *)*otherLIV[1]][(int *) *otherLIV[1]];
~~~~~~~~^~~
gauss3.c:266:30: warning: dereferencing ‘void *’ pointer
for (col = (int *) otherLIV[1]; col < N; col++) {
^
gauss3.c:266:14: error: invalid use of void expression
for (col = (int *) otherLIV[1]; col < N; col++) {
^
gauss3.c:268:28: warning: dereferencing ‘void *’ pointer
A[(int *) otherLIV[0]][(int *) otherLIV[1]] -= A[(int *) otherLIV[1]][col] * multiplier;
^
gauss3.c:268:12: error: invalid use of void expression
A[(int *) otherLIV[0]][(int *) otherLIV[1]] -= A[(int *) otherLIV[1]][col] * multiplier;
^
gauss3.c:268:49: warning: dereferencing ‘void *’ pointer
A[(int *) otherLIV[0]][(int *) otherLIV[1]] -= A[(int *) otherLIV[1]][col] * multiplier;
^
gauss3.c:268:33: error: invalid use of void expression
A[(int *) otherLIV[0]][(int *) otherLIV[1]] -= A[(int *) otherLIV[1]][col] * multiplier;
^
gauss3.c:268:75: warning: dereferencing ‘void *’ pointer
A[(int *) otherLIV[0]][(int *) otherLIV[1]] -= A[(int *) otherLIV[1]][col] * multiplier;
^
gauss3.c:268:59: error: invalid use of void expression
A[(int *) otherLIV[0]][(int *) otherLIV[1]] -= A[(int *) otherLIV[1]][col] * multiplier;
^
gauss3.c:272:24: warning: dereferencing ‘void *’ pointer
B[(int *)otherLIV[0]] -= B[(int *) otherLIV[1]] * multiplier;
^
gauss3.c:272:9: error: invalid use of void expression
B[(int *)otherLIV[0]] -= B[(int *) otherLIV[1]] * multiplier;
^
gauss3.c:272:50: warning: dereferencing ‘void *’ pointer
B[(int *)otherLIV[0]] -= B[(int *) otherLIV[1]] * multiplier;
^
gauss3.c:272:34: error: invalid use of void expression
B[(int *)otherLIV[0]] -= B[(int *) otherLIV[1]] * multiplier;



The part of code to create threads and invoke the thread function is given below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
otherLIV = malloc(sizeof(int) * (N*N));
  /* Gaussian Elimination */
      tids = malloc(sizeof(pthread_t) * nt);
	if (tids == NULL) {
		printf("Error : could not init the tids\n");
		return -1;
	}
  /* create threads */

       row =0; norm =0;
       otherLIV[0] = row;
       otherLIV[1] = norm;
	for (i = 0; i < nt; i++) {
		if (pthread_create(&tids[i], NULL, &gaussEP, &otherLIV) != 0) {
			printf("Error : pthread_create failed on spawning thread %d\n", i);
			return -1;
		}
       }
        /* join threads */
	for (i = 0; i < nt; i++) {
		if (pthread_join(tids[i], &res) != 0) {
			printf("Error : pthread_join failed on joining thread %d\n", i);
			return -1;
		}
	}


Some variables declared as local are given below:

1
2
3
pthread_t *tids = NULL;
  int *otherLIV= NULL;
  void *res = NULL;


Some body please guide me.

Zulfi.



Last edited on
Forget about all that malloc and gaussEP error ridden code you crufted. It isn't fixable, and won't work.

Whilst threads would work "in theory", there is an enormous overhead in starting and stopping threads.

Your initial experiments with OMP in another thread are going to be the way to go.

Start with some exercises.
https://computing.llnl.gov/tutorials/openMP/exercise.html


1
2
3
4
5
6
7
8
9
10
11
// This loop cannot be made parallel, look at the norm+1 inter-row dependency.
for (norm = 0; norm < N - 1; norm++) {
    // this loop looks like it could be made parallel.
    for (row = norm + 1; row < N; row++) {
      multiplier = A[row][norm] / A[norm][norm];
      for (col = norm; col < N; col++) {
         A[row][col] -= A[norm][col] * multiplier;
      }
      B[row] -= B[norm] * multiplier;
    }
  }


http://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-loop.html
Scroll down to "Program the LU factorization algorithm without pivoting. "

I found this as well.
https://github.com/gmendonca/gaussian-elimination-pthreads-openmp
Registered users can post here. Sign in or register to post.