how to use MPI for 2d and 1d ?

Forum

Forum
General C++ Programming
how to use MPI for 2d and 1d ?

how to use MPI for 2d and 1d ?

Hello everyone:

{[solved] = Thanks to lastchance} I would like to know how to use MPI in parallel processing. I would like to apply converting the 2d to 1d and vs.

Last edited on

HAKAN FRED (33)

I couldnt fit the code in one post and and i split it to 2 post.
First half of the code

1
2

The code will update and post here soon

Last edited on

lastchance (6980)

@HakanFred,

You are trying to write far too much in one go. Build it up slowly. Start with a domain simply partitioned into 2 and transfer the relevant buffers of data between processors 0 and 1. Then move on to processors in a 1-d chain 0:1:2:3...(comm_sz-1). This is easier than a 2-d domain decomposition. You haven't put any MPI send or receive calls in yet.

I don't think you should be using MPI_Allgather. The whole point of domain decomposition is that each processor's memory should only store its own part of the domain (plus halo data at its borders). The safest way of outputting it correctly at the end is to send it in sequential chunks to the root processor for output to file.

Some references:
The book "Guide to Scientific Computing in C++" by Pitt-Francis and Whiteley has a chapter on MPI in C++ (Chapter 11) and includes the classic halo cell example. Note that their book uses the C++ MPI bindings which, I think, have actually been removed in MPI3 leaving just the C and Fortran ones.

There's a basic, but surprisingly good introductory tutorial for MPI at:
http://mpitutorial.com/tutorials/

Here's a very basic example of communicating between two processors on my Windows PC. (Note the fascinating output order! It makes it look as if the length was received before it was sent! You can't guarantee the order of output on different processors and it will change every time that you run the program.) It sends a char array here; you will need a 1-d array of doubles for each line of halo cells. Also, to improve transfers with more processors in the chain you will probably want to use MPI_Sendrecv, rather than separate MPI_Send and MPI_Recv.

#include "mpi.h"
#include <iostream>
#include <cstring>
using namespace std;

int main( int argc, char* argv[] )
{
   const int MAXLEN = 256;
   int rank, size, tag = 0;
   MPI_Status stat;
   char buffer[MAXLEN];
   int len = 0;

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   MPI_Comm_size( MPI_COMM_WORLD, &size );

   // Check that all processors are alive
   cout << "Processor " << rank << " of " << size << " is alive" << endl;

   MPI_Barrier( MPI_COMM_WORLD );

   // Send and receive some information; send the length first
   if ( rank == 0 )
   {  
      cout << "\nWhat message do you want to send? ";
      cin.getline( buffer, MAXLEN - 1 );

      len = strlen( buffer );

      MPI_Send( &len, 1, MPI_INT, 1, tag, MPI_COMM_WORLD );
      cout << "Processor " << rank << " sent length " << len << endl;

      tag++;
      MPI_Send( buffer, len, MPI_CHAR, 1, tag, MPI_COMM_WORLD );
      cout << "Processor " << rank << " sent message " << buffer << endl;
   }
   else if ( rank == 1 )
   {
      MPI_Recv( &len, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &stat );
      cout << "Processor " << rank << " received length " << len << endl;

      tag++;
      MPI_Recv( buffer, len, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &stat );
      buffer[len] = '\0';
      cout << "Processor " << rank << " received message " << buffer << endl;
   }

   MPI_Finalize();
}

C:\cprogs\parallel>"C:\Program Files\MPICH2\bin\mpiexec" -n 8 test1.exe 
Processor 2 of 8 is alive
Processor 0 of 8 is alive
Processor 5 of 8 is alive
Processor 3 of 8 is alive
Processor 1 of 8 is alive
Processor 4 of 8 is alive
Processor 6 of 8 is alive
Processor 7 of 8 is alive

What message do you want to send? Hello
Processor 1 received length 5
Processor 0 sent length 5
Processor 0 sent message Hello
Processor 1 received message Hello

If you want to broadcast to ALL processors then use MPI_Bcast - it will be more efficient than lots of sends and receives.

#include "mpi.h"
#include <iostream>
#include <cstring>
using namespace std;

int main( int argc, char* argv[] )
{
   const int MAXLEN = 256;
   int rank, size, tag = 0;
   MPI_Status stat;
   char buffer[MAXLEN];
   int len = 0;

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   MPI_Comm_size( MPI_COMM_WORLD, &size );

   // Send to ALL processors
   if ( rank == 0 )
   {  
      cout << "\nWhat message do you want to send? ";
      cin.getline( buffer, MAXLEN - 1 );
      len = strlen( buffer );
   }

   MPI_Bcast( &len, 1, MPI_INT, 0, MPI_COMM_WORLD );
   MPI_Bcast( buffer, len, MPI_CHAR, 0, MPI_COMM_WORLD );
   buffer[len] = '\0';
   cout << "Processor " << rank << " received message " << buffer << endl;

   MPI_Finalize();
}

C:\cprogs\parallel>"C:\Program Files\MPICH2\bin\mpiexec" -n 8 test2.exe 

What message do you want to send? Hello @HakenFred
Processor 0 received message Hello @HakenFred
Processor 1 received message Hello @HakenFred
Processor 2 received message Hello @HakenFred
Processor 6 received message Hello @HakenFred
Processor 4 received message Hello @HakenFred
Processor 7 received message Hello @HakenFred
Processor 5 received message Hello @HakenFred
Processor 3 received message Hello @HakenFred

Last edited on

lastchance (6980)

@HakanFred,

Here's a diagram for a 24 x 5 grid split between 4 processors.
Split this as a 1-d chain. Each processor needs a (6+2) x 5 array of its own. Halo information that it will send is marked with an S, that which it will receive with an R.

For this case it will need to set up 5-element buffers for each of send-to-left, send-to-right, receive-from-left, receive-from-right halo regions. In the general case each will contain the number of elements on one side.

With such a chain, "left" would be rank-1 and "right" would be rank+1 for each processor except the end ones. You can post all send and receive calls on each side simultaneously using MPI_Sendrecv. The special cases of the end processors can be dealt with by replacing the sending or receiving rank with the special process name MPI_PROC_NULL.


                 |. . . . . .|. . . . . .|. . . . . .|. . . . . .|
                 |. . . . . .|. . . . . .|. . . . . .|. . . . . .|
                 |. . . . . .|. . . . . .|. . . . . .|. . . . . .|
                 |. . . . . .|. . . . . .|. . . . . .|. . . . . .|
                 |. . . . . .|. . . . . .|. . . . . .|. . . . . .|

Processor 0:     |. . . . . S|R
                 |. . . . . S|R
                 |. . . . . S|R
                 |. . . . . S|R
                 |. . . . . S|R

Processor 1:                R|S . . . . S|R
                            R|S . . . . S|R
                            R|S . . . . S|R
                            R|S . . . . S|R
                            R|S . . . . S|R

Processor 2:                            R|S . . . . S|R
                                        R|S . . . . S|R
                                        R|S . . . . S|R
                                        R|S . . . . S|R
                                        R|S . . . . S|R

Processor 3:                                        R|S . . . . .|
                                                    R|S . . . . .|
                                                    R|S . . . . .|
                                                    R|S . . . . .|
                                                    R|S . . . . .|

When you require output (as seldom as possible if using parallel processing) then send the subdomains to the root processor. It will deal with them in order provided that it posts the relevant receives in order.

If you wanted 2 directions of domain decomposition then you will need additional halo buffers for top and bottom of each subdomain. However, numbering of adjacent processors becomes more difficult, so I would sort out a 1-d chain first.

Here's an example using sendRecv.

#include "mpi.h"
#include <iostream>
using namespace std;

int main( int argc, char* argv[] )
{
   int rank, size;
   MPI_Status stat;

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   MPI_Comm_size( MPI_COMM_WORLD, &size );

   // Messages (just send the rank number for now)
   int SL = rank;
   int SR = rank;

   // Receive buffers
   int RL, RR;

   // Adjacent processors to left and right
   int rankL = rank > 0        ? rank - 1 : MPI_PROC_NULL;
   int rankR = rank < size - 1 ? rank + 1 : MPI_PROC_NULL;

   // Use the rank of the sending processor as the tag
   MPI_Sendrecv( &SL, 1, MPI_INT, rankL, rank,             // sending to left
                 &RL, 1, MPI_INT, rankL, rankL,            // receiving from left 
                 MPI_COMM_WORLD, &stat );                  
   MPI_Sendrecv( &SR, 1, MPI_INT, rankR, rank,             // sending to right
                 &RR, 1, MPI_INT, rankR, rankR,            // receiving from right
                 MPI_COMM_WORLD, &stat );

   char p = '.';      // Just indicates a boundary

   if      ( rank == 0        ) cout << "Processor " << rank << " reports " << p  << " | " << rank << " | " << RR << endl;
   else if ( rank == size - 1 ) cout << "Processor " << rank << " reports " << RL << " | " << rank << " | " << p  << endl;
   else                         cout << "Processor " << rank << " reports " << RL << " | " << rank << " | " << RR << endl;

   MPI_Finalize();
}

C:\cprogs\parallel>"C:\Program Files\MPICH2\bin\mpiexec" -n 8 sendRecv.exe 
Processor 2 reports 1 | 2 | 3
Processor 0 reports . | 0 | 1
Processor 5 reports 4 | 5 | 6
Processor 1 reports 0 | 1 | 2
Processor 4 reports 3 | 4 | 5
Processor 7 reports 6 | 7 | .
Processor 3 reports 2 | 3 | 4
Processor 6 reports 5 | 6 | 7

Last edited on

HAKAN FRED (33)

@lastchance
Thank you so much for the better explanation of the MPI for me and now actually making sense to me.

Last edited on

HAKAN FRED (33)

1
2

    
The code will update soon

Last edited on

lastchance (6980)

@hakanfred

You will find it easier to partition your global array if (all) processors kept a record of the global i and j indices at the start and end of each block. Then you can index relative to these. This would be vital if blocks had different sizes.

If you want to change 2-d indices i,j to 1-d index n for packing then it depends whether you have halo cells round the edges or not. If you have NO halo cells then it is the usual
n = i*width + j
If you need to allow for possible halo cells (which are NOT included in the pack) then it will be
n = (i-istart)*halo_width + (j-jstart)
where istart may be 0 or 1 depending on whether you have halo cells to the left or not.
Your packed array size should be width*height, where these are dimensions without any halo cells.

There are other oddities about your code (which is very complex - why not break it down into small functions?).
- You have commented out the lines that do the actual temperature update. Also, these lines ought to be using halo_width and halo_height, not width and height.
- Are you not transferring halo values between adjacent blocks? Please look at my diagram in my previous post. I would expect to see MPI_Sendrecv. It is much less efficient to use MPI_Gatherv and MPI_Scatterv. These should be used only at start and end when you need file output. Apart from a single gathering of data for output, you should only transfer small amounts of data along block boundaries between processors, not whole arrays. Inter-processor communication is extremely slow compared with that in local memory: you should be seeking to keep the former to an absolute minimum.
- Each block needs to be made higher and wider to accommodate halo cells at its borders. You are calculating halo_width and halo_height, but you aren't using them.
- Why not make each slice an object? This is c++ after all.

Finally, consider doing the whole calculation with a 1-d array. It is extremely complicated to use new and delete with multidimensional arrays. Imagine if you were doing this in 3-d rather than 2-d!

Last edited on

lastchance (6980)

@hakenfred
This is quite a long way from what your supervisor wants you to submit, but you may get some ideas from it, particularly for the packing/unpacking routines and the MPI sends and receives.
Note that:
- this uses 1-d arrays throughout, not multidimensional arrays; how you index this is up to you; my orientation diagram is

 j
/|\
 |
 |
 |
 +------->i

which reflects the orientation on paper, not the storage order in an array.

The subdomain partitioning for 4 processors is

2 | 3
--+--
0 | 1

The packing order for MPI_Gatherv and MPI_Scatterv is just { 0, 1, 2, 3 }.

- EDITED - Now changed to use MPI_scatterv for initial distribution from root and MPI_Gatherv for final collection at root
- MPI_Sendrecv is used for the halo transfers
- each subdomain is put in a class (for convenience)

The code is too long for a single post - I have had to split it between posts

First part of code

#include "mpi.h"
#include <iostream>
#include <iomanip>
#include <vector>
#include <cmath>   // Edited
using namespace std;


//=====================================================================


// Class declarations and function prototypes

class Subdomain
{
public:
   int ni, nj, dim;                         // initial dimensions without halo data
   int hni, hnj, hdim;                      // doimensions with halo data
   bool haloW, haloE, haloS, haloN;         // adjacent halo data?
   int istart, iend, jstart, jend;          // start and end indices for non-halo data
   int rank, rankW, rankE, rankS, rankN;    // local processors
   vector<double> A;                        // 1-d array of all data (including halo)

   double *Wsend, *Wrecv, *Esend, *Erecv, *Ssend, *Srecv, *Nsend, *Nrecv, *pdata;  // buffers for halo data

   Subdomain( int w, int h, int r, int rw, int re, int rs, int rn ); // constructor
   ~Subdomain();                                                     // destructor
   int index( int i, int j );                                        // returns 1-d index
   void halo();                                                      // carries out halo exchanges
   void packHalo( char side, double arr[] );                         // pack halo data for SEND
   void unpackHalo( char side, double arr[] );                       // unpack halo data for RECEIVE
   void pack( double arr[] );                                        // pack non-halo data (whole array)
   void unpack( double arr[] );                                      // extract non-halo data (whole array)
   double iterate();                                                 // equation solver (returns change)
};

void initialise( double global[], int NI, int NJ );                  // initialise global matrix
void output( ostream &strm, double global[], int NI, int NJ );       // output global matrix


//=====================================================================


int main( int argc, char* argv[] )
{
   int rank, size;
   int root = 0;
   int proc;
   const double TOLERANCE = 1.0e-10;
   const int MAX_ITERATIONS = 1000;
   double *global, *buffer;            // will hold global matrix and transfer buffer (ROOT ONLY)

   const int NI = 16;                  // domain sizes (plenty of other ways of setting these)
   const int NJ = 16;


   //=======================
   // Initialise MPI system
   //=======================
   MPI_Init( &argc, &argv );
   MPI_Comm_size( MPI_COMM_WORLD, &size );
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   bool isRoot = ( rank == root );


   //===========
   // Partition
   //===========
   int *istart = new int[size];
   int *iend   = new int[size];
   int *jstart = new int[size];
   int *jend   = new int[size];
   // Split into subdomains (assumes size is either 1 or a multiple of 2)
   int PX = size / 2;   if ( PX == 0 ) PX = 1;
   int PY = size / PX;
   int isize = ( NI + 0.5 ) / PX;      // estimate subdomain width  (correct at ends)
   int jsize = ( NJ + 0.5 ) / PY;   
   // Set start and end of each subdomain
   proc = 0;
   for ( int j = 0; j < PY; j++ )
   {
      for ( int i = 0; i < PX; i++ )
      {
         istart[proc] = i * isize;
         iend  [proc] = ( i + 1 ) * isize - 1;   if ( i == PX - 1 ) iend[proc] = NI - 1;
         jstart[proc] = j * jsize;
         jend  [proc] = ( j + 1 ) * jsize - 1;   if ( j == PY - 1 ) jend[proc] = NJ - 1;
         proc++;
      }
   }
   // Set up arrays counts[] and displs[] for MPI_Scatterv and MPI_Gatherv
   int *counts = new int[size];
   int *displs = new int[size];
   displs[0] = 0;
   for ( proc = 0; proc < size; proc++ )
   {
      if ( proc > 0 ) displs[proc] = displs[proc-1] + counts[proc-1];
      counts[proc] = ( iend[proc] - istart[proc] + 1 ) * ( jend[proc] - jstart[proc] + 1 );
   }


   //=============================
   // Subdomain for this procesor
   //=============================
   int ni = iend[rank] - istart[rank] + 1;
   int nj = jend[rank] - jstart[rank] + 1;
   int rw = rank - 1;    if ( istart[rank] == 0      ) rw = MPI_PROC_NULL;
   int re = rank + 1;    if ( iend  [rank] == NI - 1 ) re = MPI_PROC_NULL;
   int rs = rank - PX;   if ( jstart[rank] == 0      ) rs = MPI_PROC_NULL;
   int rn = rank + PX;   if ( jend  [rank] == NJ - 1 ) rn = MPI_PROC_NULL;
   Subdomain me( ni, nj, rank, rw, re, rs, rn );


   //============
   // Initialise
   //============
   if ( isRoot )
   {
      // Root processor sets whole matrix
      buffer = new double[NI*NJ];
      global = new double[NI*NJ];
      initialise( global, NI, NJ );
   }

   // Pack on root processor
   if ( isRoot )
   {
      int n = 0;
      for ( proc = 0; proc < size; proc++ )
      {
         for ( int j = jstart[proc]; j <= jend[proc]; j++ )
         {
            for ( int i = istart[proc]; i <= iend[proc]; i++ ) buffer[n++] = global[j*NI+i];
         }
      }
   }

   // Distribute to all processors
   MPI_Scatterv( buffer, counts, displs, MPI_DOUBLE, me.pdata, me.dim, MPI_DOUBLE, root, MPI_COMM_WORLD );

   // Unpack
   me.unpack( me.pdata );


   //=======
   // Solve
   //=======
   double residual = 1.0;
   int iterations = 1;
   while ( residual > TOLERANCE && iterations <= MAX_ITERATIONS )
   {
      // Exchange halo data
      me.halo();

      // One iteration
      double change = me.iterate();

      // Get global residual
      MPI_Allreduce( &change, &residual, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD );
      iterations++;
   }


   //========
   // Gather
   //========
   // Pack data on current processor
   me.pack( me.pdata );

   // Gather to root
   MPI_Gatherv( me.pdata, me.dim, MPI_DOUBLE, buffer, counts, displs, MPI_DOUBLE, root, MPI_COMM_WORLD );

   // Unpack on root processor
   if ( isRoot ) 
   {
      int n = 0;
      for ( int proc = 0; proc < size; proc++ )
      {
         for ( int j = jstart[proc]; j <= jend[proc]; j++ )
         {
            for ( int i = istart[proc]; i <= iend[proc]; i++ ) global[j*NI+i] = buffer[n++];
         }
      }
   }


   //========
   // Output
   //========
   if ( isRoot ) 
   {
      cout << "Domain: " << NI << " x " << NJ << endl;
      cout << "Processors: " << PX << " x " << PY << endl;
      cout << "Iterations: " << iterations << endl;
      cout << "Residual:   " << residual << "\n\n";
      output( cout, global, NI, NJ );
      delete [] global;
      delete [] buffer;
   }


   // Tidy up
   delete [] istart;
   delete [] iend  ;
   delete [] jstart;
   delete [] jend  ;
   delete [] counts;
   delete [] displs;

   MPI_Finalize();
}


//======================================================================

Last edited on

lastchance (6980)

Second part of code

void initialise( double global[], int NI, int NJ )
{
   double TW = 1.0, TE = -1.0, TS = 0.0, TN = 0.0;         // Boundary values
   int i, j, n;

   // Initialise domain
   for ( n = 0; n < NI * NJ; n++ ) global[n] = 0.0;

   // Set boundaries
   i = 0;
   for ( j = 0; j < NJ; j++ ) global[j * NI + i] = TW;

   i = NI - 1;
   for ( j = 0; j < NJ; j++ ) global[j * NI + i] = TE;

   j = 0;
   for ( i = 0; i < NI; i++ ) global[j * NI + i] = TS;

   j = NJ - 1;
   for ( i = 0; i < NI; i++ ) global[j * NI + i] = TN;
}


//======================================================================


void output( ostream &strm, double global[], int NI, int NJ )
{
   strm.setf( ios::fixed );
   strm.precision( 4 );
   const int WID = 8;

   for ( int j = NJ - 1; j >= 0; j-- )           // Edited 01/11/2017 for consistent index notation
   {
      for ( int i = 0; i < NI; i++ )  
      {
         int n = j * NI + i;
         cout << setw( WID ) << global[n] << " ";
      }
      cout << '\n';
   }
}


//======================================================================


Subdomain::Subdomain( int w, int h, int r, int rw, int re, int rs, int rn )
   : ni( w ), nj( h ), rank( r ), rankW( rw ), rankE( re ), rankS( rs ), rankN( rn )
{ 
   dim = ni * nj;

   haloW = ( rankW != MPI_PROC_NULL );
   haloE = ( rankE != MPI_PROC_NULL );
   haloS = ( rankS != MPI_PROC_NULL );
   haloN = ( rankN != MPI_PROC_NULL );

   hni  = ni + haloW + haloE;
   hnj  = nj + haloS + haloN;
   hdim = hni * hnj;

   istart = 0 + haloW;
   iend   = istart + ni - 1;
   jstart = 0 + haloS;
   jend   = jstart + nj - 1;

   A = vector<double>( hdim );

   Wsend = new double[nj ];
   Wrecv = new double[nj ];
   Esend = new double[nj ];
   Erecv = new double[nj ];
   Ssend = new double[ni ];
   Srecv = new double[ni ];
   Nsend = new double[ni ];
   Nrecv = new double[ni ];
   pdata = new double[dim];
}


Subdomain::~Subdomain()
{ 
   delete [] Wsend;
   delete [] Wrecv;
   delete [] Esend;
   delete [] Erecv;
   delete [] Ssend;
   delete [] Srecv;
   delete [] Nsend;
   delete [] Nrecv;
   delete [] pdata;
}


int Subdomain::index( int i, int j ) 
{ 
   return j * hni + i;
}


void Subdomain::halo()
{ 
   MPI_Status stat;

   if ( haloW ) packHalo( 'W', Wsend );
   if ( haloE ) packHalo( 'E', Esend );
   if ( haloS ) packHalo( 'S', Ssend );
   if ( haloN ) packHalo( 'N', Nsend );

   MPI_Sendrecv( Wsend, nj, MPI_DOUBLE, rankW, rank, Wrecv, nj, MPI_DOUBLE, rankW, rankW, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( Esend, nj, MPI_DOUBLE, rankE, rank, Erecv, nj, MPI_DOUBLE, rankE, rankE, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( Ssend, ni, MPI_DOUBLE, rankS, rank, Srecv, ni, MPI_DOUBLE, rankS, rankS, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( Nsend, ni, MPI_DOUBLE, rankN, rank, Nrecv, ni, MPI_DOUBLE, rankN, rankN, MPI_COMM_WORLD, &stat );

   if ( haloW ) unpackHalo( 'W', Wrecv );
   if ( haloE ) unpackHalo( 'E', Erecv );
   if ( haloS ) unpackHalo( 'S', Srecv );
   if ( haloN ) unpackHalo( 'N', Nrecv );
}


void Subdomain::packHalo( char side, double arr[] )
{
   int p = 0;
   switch( side )
   {
      case 'W':
         for ( int j = jstart; j <= jend; j++ ) arr[p++] = A[index(istart,j)];
         break;
      case 'E':
         for ( int j = jstart; j <= jend; j++ ) arr[p++] = A[index(iend,j)];
         break;
      case 'S':
         for ( int i = istart; i <= iend; i++ ) arr[p++] = A[index(i,jstart)];
         break;
      case 'N':
         for ( int i = istart; i <= iend; i++ ) arr[p++] = A[index(i,jend)];
         break;
   }
}


void Subdomain::unpackHalo( char side, double arr[] )
{
   int p = 0;
   switch( side )
   {
      case 'W':
         for ( int j = jstart; j <= jend; j++ ) A[index(0,j)] = arr[p++];
         break;
      case 'E':
         for ( int j = jstart; j <= jend; j++ ) A[index(hni-1,j)] = arr[p++];
         break;
      case 'S':
         for ( int i = istart; i <= iend; i++ ) A[index(i,0)] = arr[p++];
         break;
      case 'N':
         for ( int i = istart; i <= iend; i++ ) A[index(i,hnj-1)] = arr[p++];
         break;
   }
}


void Subdomain::pack( double arr[] )
{
   int p = 0;
   for ( int j = jstart; j <= jend; j++ )
   {
      for ( int i = istart; i <= iend; i++ ) arr[p++] = A[index(i,j)];
   }
}


void Subdomain::unpack( double arr[] )
{
   int p = 0;
   for ( int j = jstart; j <= jend; j++ )
   {
      for ( int i = istart; i <= iend; i++ ) A[index(i,j)] = arr[p++];
   }
}


double Subdomain::iterate()
{
   double change = 0.0;
   int istep = 1;
   int jstep = hni;

   for ( int j = 1; j < hnj - 1; j++ )
   {
      for ( int i = 1; i < hni - 1; i++ )
      {
         int n = index(i,j);
         double temp = A[n];
         A[n] = ( A[n+istep] + A[n-istep] + A[n+jstep] + A[n-jstep] ) / 4.0;             // Basic Laplacian
         change += abs( A[n] - temp );
      }
   }
   return change;
}

C:\>mpiexec -n 4 partition.exe
Domain: 16 x 16
Processors: 2 x 2
Iterations: 566
Residual:   9.79635e-11

  0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000 
  1.0000   0.4902   0.2827   0.1797   0.1183   0.0758   0.0426   0.0138  -0.0138  -0.0426  -0.0758  -0.1183  -0.1797  -0.2827  -0.4902  -1.0000 
  1.0000   0.6782   0.4609   0.3176   0.2178   0.1425   0.0809   0.0262  -0.0262  -0.0809  -0.1425  -0.2178  -0.3176  -0.4609  -0.6782  -1.0000 
  1.0000   0.7616   0.5651   0.4123   0.2926   0.1954   0.1121   0.0366  -0.0366  -0.1121  -0.1954  -0.2926  -0.4123  -0.5651  -0.7616  -1.0000 
  1.0000   0.8031   0.6257   0.4737   0.3450   0.2342   0.1358   0.0445  -0.0445  -0.1358  -0.2342  -0.3450  -0.4737  -0.6257  -0.8031  -1.0000 
  1.0000   0.8251   0.6607   0.5120   0.3794   0.2608   0.1523   0.0501  -0.0501  -0.1523  -0.2608  -0.3794  -0.5120  -0.6607  -0.8251  -1.0000 
  1.0000   0.8367   0.6800   0.5340   0.4000   0.2770   0.1626   0.0536  -0.0536  -0.1626  -0.2770  -0.4000  -0.5340  -0.6800  -0.8367  -1.0000 
  1.0000   0.8418   0.6886   0.5441   0.4096   0.2847   0.1675   0.0553  -0.0553  -0.1675  -0.2847  -0.4096  -0.5441  -0.6886  -0.8418  -1.0000 
  1.0000   0.8418   0.6886   0.5441   0.4096   0.2847   0.1675   0.0553  -0.0553  -0.1675  -0.2847  -0.4096  -0.5441  -0.6886  -0.8418  -1.0000 
  1.0000   0.8367   0.6800   0.5340   0.4000   0.2770   0.1626   0.0536  -0.0536  -0.1626  -0.2770  -0.4000  -0.5340  -0.6800  -0.8367  -1.0000 
  1.0000   0.8251   0.6607   0.5120   0.3794   0.2608   0.1523   0.0501  -0.0501  -0.1523  -0.2608  -0.3794  -0.5120  -0.6607  -0.8251  -1.0000 
  1.0000   0.8031   0.6257   0.4737   0.3450   0.2342   0.1358   0.0445  -0.0445  -0.1358  -0.2342  -0.3450  -0.4737  -0.6257  -0.8031  -1.0000 
  1.0000   0.7616   0.5651   0.4123   0.2926   0.1954   0.1121   0.0366  -0.0366  -0.1121  -0.1954  -0.2926  -0.4123  -0.5651  -0.7616  -1.0000 
  1.0000   0.6782   0.4609   0.3176   0.2178   0.1425   0.0809   0.0262  -0.0262  -0.0809  -0.1425  -0.2178  -0.3176  -0.4609  -0.6782  -1.0000 
  1.0000   0.4902   0.2827   0.1797   0.1183   0.0758   0.0426   0.0138  -0.0138  -0.0426  -0.0758  -0.1183  -0.1797  -0.2827  -0.4902  -1.0000 
  0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000

Last edited on

HAKAN FRED (33)

@lastchance

Thanks a lot for providing such a descriptive code to me. You are right i am not going to use more 1D or 2D array.
I have gone through your code to understand what is going and i would like to know if there is any possibility that you don't use the class in your code because i am kinda of a beginner in C++ and don not have enough background in classes.
Alo, can we write: if (my_rank == 0), instead of writing: if ( isRoot ) , because i would like to know how to specify my master process and worker process.
I also added your codes from 2 post as a single file to compile and run in mac terminal but when i am compiling with : "mpicxx mpi_code.cxx -o mpi_code" or "mpicxx -std=c++11 mpi_code.cxx -o mpi_code" , it gave me an error as:

mpi_code.cxx:49:16: warning: unused variable 'stat' [-Wunused-variable]
MPI_Status stat;
^
mpi_code.cxx:412:23: error: call to 'abs' is ambiguous
change += abs( A[n] - temp );
^~~
/usr/include/stdlib.h:129:6: note: candidate function
int abs(int) __pure2;
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/cstdlib:159:44: note: candidate function
inline _LIBCPP_INLINE_VISIBILITY long abs( long __x) _NOEXCEPT {return labs(__x);}
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/cstdlib:161:44: note: candidate function
inline _LIBCPP_INLINE_VISIBILITY long long abs(long long __x) _NOEXCEPT {return llabs(__x);}
^
1 warning and 1 error generated.

Would you please let me know what i am doing wrong when i am compiling. I tried to include cstdlb library, but didn't change anything.

Thanks for help.

Last edited on

lastchance (6980)

Hello @Hakan Fred

can we write: if (my_rank == 0), instead of writing: if ( isRoot )

Sure, it's exactly the same thing here; it's just that I prefer to make it a bit more obvious what I'm doing. (I'm also using rank for the current processor number, rather than my_rank). Incidentally, I'm only marking out processor 0 for input/output: I don't regard it as a true master-slave relationship as the root processor does just as much work as all the other processors during the main solve routine.

It's just a warning, not an error, about
MPI_Status stat;
in int main(). I had previously been using MPI_Send and MPI_Recv for transferring data between other processors and root, and this was simply left over from that. Feel free to remove it in main(), but leave it in the routine that does the halo exchange as it's needed by MPI_Sendrecv.

For abs() my compiler must have let me get away with it. Perhaps one of the other included headers pulled it in. Just add
#include <cmath>
as it is using double variables.

Sorry, but I'm not going back to rewrite it without a class. I hope that it is still straightforward to see what is going on. You need to consider whether any of the ideas are any use for your own code. The class isn't brilliant - it uses new and delete and I haven't got round to writing proper copy constructors and copy assignments yet; maybe I'll just use the data buffer of a vector in future and circumvent the problem.

MPI_scatterv and MPI_gatherv aren't my preferred methods for transferring to the "master" (root) node as they necessitate this processor having to use potentially quite a lot of memory. However, if you are set on using them then there are some useful summaries at
https://www.open-mpi.org/doc/v1.10/man3/MPI_Gatherv.3.php
https://www.open-mpi.org/doc/v1.10/man3/MPI_Scatterv.3.php
I hope that you can at least see how I have packed and unpacked the data in my sample code.

Nice to see somebody else using MPI! It has made a monumental difference to some of my work codes (in Fortran). The speed-up is vast when you have access to some really big computational facilities.

Last edited on

HAKAN FRED (33)

@lastchance

Thanks for quick reply.
It Ok, i can try to understand the class. I really appreciate that you including useful reading links and book. It helps me to increase my knowledge in MPI programming.

Now i am successfully compiled it by adding the library, but i couldn't execute it for myisde.
Would please correct me if i am executing wrong: mpirun -np 4 mpi_code

wen i am executing it look like this:

dhcp-134-129-216-89:Desktop HAKAN$ mpirun -np 2 ./mpi_code
[dhcp-134-129-216-89:98057] *** An error occurred in MPI_Sendrecv
[dhcp-134-129-216-89:98057] *** reported by process [2627141633,4607182418800017408]
[dhcp-134-129-216-89:98057] *** on communicator MPI_COMM_WORLD
[dhcp-134-129-216-89:98057] *** MPI_ERR_TAG: invalid tag
[dhcp-134-129-216-89:98057] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dhcp-134-129-216-89:98057] *** and potentially your MPI job)

I really would like to see the results in my end.

Thanks

Last edited on

lastchance (6980)

@hakan fred,

If you can compile it successfully as you say then
mpirun -np 4 <name of executable>
should probably run it with 4 processors. Does that work for your other programs? I don't know what system you are using. Does it give any error messages?

I tested it with mpich using a Windows PC. I'll have a look at a different (unix) system when I get to work tomorrow.

lastchance (6980)

OK, just seen your update on errors.
Your system appears not to like the tags chosen in MPI_Sendrecv. I've just set them to the sending processor, although at the receive end this could be MPI_PROC_NULL, and so negative. For each of the four MPI_Sendrecv lines you might be able to get away with just setting the tags as a constant integer (sorry, not in a position to try this at present). Try setting the 5th and 10th arguments of each of those calls to 10. (That's replacing rank in position 5 and rankW/E/S/N in position 10).

HAKAN FRED (33)

Thanks for reply:

I tested and added those comment and still give me a error:
dhcp-134-129-216-89:Desktop HAKAN$ mpirun -np 4 ./test
[dhcp-134-129-216-89:99859] *** An error occurred in MPI_Sendrecv
[dhcp-134-129-216-89:99859] *** reported by process [1703739393,4607182418800017408]
[dhcp-134-129-216-89:99859] *** on communicator MPI_COMM_WORLD
[dhcp-134-129-216-89:99859] *** MPI_ERR_TAG: invalid tag
[dhcp-134-129-216-89:99859] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dhcp-134-129-216-89:99859] *** and potentially your MPI job)
[dhcp-134-129-216-89.cs.edu:99858] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[dhcp-134-129-216-89.cs.edu:99858] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Thanks

Last edited on

lastchance (6980)

@hakan fred,

Can you try changing routine void Subdomain::halo() to

void Subdomain::halo()
{ 
   MPI_Status stat;

   int tags = rank;
   int tagr = MPI_ANY_TAG;

   if ( haloW ) packHalo( 'W', Wsend );
   if ( haloE ) packHalo( 'E', Esend );
   if ( haloS ) packHalo( 'S', Ssend );
   if ( haloN ) packHalo( 'N', Nsend );

   MPI_Sendrecv( Wsend, nj, MPI_DOUBLE, rankW, tags, Wrecv, nj, MPI_DOUBLE, rankW, tagr, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( Esend, nj, MPI_DOUBLE, rankE, tags, Erecv, nj, MPI_DOUBLE, rankE, tagr, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( Ssend, ni, MPI_DOUBLE, rankS, tags, Srecv, ni, MPI_DOUBLE, rankS, tagr, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( Nsend, ni, MPI_DOUBLE, rankN, tags, Nrecv, ni, MPI_DOUBLE, rankN, tagr, MPI_COMM_WORLD, &stat );

   if ( haloW ) unpackHalo( 'W', Wrecv );
   if ( haloE ) unpackHalo( 'E', Erecv );
   if ( haloS ) unpackHalo( 'S', Srecv );
   if ( haloN ) unpackHalo( 'N', Nrecv );
}

I have now tested this on both my own PC and on a large unix cluster and both this and the original worked fine (provided I added the #include <cmath> as already noted - I've edited the original code to include this). There is no way that any tags can be out of range now; one is a wildcard and the other is processor rank. Try re-downloading my original code (I've edited the bits that prevented you compiling earlier), then updating the function above.

Your run command looks fine: it's what I used on the unix system.

If this still doesn't work (at compile or run stage) then please provide details of all error messages, plus details of your compiler and MPI implementation.

Last edited on

HAKAN FRED (33)

@lastchance

Thanks for updates.
Yes. now by adding the tags and tagr into the halo function, its execute well. I have thinking about this MPI setting up and i was thinking if i want to use evenly divisible dimension, why i am not using MPI_Gather instead of MPI_Gatherv and MPI_Scatterv to make my job easier. Do i need to change alot in MPI set up in your code or? I know that i have asked you alot of question, but i will be more than happy if you help me in this idea too.

Thanks alot.
Hakan

Last edited on

lastchance (6980)

#include "mpi.h"
#include <iostream>
#include <iomanip>
#include <vector>
#include <cmath>
using namespace std;

struct Partition
{
   int istart, iend, jstart, jend;
   int num;
};

class Subdomain
{
public:
   int ni, nj, dim;                           // Dimensions without halo
   int hni, hnj, hdim;                        // Dimensions with halo
   bool haloW, haloE, haloS, haloN;           // Halo?
   int istart, iend, jstart, jend;            // Index limits non-halo
   int rank, rankW, rankE, rankS, rankN;      // Local processors
   vector<double> A;                          // 1-d data array
   MPI_Datatype typeData, typeBase, typeSide; // Packing types

   Subdomain( int w, int h, int r, int rw, int re, int rs, int rn ); // Constructor
   int index( int i, int j );
   void halo();
   double iterate();
};

void initialise( double global[], int NI, int NJ );             // Initialise
void output( ostream &strm, double global[], int NI, int NJ );  // Output

//========

int main( int argc, char* argv[] )
{
   int proc;
   const double TOLERANCE = 1.0e-10;
   const int MAX_ITERATIONS = 10000;
   double *global, *buffer;  // Global matrix and buffer (root only)

   int NI = 16;              
   int NJ = 16;

   // Initialise MPI
   int rank, size;
   MPI_Init( &argc, &argv );
   MPI_Comm_size( MPI_COMM_WORLD, &size );
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   int root = 0;
   bool isRoot = ( rank == root );


   // Partition
   int PX = size / 2;   if ( PX == 0 ) PX = 1;
   int PY = size / PX;
   int isize = ( NI + 0.5 ) / PX;   // Estimate width (correct at end)
   int jsize = ( NJ + 0.5 ) / PY;

   Partition *part = new Partition[size];   
   proc = 0;
   for ( int j = 0; j < PY; j++ )
   {
      for ( int i = 0; i < PX; i++ )
      {
         part[proc].istart = i * isize;
         part[proc].iend   = ( i + 1 ) * isize - 1;   if ( i == PX - 1 ) part[proc].iend = NI - 1;
         part[proc].jstart = j * jsize;
         part[proc].jend   = ( j + 1 ) * jsize - 1;   if ( j == PY - 1 ) part[proc].jend = NJ - 1;
         part[proc].num = ( part[proc].iend - part[proc].istart + 1 ) * ( part[proc].jend - part[proc].jstart + 1 );
         proc++;
      }
   }

   int ni = part[rank].iend - part[rank].istart + 1;
   int nj = part[rank].jend - part[rank].jstart + 1;
   int rw = rank - 1;    if ( part[rank].istart == 0      ) rw = MPI_PROC_NULL;
   int re = rank + 1;    if ( part[rank].iend   == NI - 1 ) re = MPI_PROC_NULL;
   int rs = rank - PX;   if ( part[rank].jstart == 0      ) rs = MPI_PROC_NULL;
   int rn = rank + PX;   if ( part[rank].jend   == NJ - 1 ) rn = MPI_PROC_NULL;
   Subdomain me( ni, nj, rank, rw, re, rs, rn );


   // Initialise
   if ( isRoot )
   {
      global = new double[NI*NJ];
      initialise( global, NI, NJ );

      buffer = new double[NI*NJ];
      int n = 0;
      for ( proc = 0; proc < size; proc++ )
      {
         for ( int j = part[proc].jstart; j <= part[proc].jend; j++ )
         {
            for ( int i = part[proc].istart; i <= part[proc].iend; i++ ) buffer[n++] = global[j*NI+i];
         }
      }
   }

   int *counts = new int[size];
   int *displs = new int[size];
   displs[0] = 0;
   for ( proc = 0; proc < size; proc++ )
   {
      if ( proc > 0 ) displs[proc] = displs[proc-1] + part[proc-1].num;
      counts[proc] = part[proc].num;
   }

   // Scatter
   int offset = me.index( me.istart, me.jstart );
   MPI_Scatterv( buffer, counts, displs, MPI_DOUBLE, me.A.data() + offset, 1, me.typeData, root, MPI_COMM_WORLD );


   // Solve
   double residual = 1.0;
   int iterations = 0;
   while ( residual > TOLERANCE && iterations < MAX_ITERATIONS )
   {
      me.halo();                       // Exchange halo data
      double change = me.iterate();    // One iteration
      MPI_Allreduce( &change, &residual, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD );
      iterations++;
   }


   // Output
   MPI_Gatherv( me.A.data() + offset, 1, me.typeData, buffer, counts, displs, MPI_DOUBLE, root, MPI_COMM_WORLD );
   if ( isRoot ) 
   {
      int n = 0;
      for ( int proc = 0; proc < size; proc++ )
      {
         for ( int j = part[proc].jstart; j <= part[proc].jend; j++ )
         {
            for ( int i = part[proc].istart; i <= part[proc].iend; i++ ) global[j*NI+i] = buffer[n++];
         }
      }

      cout << "Domain: " << NI << " x " << NJ << endl;
      cout << "Processors: " << PX << " x " << PY << endl;
      cout << "Iterations: " << iterations << endl;
      cout << "Residual:   " << residual << "\n\n";
      output( cout, global, NI, NJ );
      delete [] global;
      delete [] buffer;
   }

   delete [] part;
   delete [] counts;
   delete [] displs;

   // Free MPI_Datatypes before MPI_Finalize
   MPI_Type_free( &me.typeData );
   MPI_Type_free( &me.typeBase );
   MPI_Type_free( &me.typeSide );
   MPI_Finalize();
}

//========

void initialise( double global[], int NI, int NJ )
{
   double TW = 1.0, TE = -1.0, TS = 0.0, TN = 0.0; 
   int i, j, n;

   // Domain
   for ( n = 0; n < NI * NJ; n++ ) global[n] = 0.0;

   // Boundary
   i = 0;
   for ( j = 0; j < NJ; j++ ) global[j * NI + i] = TW;
   i = NI - 1;
   for ( j = 0; j < NJ; j++ ) global[j * NI + i] = TE;
   j = 0;
   for ( i = 0; i < NI; i++ ) global[j * NI + i] = TS;
   j = NJ - 1;
   for ( i = 0; i < NI; i++ ) global[j * NI + i] = TN;
}

//========

void output( ostream &strm, double global[], int NI, int NJ )
{
   strm.setf( ios::fixed );
   strm.precision( 4 );
   int WID = 8;

   for ( int j = NJ - 1; j >= 0; j-- )
   {
      for ( int i = 0; i < NI; i++ ) cout << setw( WID ) << global[j * NI + i] << " ";
      cout << '\n';
   }
}

//========

Subdomain::Subdomain( int w, int h, int r, int rw, int re, int rs, int rn )
   : ni( w ), nj( h ), rank( r ), rankW( rw ), rankE( re ), rankS( rs ), rankN( rn )
{ 
   dim = ni * nj;

   haloW = ( rankW != MPI_PROC_NULL );
   haloE = ( rankE != MPI_PROC_NULL );
   haloS = ( rankS != MPI_PROC_NULL );
   haloN = ( rankN != MPI_PROC_NULL );

   hni = ni + haloW + haloE;
   hnj = nj + haloS + haloN;
   hdim = hni * hnj;

   istart = 0 + haloW;   iend = istart + ni - 1;
   jstart = 0 + haloS;   jend = jstart + nj - 1;

   A = vector<double>( hdim );

   // Types
   MPI_Datatype typeTemp;

   MPI_Type_vector( nj, ni, hni, MPI_DOUBLE, &typeTemp );  // Main data (ni x nj)
   MPI_Type_create_resized( typeTemp, 0, sizeof( double ), &typeData );
   MPI_Type_commit( &typeData );

   MPI_Type_vector( nj,  1, hni, MPI_DOUBLE, &typeTemp );  // Sides ( 1 x nj)
   MPI_Type_create_resized( typeTemp, 0, sizeof( double ), &typeSide );
   MPI_Type_commit( &typeSide );

   MPI_Type_vector(  1, ni,   1, MPI_DOUBLE, &typeTemp );  // Base & Top (ni x 1)
   MPI_Type_create_resized( typeTemp, 0, sizeof( double ), &typeBase );
   MPI_Type_commit( &typeBase );
}

int Subdomain::index( int i, int j ) 
{ 
   return j * hni + i;
}

void Subdomain::halo()
{ 
   MPI_Status stat;
   int tags = rank;
   int tagr = MPI_ANY_TAG;

   MPI_Sendrecv( A.data()+index(istart,jstart), 1, typeSide, rankW, tags, A.data()+index(0     ,jstart), 1, typeSide, rankW, tagr, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( A.data()+index(iend  ,jstart), 1, typeSide, rankE, tags, A.data()+index(hni-1 ,jstart), 1, typeSide, rankE, tagr, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( A.data()+index(istart,jstart), 1, typeBase, rankS, tags, A.data()+index(istart,0     ), 1, typeBase, rankS, tagr, MPI_COMM_WORLD, &stat );
   MPI_Sendrecv( A.data()+index(istart,jend  ), 1, typeBase, rankN, tags, A.data()+index(istart,hnj-1 ), 1, typeBase, rankN, tagr, MPI_COMM_WORLD, &stat );
}

double Subdomain::iterate()
{
   double change = 0.0;
   int istep = 1;
   int jstep = hni;

   for ( int j = 1; j < hnj - 1; j++ )
   {
      for ( int i = 1; i < hni - 1; i++ )
      {
         int n = index( i, j );
         double temp = A[n];
         A[n] = ( A[n+istep] + A[n-istep] + A[n+jstep] + A[n-jstep] ) / 4.0;
         change += abs( A[n] - temp );
      }
   }
   return change;
}

Last edited on

HAKAN FRED (33)

Sure.

Last edited on

HAKAN FRED (33)

@lastchance
Thanks alot.

Last edited on

Topic archived. No new replies allowed.