Matrix transpose

I store matrix entries in std::vector<double>, such that the reading is row by row.
This means, for matrix
1 3 4 8 9 3
3 6 8 1 1 2
2 0 9 8 7 6

the std::vector<double> would have entries: {1,3,4,8,9,3,3,6,8,1,1,2,2,0,9,8,7,6}.
To transpose the matrix I use the naive approach of iterating through std::vector, calculating for each
entry its position in a matrix transpose. Recently, I got one suggestion:

"When you have a matrix and a mapping to a one-dimensional vector, transposing is equivalent to a permutation. Now execute this permutation by sorting the elements by requested rank in the permutation result."

I understand that transposing is equivalent to a permutation, but I dont know how to approach the problem this way.
Any suggestion on how to do this (or to do matrix transpose is a faster way) is welcome.
Please see std::next_permutation
http://www.sgi.com/tech/stl/next_permutation.html
Last edited on
I dont think it would help if I go through permutation checking for the right one (If I know which is the right one, I would not need to invoke permutations). Note that the naive algorithm works (going through each entry and calculate its positions), but I wonder whether there is a faster approach (not necessarily related to permutations)
I did this by copying the given vector then copying the elements back to the given vector, but in the the transposed order. Since vectors are new to me I did it using dynamic arrays too. The results using both methods are output. This version does not prompt for array values. The values 10, 11, 12, ... are provided for testing. Is this what you are trying to do?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
// Display 2D matrix from 1D array
// Transpose elements of array so that transpose of matrix is produced.

#include <iostream>// for use of cin, cout
#include <vector>
using namespace std;

// using integer arrays
void Transpose(double* psrc, int& R, int& C)
{
	double* pcopy = new double[R*C];
	for(int j=0; j<R*C; j++)
		pcopy[j] = psrc[j];// copy source
	// overwrite source from copy
	int r=0, c=0, tidx=0;
	for(int j=0; j<R*C; j++)
	{
		r = j/C;
		c = j%C;
		tidx = c*R + r;
		psrc[tidx] = pcopy[j];
	}
	delete [] pcopy;
		
	return;
}
void ArrayOut(double* pi, int& R, int& C)
{
	for(int r=0; r< R; r++)
	{		
		for(int c=0; c< C; c++)
			cout << pi[r*C+c] << " ";
		cout << endl;
	}
	return;
}
// using vectors
void TransposeV(vector<double>& psrc, int& R, int& C)
{
	vector<double> pcopy;
	pcopy.resize(R*C);
	
	for(int j=0; j<R*C; j++)
		pcopy[j] = psrc[j];// copy source
	// overwrite source from copy
	int r=0, c=0, tidx=0;
	for(int j=0; j<R*C; j++)
	{
		r = j/C;
		c = j%C;
		tidx = c*R + r;
		psrc[tidx] = pcopy[j];
	}
			
	return;
}
void VectorOut(vector<double>& pi, int& R, int& C)
{
	for(int r=0; r< R; r++)
	{		
		for(int c=0; c< C; c++)
			cout << pi[r*C+c] << " ";
		cout << endl;
	}
	return;
}

int main()
{		
	int R=1, C=1;	
    
	cout << "Enter rows: ";
	cin >> R;
	cout << "Enter cols: ";
	cin >> C;
	
	cout << "using vectors" << endl << endl;
	vector<double> vdbl;
	vdbl.resize(R*C);
	for(int j=0; j< R*C; j++)
		vdbl[j] = 10+(double)j;// elements assigned sequential values for now
	VectorOut(vdbl,R,C);// show as R rows, C columns
	TransposeV(vdbl,R,C);// transpose elements
	cout << endl;
	VectorOut(vdbl,C,R);// show as C rows, R columns
	cout << endl;
	
	cout << "using dynamic arrays" << endl << endl;
	double* p_dbl = new double[R*C];
	for(int j=0; j< R*C; j++)
		p_dbl[j] = 10+(double)j;// elements assigned here
	ArrayOut(p_dbl,R,C);
	Transpose(p_dbl,R,C);
	cout << endl;
	ArrayOut(p_dbl,C,R);
	delete [] p_dbl;

	system("PAUSE");
	return 0;
}
The answer you got at DaniWeb is the most useful. Doing a proper permutation would entail a little more work than you may want... and won't gain you any efficiency.
How about this? If the matrix is hidden inside a class, you can make transposing an O(1) operation by swapping the internal values used to index the vector, and the width and height.
For example:
1
2
3
4
5
6
7
8
9
10
11
12
void matrix::transpose(){
    this->transposed=!this->transposed;
    std::swap(this->w,this->h);
}

double matrix::index(unsigned x,unsigned y){
    return this->data[(!this->transposed)?(y*this->w+x):(x*this->w+y)];
    //OR
    if (this->transposed)
        std::swap(x,y);
    return this->data[x+y*this->w];
}
Last edited on
...except that wouldn't work, because you must also re-order the data in the vector.
Hm? For a transpose? Why? Is that a requirement the OP mentioned or am I missing something?
Last edited on
helios wrote:
How about this? If the matrix is hidden inside a class, you can make transposing an O(1) operation by swapping the internal values used to index the vector, and the width and height.

But then you are making and extra operation every time you access an element.
If you access the cells a lot more than you transpose your matrix (general case), then there is no improvement at all (if the cells contain heavy objects the swap could be by pointers)


onako wrote:
"When you have a matrix and a mapping to a one-dimensional vector, transposing is equivalent to a permutation. Now execute this permutation by sorting the elements by requested rank in the permutation result."

You're actually doing that in your naive approach. Both are O(row*col).
I think that to not touch diagonal members will be insignificant.
But then you are making and extra operation every time you access an element.
Only for the std::swap() version. The other one takes only a single check longer, since one way or another you have to do one multiplication and one addition.

If you access the cells a lot more than you transpose your matrix (general case), then there is no improvement at all
What do you mean "no improvement"? The algorithm is much simpler, and its minute cost is refunded with interests when transposing large matrices. How can you say O(w*h)->O(1) is no improvement?
Thanks for your messages. I dont think the algorithm can run in O(1). The main problem with the naive approach (calculating the proper positions for each entry) is large stride you would have to perform when operating on very large matrices. The aim is to fight that large stride. I thought of doing the following: iterate through the elements of the original matrix, one by one, and store the entries in a std::vector<std::vector> of size originalMatrix.colNo and originalMatrix.rowNo, respectively. However, I dont think I fight the strde problem with this approach. Any thought on this? Thanks
For his storage requirements, a matrix transpose of any matrix not of the form 1xN or Mx1 (which is stored linearly in row-major order), simply swaping the width (section length) and height (number of sections) will not work. A simple 2x2 matrix suffices to demonstrate:

1 2  -->  1 2 3 4
3 4
1 3  -->  1 3 2 4
2 4


Again, the answer given at DaniWeb is likely the most efficient response you'll get.
For his storage requirements, a matrix transpose of any matrix not of the form 1xN or Mx1 (which is stored linearly in row-major order), simply swaping the width (section length) and height (number of sections) will not work.
Of course, but my suggestion doesn't simply swap the dimensions.
return this->data[(!this->transposed)?(y*this->w+x):(x*this->w+y)];
Unless reordering the data is a requirement, I don't see how it's a better solution than this.
...heh, my brain read your answer completely wrong...

sorry X-)
helios wrote:
What do you mean "no improvement"? The algorithm is much simpler, and its minute cost is refunded with interests when transposing large matrices. How can you say O(w*h)->O(1) is no improvement?

The transpose is faster and O(1), but the extra check in the access of the cells will cost you a lot.

r=c=n;
T=constant in O(n2) transpose;
A = cost of the access;
IF= cost of a check
T*n2 + x*A*n2 < x*(A+IF)*n2
x > T / IF

So, if in average, I made more than x whole matrix access (per transpose) your approach will be slower. This can happen in 1 gauss elimination.
I think that's why list::reverse() cost O(n) instead of O(1)

onako wrote:
I store matrix entries in std::vector<double>

How do you add a column? If you'll transpose, add a row, transpose, then helios's approach will not work.
Why don't you use vector<vector<double> >



Last edited on
I just ran a little benchmark.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>

class matrix{
	std::vector<double> data;
	unsigned w,h;
	bool transposed;
public:
	matrix(unsigned w,unsigned h):data(w*h,0),w(w),h(h){
		this->transposed=0;
	}
	void matrix::transpose(){
		this->transposed=!this->transposed;
		std::swap(this->w,this->h);
	}
	double &index(unsigned x,unsigned y){
		return this->data[y*this->w+x];
	}
	double &index_2(unsigned x,unsigned y){
		return this->data[(!this->transposed)?(y*this->w+x):(x*this->w+y)];
	}
};

int main(){
	const int N=20*1000*1000;
	const unsigned s=1000;
	matrix m(s,s);
	clock_t t0,t1;
	t0=clock();
	for (int a=0;a<N;a++)
		m.index(rand()%s,rand()%s)=rand();
	t1=clock();
	std::cout <<t1-t0<<std::endl;
	t0=clock();
	for (int a=0;a<N;a++)
		m.index_2(rand()%s,rand()%s)=rand();
	t1=clock();
	std::cout <<t1-t0<<std::endl;

	m.transpose();

	t0=clock();
	for (int a=0;a<N;a++)
		m.index_2(rand()%s,rand()%s)=rand();
	t1=clock();
	std::cout <<t1-t0<<std::endl;
	return 0;
}

Compiler command: g++ test.cpp -o test -O3
Output (ms):
1562
1531
1531

Note: changing the matrix's size barely affects the times. Using a 100x100 matrix makes matrix::index_2() 0.1/2E+7 s faster, not slower, than matrix::index(). Without optimizations, matrix::index_2() is still 0.016/2E+7 s faster on average.
Topic archived. No new replies allowed.