Multidimentional arrays are evil

Multidimentional arrays are evil.



I see a lot of noobs get sucked into the vortex that is multidimentional (hereon MD) arrays. MD arrays are a "feature" of C/C++ which allow you to use multiple indeces when looking up things in an array. A typical use is to have a 2D array to represent a grid or a game map with 1 index for X coords and another for Y coords. Another common usage is for a 'Matrix' class or somesuch.

The thing these coders don't know (or don't realize) is that MD arrays suck. They're syntax poison. They're confusing, hard to work with, and (depending on the implementation) may actually consume more memory and yield worse performance.

After seeing more and more posts about MD arrays pop up on these forums, I've decided to sketch out this article to shine some light on MD arrays, and provide a reasonable alternative.

-----------------------------------

MD Array Flavor 1: Straight

The most obvious way to make an MD array is just a straight allocation:

1
2
3
int evil[5][7];

evil[3][2] = 5;


Seems harmless enough. And it is, for the most part. This is the friendliest flavor of MD arrays.

However even with this form, problems creep up. What if you need to pass this array to another function? The way you'd do it is like so:

1
2
3
4
5
6
7
8
int func(int p[][7])
{
  p[2][3] = 1;
}

//...

func(evil);


This will pass 'evil' by pointer to the 'func' function. This isn't all that horrible... but one of its limitations is that it will only accept an array of [x][7]. What if you want to call the function with an MD array of a different size... like int evil[4][5];? Bottom line is you can't. You'd have to take the MD array out of the function completely and turn it into a 1D array and pass the width as a seperate parameter:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
int func2(int p[], int width)
{
  // p[2][3] = 1;         // no good anymore, 1D array
  p[ (2*width) + 3] = 1;  // the 1D apprach to do the same thing

/*  It's worth noting here that the above (y*width)+x calculation is basically what
    MD arrays compile to anyway.  The compiler just hides it from you with MD arrays.
     Or at least it's true with Straight Flavor MD arrays.  Other flavors are more sinister.
*/
}

//...
int evil[4][5];
func2(evil[0],5);  // ugly 


You might notice I marked that last line as ugly. It's because it is. It is confusing, has unclear syntax, and it is very easy to make a mistake and pass the wrong value as the width.

The other problem with this flavor is that it is completely incompatible with other MD flavors. As you'll soon see.


MD Array Flavor 2: Nested New (dynamic)

Once people graduate from straight MD arrays, they'll start wondering how to make the size of their arrays dynamic. This is where the dreaded "nested new" technique comes in:

1
2
3
4
5
6
7
8
9
10
11
12
// let's say we want to dynamically make int[Y][X]:
int** superevil = new int*[Y];
for(int i = 0; i < Y; ++i)
  superevil[i] = new int[X];

// now we can do this:
superevil[2][3] = 1;

// but cleanup is just as ugly as allocation:
for(int i = 0; i < Y; ++i)
  delete[] superevil[i];
delete[] superevil;


Note that this actually makes a pointer to a pointer, and not really a straight 2D array. And while "superevil" and "evil" (from the previous section) are both accessed with [double][brakets], they both function very differently under the hood.

Nested New MD arrays are inefficient. In addition to allocating all the space for the normal 2D array, you're also allocating space for POINTERS. Not only that, but in order to access any element in the array, you have to dereference two pointers!

And really... look at the messy allocation/destruction code that's required.

And let's go back to calling functions... let's take our two functions from the previous section:

1
2
int func(int p[][7]);
int func2(int p[], int width);


How would you call those functions with 'superevil'?

Guess what. You can't. That's right... you're hosed. Nested New arrays are not really 2D arrays in the same sense that straight arrays are. Straight MD arrays only require 1 dereference, but nested new requires 2 dereferences. In order to pass this to a function, you'd need to make another one:

 
int func3(int** p,int width);


But now you might notice that func3 doesn't work with straight arrays. They're just totally incompatible.

The one saving grace to the nested new approach is that it doesn't require the entire MD array to be contiguous, which means you have a better chance of getting the memory you need if memory is badly fragmented. But this mild benefit is seldom applicable, and is overshadowed by its faults.

MD Array Flavor 3: Vectors of Vectors

The more STL savvy coder will opt for a vector-of-a-vector approach to MD arrays, rather than Nested New:

1
2
3
4
// to make a [5][6] array:
vector< vector<int> > stillevil( 5, vector<int>(6) );

stillevil[2][3] = 1;


Not a lot to say about this one. Like Nested New it's incompatible with other MD array flavors. If you need to pass this array to a function, none of the above functions will work.

Also, like Nested New, it's inefficent in that is requires additional memory and has double indirection.

On the plus side, you don't need cleanup code, and the allocation code is not as ugly (but still pretty ugly!)

Killing MD: 1D Mimics

An MD array can be simulated in a 1D array as long as you know the dims. For example, if you want a 2x3 array, it would conceptually look like the following:


0 1
2 3
4 5


This array has 6 elements (2 * 3). And you'll notice that each row starts with a multiple of the width (2). Therefore you can simulate a 2D array in a 1D array like so:

1
2
3
4
5
// int evil[5][6];      // desired 2D array
int mimic[5*6];         // mimicing 1D array

// evil[2][3] = 1;      // desired [2][3] index
mimic[ (2*6) + 3] = 1;  // mimiced [2][3] index 


All that math looks kind of dumb and inefficient. Really, though, the compiler does all that math for MD arrays too, so this isn't any less efficient. It is pretty ugly though, granted.

So you might be asking yourself... "what is the benefit of this?"

Well, unlike MD array flavors which are all different and totally incompatible, the respective 1D flavors are all compatible!

Consider the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
int func2(int p[], int width);  // remember this function?
                                // it'll work with all flavors of mimicing 1D arrays!

int a[ 5 * 6 ];
func2( a, 6 );  // works
//---

int* b = new int[5*6];
func2( b, 6 );  // works
//---

vector<int> c(5*6);
func2( &c[0], 6 );  // works (but is a little ugly, granted) 


See how graceful it all is? Aside from the ugly math in the indexing, everything flows together so much nicer. This is the first step to killing MD arrays. The next step is even more fantastic.

Killing MD: Abstracted array classes

So while 1D arrays are easier to work with, They do suffer from a few problems:

- look up requires you to know the dims of the array (can't do y*width without knowing the width).
- look up syntax is ugly and easy to botch. It only gets worse as you get into larger dimensions. Imagine a mimiced 3D array: z*width*height + y*width + x.

C++ provides an excellent solution to this problem: classes. Just make an Array2D/Matrix/whatever class to wrap all of this behavior.

(Or don't -- there are probably many premade ones available online that you can use, I never actually bothered looking. Boost probably has one, I'd imagine -- I'll have to research this more some day. If you are reading further in this article I'm going to assume you don't have / don't want to use such a library and want to write your own)

While you can't overload the [bracket] operator to take two parameters*, you can overload the parenthesis operator. So syntax is a little different:

1
2
3
4
5
// int bad[4][5];
Array2D<int> good(4,5);

// bad[1][2] = 3;
good(1,2) = 3;


(* Yes you can overload the brakets and return a weird type to make double brakets work, but it causes all sorts of problems and defeats all the elegance of using a fully contained class)


So how do you make this Array2D class? As long as you don't want bells and whistles, it can be very simple.

Here is a very simple class with no bells or whistles:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
template <typename T>
class Array2D
{
public:
  // constructor
  Array2D(unsigned wd,unsigned ht)
    : nWd(wd), nHt(ht), pAr(0)
  {
    if(wd > 0 && ht > 0)
      pAr = new T[wd * ht];
  }

  // destructor
  ~Array2D()
  {
    delete[] pAr;
  }

  // indexing (parenthesis operator)
  //  two of them (for const correctness)

  const T& operator () (unsigned x,unsigned y) const
  {  return pAr[ y*nWd + x ];   }

  T& operator () (unsigned x,unsigned y)
  {  return pAr[ y*nWd + x ];   }

  // get dims
  unsigned GetWd() const { return nWd; }
  unsigned GetHt() const { return nHt; }


  // private data members
private:
  unsigned nWd;
  unsigned nHt;
  T*       pAr;

  // to prevent unwanted copying:
  Array2D(const Array2D<T>&);
  Array2D& operator = (const Array2D<T>&);
};


That's all there is to it!

If you want to get fancy you can add in resizing stuff, copy-on-write reference counting, boundary checking, etc, etc.
Last edited on
I don't agree with this article, MD arrays are not more evil than 1D arrays, they are just more complex
Vectors aren't supposed to be compatible with arrays

Quoting Stroustrup:
The built-in arrays are a major source of errors – especially when they are used to build multidimensional arrays.
For novices, they are also a major source of confusion. Wherever possible, use vector, list, valarray, string, etc.

STL containers don't have the same problems as built in arrays
I probably should've made the article less opinionated. I actually tried to tone it back (I really dislike MD arrays), but it is still pretty 1 sided.

In any event, I agree that in general arrays should be avoided and that a container class should be used. But since STL doesn't provide a MD container, you're better off getting one from somewhere else (boost?) or making your own (above). It's much more preferable to vectors of vectors, IMO... and that's really the point I was trying to get across here.
Vectors are supposed to be the C++/STL equivalent of an array. That's why operator[]
doesn't do bounds checking.

The real problem with noobs is that they don't understand that the data structure used
internally does not need to reflect how the user of the program perceives it. Just because
tic-tac-toe is a 3x3 (2D) grid does not mean it needs to be stored internally that way. They
could, for example, use the 1D conversion trick (which I happen to like).

FWIW, Boost does provide boost::array<>, which does have the limitation that the array
size is completely fixed because it is a template parameter. All boost::array buys you
is initializer syntax:

 
boost::array<int, 5> a = { 1, 2, 3, 4, 5 };


The reason why I don't like 2D arrays is because it is very easy to confuse the indices, leading
to hard to find bugs. I think of 2D arrays (tables in general) as rows first then columns. I
also think of coordinates as (x,y) -- x first, y second. Unfortunately, x represents the column
and y the row, which means I have a paradox of terminology that leads me inevitably to
inverting the indices all over the place.

To solve that problem, I'll often use strong types for Row and Column:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
struct Row {
    explicit Row( size_t r ) : row( r ) {}
    size_t operator()() const { return row; }
  private:
    size_t row;
};

struct Col {
    explicit Col( size_t c ) : col( c ) {}
    size_t operator()() const { return col; }
  private:
    size_t col;
};

// Then, the 2D array class does:
T operator()( Row r, Col c ) const {
    return pAr[ r()*nWd + c() ];
}


And now I cannot possibly invert them, at the cost of some syntactical baggage:

1
2
3
4
5
int main() {
    Array2D<int> a( Row(2), Col(3) );

    std::cout << a( Row(1), Col(2) ) << std::endl;
}


When dealing with 2D vectors, I would opt for a factory function:

1
2
3
4
5
6
7
8
9
10
11
template <typename T>
struct Vector2D {
  typedef vector< vector<T> > type;
};

template <typename T>
void createVector2D(T& out, size_t x, size_t y)
{
  out = vector<typename T::value_type>(
    x, vector<typename T::value_type::value_type>(y));
}


This makes creating & constructing the 2D array a bit easier.

1
2
3
4
5
int main()
{
  Vector2D<string>::type foo;
  createVector2D(foo, 3, 4);
}


I almost always resort to converting multi-dimensional structures to 1D though.
All boost::array buys you is initializer syntax:


How does it accomplish that? I tried to do something like that in a class I was writing and wasn't able to figure out to get that to work.
They may be using the C struct initialization way. In C++0x this will be easy
How does it accomplish that? I tried to do something like that in a class I was writing and wasn't able to figure out to get that to work.

Have you tried looking at the code? I've looked at some of the boost code and it seems to be written in a way that is easier to read compared to the std library implementations that I have seen.
Topic archived. No new replies allowed.