std::Arrays, Strings, etc
Here you can find information on dealing with arrays of any kind, strings, lists, sequences, iterators, etc.
Contents
What’s a sequence anyway? 
Mathematically, a
sequence is a list of things (like numbers or events) that are in a specific order. Individual items in the list are called
elements, and they may repeat. The cardinality – the
size or
length – of the sequence is the total number of elements it contains. (The length of some sequences is infinite.)
The idea of a sequence is essential to computation, and you have likely already been introduced to it in various different guises.
arrays 
An array is a sequence of like objects stored adjacently (or
contiguously) in memory. For example,
int primes[] = { 2, 7, 101, 571, 13, 7, 7, 5, 2017 };
Obviously, this array contains some repeats, but it has a very specific order. The first element is the number 2, the second is the number 7, the third is the number 101, etc. In this particular example the elements are all prime numbers, but the defining characteristic for an array is that all the elements are the
same type – in this case they are all
int values. All of these items are stored somewhere in memory, where each element is immediately adjacent to the next.
The
primes array’s address, in this example, is at 0x4790322 in memory (I just made that up – it could be at any usable memory address).
On my 32-bit machine, an
int uses four bytes of memory. Hence, each
int in the array uses four contiguous bytes of memory, and each integer is adjacent to the next integer in the array.
Endianness is the order in which individual bytes of a value are stored in memory. Big-endian stores bytes from most to least significant, as in the example. Little-endian stores bytes from least to most signficant. 0x0000023B equals 571. In little-endian, it would be stored in order 0x3B, 0x02, 0x00, 0x00. PCs are typically little-endian, where workstations tend to be big-endian.
Because of this, the
address of any one element (in this example, any
int) can be calculated as:
| address_of_first_element + (index_of_element * element_size) |
|
So when you say
primes[ n ] or
*(primes + n) what the compiler does for you is calculate
*(int*)(0x4790322 + (n * 4)).
If ‘
n’ is 3, then 0x4790322 + (3 * 4) equals
0x479032E, the address of the fourth element.
This makes for a very quick, efficient element lookup, and is the basis for all pointer arithmetic.
multidimensional arrays 
So far, we have only considered arrays with
one dimension. A multidimensional array is one where the elements themselves are also arrays. Take a minute to consider what that means in terms of memory layout.
Given the array
int M[ 3 ][ 4 ], we now have an array of three elements, each of which is an array of four integers. We call the first dimension the
outer or
major dimension, and the second the
inner or
minor dimension. In memory, it looks like this:
 |
Here is what we wrote to get this array:
1 2 3 4 5 6 7
|
int M[ 3 ][ 4 ] =
{
{ 1, 0, 0, 0 },
{ 0, 1, 0, 0 },
{ 0, 0, 1, 0 }
};
|
It looks a lot like a one-dimensional array when lined up in memory like that, doesn’t it? In C and C++, it really is, and once again the compiler hides the lookup details in a formula much like the one in the previous example.
address_of_first_major_dimension_element
+ (index_of_major_dimension_element * major_dimension_size)
+ (index_of_minor_dimension_element * minor_dimension_size) |
For our example, it would be
*(int*)(0x4790322 + (i * (3 * 4)) + (j * 4))
The major dimension (or outer dimension), as you can see, is the one that uses the most memory per element. (Because each element is itself an array.)
The minor (or inner) dimension uses less memory per element; each element is an int.
Hopefully by now you can see a pattern in the method for calculating the address of an element in the array with multiple dimensions. But for the moment, we do not actually have to worry about all that, all we need to do to index an element is use the usual C/C++ subscript operators:
M[ i ][ j ] = 42;
A later FAQ goes into detail about how to handle arbitrary multidimensional arrays.
|
| |
strings
A string (in C and C++) is an array with
characters as elements. Since this is so common a structure in computing, most languages make it very easy to handle strings. In C++, the
std::string class makes handling mutable, resizable character strings simple. (C, alas, keeps it
simple in a different way.)
containers
Arrays are very simple structures, and somewhat unwieldly (what with having to manage all those pointers and memory directly). C++ provides a collection of standard
containers, or classes that can be used to store any type of data, just as we have already seen with the
std::string class.
An
std::array is a C++11 container class that makes it much easier and safer to handle simple arrays.
A
std::vector is like a string – it manages a mutable, resizable array – except the elements are not necessarily characters. (They
could be characters, however the
std::string class is specially optimized to deal with character strings, whereas the
std::vector class is designed to gracefully handle
anything [that is copyable, at least].)
lists
A list is also a sequence, but the elements need not be stored contiguously in memory. C++ provides a number of list structures, all with varying characteristics that impact their usefulness for a specific situation.
In computer science, a list is usually implemented in terms of a
linked list. The simplest form of a linked list is where each element of the list contains a
pair or
tuple. The first item in the pair or tuple is the object being stored in the list. The next item is a pointer to the next pair (or tuple) in the list. And the third (if any) is a pointer to the previous tuple in the list. (The pair or tuple might be implemented in terms of a simple
struct, though.)
The
std::list is exactly such a structure, and is very useful for quick insertions, deletions, and reorderings. However, unlike an array, where elements can be found by using a simple, direct index into the sequence, a list’s elements can only be found by following the chain of pointers from the first item to the n
th.
The
std::deque is something of a hybrid structure, where the tuples contain small arrays holding only a part of the whole sequence. Use it when you need quick element lookup and also fast element insert and delete. It also handles limited memory resources better than a
std::vector.
Iterators 
Iteration is repeating a process until some condition is met. In languages like C and C++, this means ‘using loops’. An
iterator is a construct that allows someone to access the elements of a sequence, in order.
The quintessential iterator is... a
pointer! This is very convenient when handling arrays and linked lists, but things become a bit more complicated when using C++ container classes. Hence, C++ container classes typically define an
iterator class so that you can iterate over the elements of a container as if it were an array indexed with a pointer. (The STL container classes all define iterator classes. Designing an iterator class is actually rather difficult, so some third-party libraries lack them.)
Here is the same thing in C and C++, using iterators.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
|
#include <stdio.h>
typedef const int* iterator;
int sum( iterator begin, iterator end )
{
int result = 0;
while (begin != end)
{
result += *begin++;
}
return result;
}
int main()
{
int primes[] = { 2, 3, 5, 7, 11, 13, 17 };
printf( "sum of first 7 primes = %d\n",
sum( primes, primes + 7 ) );
return 0;
}
|
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
|
#include <iostream>
#include <vector>
typedef std::vector <int> ::const_iterator iterator;
int sum( iterator begin, iterator end )
{
int result = 0;
while (begin != end)
{
result += *begin++;
}
return result;
}
int main()
{
int f_primes[] = { 2, 3, 5, 7, 11, 13, 17 };
std::vector <int> primes( f_primes, f_primes + 7 );
std::cout << "sum of first 7 primes = "
<< sum( primes.begin(), primes.end() ) << "\n";
return 0;
}
|
|
(There are, of course, better ways of doing this; it is just an example. If you can think of a better one, please let
Duoas know.)
Associative Containers 
An associative container is one in which each element (or
value) in the container is indexed by an associated
key. So far the containers we have covered associate a unique,
ordinal integer
index with a value; to access the value, you need to know the ordinal index. The obvious caveat to positional indices is that the key may not be any
arbitrary integer value — it must a counting number falling in a specific range. When talking about associative containers, it is typically for containers where the key is an arbitrary value — integer, string, or otherwise.
The consequence of this is that elements in an associative container do not have explicit ordering. (We will get back to this in a moment.)
In C++, the
std::set container is an associative container where the
key is the value, and where
each key is unique. Essentially, a value is either in the set or it is not, just as it is in mathematics. As with all STL containers, you can get the cardinality of the set with the
size() member function. You can test whether it is
empty() and you can test whether an item is
in the set using the
count() member function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
|
#include <iostream>
#include <set>
using namespace std;
int main()
{
int f_primes[] = { 2, 3, 5, 7, 11, 13, 17 };
set <int> primes( f_primes, f_primes + 7 );
for (int n = 0; n < 18; n++)
{
cout << n << " is ";
if (!primes.count( n )) cout << "not ";
cout << "prime.\n";
}
return 0;
}
|
0 is not prime.
1 is not prime.
2 is prime.
3 is prime.
4 is not prime.
5 is prime.
6 is not prime.
7 is prime.
8 is not prime.
9 is not prime.
10 is not prime.
11 is prime.
12 is not prime.
13 is prime.
14 is not prime.
15 is not prime.
16 is not prime.
17 is prime. |
A
std::multiset is a set that may contain duplicate keys.
If you want a lookup table, use a
std::map or a
std::multimap, where the first enforces unique keys and the second does not. (The value attached to each key can be anything, unique or not. This is a way of attaching multiple attributes [or values] to a single item [or key].) Again, use the
size(),
empty(), and
count() member functions to obtain the cardinality, emptiness, and presence of keys in the container.
A common associative mapping is found in INI or RC files, which often just list keys and values. Let’s do some
very simple INI parsing.
a.ini
| |
# hello world
iterations = 6
[variables]
quux=Yes! Please quux!
[meta]
foo = what's up?
bar = 23
baz = hello, handsome!
quux = smile!
[variables]
foo=2
bar=3
baz=5
quux = :O)
|
|
Here is a very simple INI file. It has comments, whitespace, and sections.
Notice how it does not have quoted string fields, or multiline values.
Our parser will be very simple. We’ll read the file, one non-empty line at a time, and decide what to do with it.
- If it is a comment, we’ll just ignore it.
- If it is a section tag, we will remember the section name. (Initially, there is no section name.)
- Everything else is assumed to be a key=value pair, where the value is optional. Leading and trailing whitespace is stripped from the key, but only leading whitespace from the value. All keys and values are strings.
In the resulting map, section and key names are joined with a period (‘.’). If you repeat a section and a key, it will replace the earlier value.
|
a.cpp
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
|
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
using namespace std;
typedef map <string, string> ini_data_t;
typedef ini_data_t::const_iterator ini_iterator;
void read_ini( const string& filename, ini_data_t& result )
{
result.clear();
ifstream f( filename.c_str() );
if (!f) return;
string s, section, key, value;
size_t n;
while ((f >> ws) && getline( f, s ))
{
switch (s[ 0 ])
{
case '[': section = s.substr( 1,
s.find_last_of( "]" ) - 1 ) + ".";
case '#': break;
default:
n = s.find( '=' );
key = s.substr( 0, s.find_last_not_of( " \t", n - 1 ) + 1 );
value = s.substr( s.find_first_not_of( " \t", n + 1 ) );
result[ section + key ] = value;
}
}
}
int to_int( const string& s, int default_value = 0 )
{
istringstream ss( s );
int result = default_value;
ss >> result;
return result;
}
int main()
{
ini_data_t ini_data;
read_ini( "a.ini", ini_data );
int iterations = to_int( ini_data[ "iterations" ], 0 );
if (ini_data.count( "meta.quux" ))
{
for (int n = 0; n < iterations; n++)
cout << (n + 1) << " " << ini_data[ "meta.quux" ] << endl;
}
cout << "\nvariables:\n";
string sectionname = "variables";
char sep = '.';
char xep = sep + 1;
ini_iterator iter = ini_data.lower_bound( sectionname + sep );
ini_iterator last = ini_data.upper_bound( sectionname + xep );
for (; iter != last; ++iter)
{
cout << (*iter).first.substr( sectionname.length() + 1 )
<< " = "
<< (*iter).second
<< "\n";
}
return 0;
}
|
1 smile!
2 smile!
3 smile!
4 smile!
5 smile!
6 smile!
variables:
bar = 3
baz = 5
foo = 2
quux = :O) |
Alas, most of this example is spent decoding the INI file, and it is too stupid to handle quoted values, but it demonstrates a couple of important concepts.
First, the
[] operator causes a new key to be added if not already present. On line 48 this is a good thing, since we want an "iterations" value no matter what. However, on line 50 we first check to see if the "meta.quux" value is available for use.
Next, we can actually range across values using the
lower_bound() and
upper_bound() member functions. The example above used a
std::map, where values are unique, and it worked on the trick where the section name and key are joined by a period. For a
std::multimap, where there may be multiple identical keys, this will give you access to all values that have the same key (but in no particular order).
But they are still sequences... 
One of the major mathematical properties of an associative container is that its elements are unordered.
That is, however, not quite true in a computer. If the elements really were not stored in relative order, it would take a very expensive linear search to find an element in the container,
particularly when the key you are searching for is
not in the container. It quickly grows worse the larger the container’s cardinality. (Remember, one of the primary functions of an associative container is to list an item as ‘present’ or ‘not present’.)
Rather than store the elements in some haphazard linear fashion, the elements are actually stored in a fancy structure called a
binary search tree. Without going into details, a BST allows very efficient searches, insertions, and deletions.
It is also a property that when a BST is traversed (that is, when you
iterate over its elements), you get the elements out in sorted order. The standard associative containers actually have a sorting function as part of their type, which is used to access and maintain the BST. Here is an example to print out an entire sequence:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
|
#include <iostream>
#include <set>
#include <sstream>
#include <string>
using namespace std;
int main()
{
size_t count = 0;
set <int> xs;
cout << "Enter a list of integers with repeats:\n";
{
int x;
string s;
getline( cin, s );
istringstream ss( s );
while (ss >> x)
{
xs.insert( x );
count += 1;
}
}
cout << "You entered " << count << " integers.\n";
cout << "Only " << xs.size() << " of them were unique:\n";
set <int> ::const_iterator x;
for (x = xs.begin(); x != xs.end(); ++x)
{
cout << *x << " ";
}
cout << endl;
return 0;
}
|
Enter a list of integers with repeats:
1 7 9 -3 7 2 1 3
You entered 8 integers.
Only 6 of them were unique:
-3 1 2 3 7 9 |
This is a pretty handy trick if you want a sorted list of unique inputs!
Other Kinds of Sequences 
So far, we have only touched on contained, finite sequences. There also exist container sequences that are infinite, though not in
standard C++. They are more typically found in functional languages.
Another kind of sequence we have not yet considered is an
iostream! That’s right, streams are (potentially) infinite sequences.
But that is another topic.