Finding numerical values in Strings

Hello,

I'm trying to read in a large file of data. Sadly, the people who released this data did a terrible job at formatting it, making it very hard to read. My default reader (a method I found referenced on this forum a few months back) already gives back a better result than I anticipated. However, I still need to handle strings of letters and numbers combined. There are two types:

1. From the string "CAPACITY : 100" I need to extract the "100". The number could be any size, so there is no guarantee that it's the final 3 characters. It is guaranteed to be the only numerical value in that string, and the non-numerical part (here: "CAPACITY : ") will be identical for each instance. If possible, the ability to read a similar string with different text and two numbers at arbitrary positions would be handy, but it's not required.

2. From the string " 1 82 76" I need to extract the second and third numbers. The number of digits in each number can vary, but each number will always be present (at least one digit), and they'll always be separated by a space. They will always be integers (but if there's an easy way to make it extendable to other numerical types, that would be handy); they can be positive or negative (shown by a minus sign in front of the number itself).

What would be the easiest way to do this?
The easiest way to handle this would be, without a doubt, using regular expressions. Regular expressions are not (yet) a standard feature of C++, but the Boost C++ Libraries provide such functionality with Boost.Regex:

http://onlamp.com/pub/a/onlamp/2006/04/06/boostregex.html

Read this article. It may take a while to get used to regular expressions, but it's the only way to handle that kind of data without going insane.
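For the record, regular expressions became part of standard C++ with the C++11 `<regex>` header, whose interface closely mirrors Boost.Regex. A minimal sketch, assuming a C++11 compiler, that pulls every integer (with an optional minus sign) out of a line; it covers both formats and the "two numbers anywhere in the string" wish. The function name `extractIntegers` is hypothetical:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every (optionally negative) integer in a line.
// Uses std::regex from C++11; boost::regex / boost::sregex_iterator work the same way.
std::vector<long> extractIntegers(const std::string& line) {
    static const std::regex number(R"(-?\d+)");  // optional '-', then one or more digits
    std::vector<long> result;
    for (std::sregex_iterator it(line.begin(), line.end(), number), end;
         it != end; ++it) {
        result.push_back(std::stol(it->str()));  // convert each matched substring
    }
    return result;
}
```

With this, `extractIntegers(" 1 82 76")` yields {1, 82, 76}, and the second and third elements are the ones needed. Note that a hyphen directly before digits is always treated as a minus sign.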
Hi,

You can use std::stringstream for this.
For your first problem you can do it like this (let text be the string containing "CAPACITY : 100"):
std::string tmp;
int value;
char c;
std::stringstream ss(text);
ss >> tmp >> c >> value;

tmp will be "CAPACITY", c will be ':' and value will be 100.
But this only works because CAPACITY and : are separated by a space. If they are not, you don't need the >> c part.

For your second problem (let text be the string containing " 1 82 76"):
int a;
int b;
int c;
std::stringstream ss(text);
ss >> a >> b >> c;

a will be 1, b will be 82 and c will be 76.

Hope this helps.
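The question also asked whether this could extend to other numerical types. One way, sketched here as a hypothetical template (`readLastTwo` is not from the thread), is to make the target type a template parameter and check the stream state, so a malformed line is reported instead of silently leaving the outputs untouched:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read three whitespace-separated numbers, discard the
// first (the index) and store the other two. T may be int, long, float, double...
// Returns false if the line does not contain three parsable numbers.
template <typename T>
bool readLastTwo(const std::string& text, T& second, T& third) {
    std::istringstream ss(text);
    T first;  // read and thrown away
    return static_cast<bool>(ss >> first >> second >> third);
}
```

Because the return value reflects the stream state, a short line such as "7 12" returns false rather than leaving `third` with a stale value.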
Mathes, the problematic part is that he probably can't depend on the spacing. Lots of ifs and thens there.
"If they are not, you don't need the >> c part."

Rather, if it weren't spaced like that, the extraction would break: ss >> tmp would read "CAPACITY:100" as a single token, leaving nothing for c and value. And there is a chance that it's written like this:
CAPACITY:100

In that case, you would need yet another way to extract it.
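One such way, sketched under the assumption that the label itself contains no digits or minus signs: locate the first character that can start a number with `std::string::find_first_of` and hand the rest of the line to `std::stol`, which parses the number and ignores whatever follows it. `firstInteger` is a hypothetical name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the first integer in a line regardless of spacing,
// e.g. "CAPACITY : 100", "CAPACITY:100" or "CAPACITY:   100".
// Assumes the label contains no digits or '-'; throws std::invalid_argument otherwise
// (or when the line holds no number at all).
long firstInteger(const std::string& line) {
    std::size_t pos = line.find_first_of("-0123456789");
    if (pos == std::string::npos) {
        throw std::invalid_argument("no number in: " + line);
    }
    return std::stol(line.substr(pos));  // stol stops at the first non-digit
}
```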
Thank you both for the quick responses!

I've quickly reviewed the files and it seems the spacing is quite consistent in these files. I've got other files where it's not so consistent (spaces used as padding so the numbers visually form a neat column, so you get " 3" and " 30", for example), so I guess I'll have to familiarize myself with regular expressions anyway, but for now the stringstream approach will do.

Trying it now; I'll let you know how it worked out.
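Padded columns like " 3" and " 30" do not by themselves require regular expressions: formatted extraction with operator>> skips any amount of leading whitespace, because std::skipws is set on streams by default. A minimal sketch with a hypothetical `parseRow` helper:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a row of integers whose columns are padded with a
// variable number of spaces or tabs. operator>> skips all leading whitespace,
// so the padding needs no special handling.
std::vector<int> parseRow(const std::string& row) {
    std::istringstream ss(row);
    std::vector<int> values;
    int v;
    while (ss >> v) {  // stops at end of line or at the first non-number
        values.push_back(v);
    }
    return values;
}
```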
If it doesn't work because of spacing and whatnot, maybe you could peek at a single char to see whether it is a digit; if so, use the method above, otherwise move the stream position forward (e.g. with std::istream::seekg).
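The peek-then-parse-or-skip idea can be sketched as follows. This version swaps seekg for `ignore()`, which simply discards one character, and remembers a '-' seen directly before a digit so negative numbers survive. `scanIntegers` is a hypothetical name:

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: look at the next character; if it is a digit, extract an
// int with operator>>, otherwise discard that one character and look again.
std::vector<int> scanIntegers(const std::string& text) {
    std::istringstream ss(text);
    std::vector<int> found;
    bool negative = false;  // true if the previous character was '-'
    int ch;
    while ((ch = ss.peek()) != std::char_traits<char>::eof()) {
        if (std::isdigit(ch)) {
            int value;
            if (!(ss >> value)) break;           // e.g. overflow: stop scanning
            found.push_back(negative ? -value : value);
            negative = false;
        } else {
            negative = (ch == '-');              // sign applies only if digits follow
            ss.ignore();                         // skip exactly one character
        }
    }
    return found;
}
```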
joh, that would make it even more complicated.
You can also pick a delimiter and use getline. Consider:
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main() {
    ifstream fin( "data.txt" );
    string line;
    while( getline( fin, line ) ) {
        // process each line
        if( line.find( "CAPACITY" ) != string::npos ) {
            // cap. line
            stringstream ss( line );
            string token;
            getline( ss, token, ':' ); // token = string before :
            getline( ss, token, ':' ); // token = string after : (hits end, sets eofbit)
            ss.str( token );           // str() replaces the buffer but keeps the error flags
            ss.clear();                // so reset eofbit, or the next >> would fail
            int capacity;
            ss >> capacity; // this will ignore whitespace
            //...
        }
        else {
            // values line - Mathes' solution above will work here (also ignoring whitespace)
        }
    }
}

The best solution I can think of, however, is not to use C++ to extract the data... :)
That would definitely work, but I still recommend he look into regular expressions, because they help in situations where he couldn't get away with something this simple.
My thanks to everyone who responded.

Mathes' solution was sufficient for my current problem, so I've used that for now, and hanst99's Regular Expressions seem like the best option when a completely flexible reader is required, so I've added that to my "To Read" bookmarks.

How it looks now:

Three methods, one for each "format" present in my data files:
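#include <sstream>
#include <string>

// "CAPACITY : 100" -> 100 (label, then ':', then the value)
int catchInt(const std::string &s) {
	std::stringstream ss(s);
	std::string tmp;
	char c;
	int value = 0;	// returned as 0 if extraction fails
	ss >> tmp >> c >> value;
	return value;
}

// " 1 82 76" -> x = 82, y = 76 (the leading index is discarded)
void catchCoords(const std::string &s, int &x, int &y) {
	int a;
	std::stringstream ss(s);
	ss >> a >> x >> y;
}

// index, then demand: stores the demand in d (the index is discarded)
void catchDemand(const std::string &s, float &d) {
	int a;
	std::stringstream ss(s);
	ss >> a >> d;
}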
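The three readers above give the caller no way to tell that a line was malformed. A sketch of checked variants (the `try`-prefixed names are hypothetical) that return false when the stream extraction fails; `tryCatchInt` additionally assumes the separator must be ':' and rejects anything else:

```cpp
#include <sstream>
#include <string>

// Hypothetical checked variant of catchInt: "CAPACITY : 100" -> value = 100.
bool tryCatchInt(const std::string& s, int& value) {
    std::istringstream ss(s);
    std::string label;
    char colon;
    // colon is only inspected if all three extractions succeeded
    return (ss >> label >> colon >> value) && colon == ':';
}

// Hypothetical checked variant of catchCoords: " 1 82 76" -> x = 82, y = 76.
bool tryCatchCoords(const std::string& s, int& x, int& y) {
    std::istringstream ss(s);
    int index;
    return static_cast<bool>(ss >> index >> x >> y);
}

// Hypothetical checked variant of catchDemand: index, then demand.
bool tryCatchDemand(const std::string& s, float& d) {
    std::istringstream ss(s);
    int index;
    return static_cast<bool>(ss >> index >> d);
}
```

A caller can then stop or log as soon as one of these returns false, instead of filling x, y or d with leftover values.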


which are called by passing the string that contains the data and the containers for the values to be saved:
nnode = catchInt(file_f[3][0]);
// Fetch node data
for (int l = 0; l < nnode; ++l) {
	catchCoords(file_f[l+7][0], x[l], y[l]);
	catchDemand(file_f[l+8+nnode][0], d[l]);
}

thereby saving the values where they belong.

I was pleasantly surprised to see that while 'd' is actually defined as a vector of floats, it has no problems with the mostly integer values in the file. I was afraid it would skip non-float values.
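This follows from how operator>> works: extracting into a float accepts plain integer text, so "82" becomes 82.0f. The reverse does not hold, which is worth knowing if a column is ever read into an int. A small sketch with hypothetical helpers `demandAsFloat` and `demandAsInt`:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracting into a float accepts "82" as well as "7.5".
float demandAsFloat(const std::string& field) {
    std::istringstream ss(field);
    float d = 0.0f;
    ss >> d;
    return d;
}

// Hypothetical helper: extracting "7.5" into an int stops at the '.',
// giving 7 and leaving ".5" in the stream (copied into leftover here).
int demandAsInt(const std::string& field, std::string& leftover) {
    std::istringstream ss(field);
    int d = 0;
    ss >> d;
    ss >> leftover;  // whatever the int extraction did not consume
    return d;
}
```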

Thanks again to everyone who helped out!