want to "harden" a file reading utility:

Hi, all –

I've written a small routine to populate an integer array with values from a file:

#include <iostream>
#include <fstream>
#include <string>

#include <getcoeff.h>

using namespace std;

long	getCoeff(string file, long	nbrCells, long* a)
{
	long		rc;
	ifstream    s(file.c_str());

	if (s)
	{
		for (int i=0; i < nbrCells; i++)
			s >> a[i];
		rc = 0;
	}
	else
	{
		cout << "Cannot open " << file << "." << endl;
		rc = 1;
	}
		s.close();
	return rc;
}


This works fine...when there's nothing but valid integer values in the file. Suppose I want to "harden" this routine, to accommodate comments and invalid data in the file.

The reason this is a C++ question is: I don't know how much I can rely (if at all) on the "smarts" of the >> operator in this routine. Will this do anything for me in terms of rejecting non-integer information? If so, is this documented anywhere?

Thanks.
closed account (zb0S216C)
Can you clarify harden? You can read each line with ifstream::getline( )[1]. You can also look into ifstream::read( )[2] and ifstream::readsome( )[3].

References:
[1]http://www.cplusplus.com/reference/iostream/istream/getline/
[2]http://www.cplusplus.com/reference/iostream/istream/read/
[3]http://www.cplusplus.com/reference/iostream/istream/readsome/


Wazzak
Thanks...I'll look at those references. What I meant by "harden" was just to make the routine more robust, so that it would only process valid data, would ignore invalid data, and so on. For example, if lines 1 through 99 were valid integers, and line 100 said something like "line 100" it would skip "line" and use the number 100. Or, if it said "100 line" it would take the 100 and skip line.

I suppose I could modify it to allow for comments, C++ style, and skip everything after a "//". I'm willing to do the work myself, but I was hoping for some insight into what (if anything) the operator does for me. I remember from my C++ class that C++ was a lot "smarter" than C when it came to type-sensitive data processing on reads and writes.

Thanks.
closed account (zb0S216C)
The single-line C++ comment is terminated with the new-line character. So all you've got to do is ignore everything until a new-line is reached, then start reading the file as normal.
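As a rough illustration of that idea combined with the getline suggestion earlier in the thread, here is a minimal sketch. The function name readIntsSkippingComments is made up for this example, and it assumes that comments start with "//" and that anything after the first non-integer token on a line can be dropped:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the stream line by line, drops everything from
// "//" to the end of each line, and collects the integers that remain.
// Parsing of a line stops at the first token that is not an integer.
std::vector<long> readIntsSkippingComments(std::istream& in)
{
    std::vector<long> values;
    std::string line;

    while (std::getline(in, line))
    {
        // cut the line at the start of a comment, if there is one
        const std::string::size_type pos = line.find("//");
        if (pos != std::string::npos)
            line.erase(pos);

        // parse whatever is left with the ordinary extraction operator
        std::istringstream fields(line);
        long value;
        while (fields >> value)
            values.push_back(value);
    }
    return values;
}
```

Working a line at a time keeps the comment handling separate from the number parsing, at the cost of giving up on the rest of a line once a bad token shows up.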

mzimmers wrote:
I was hoping for some insight into what (if anything) the operator does for me. (sic)

Are you referring to the << operator?

Wazzak
Well, in my case, it's the >> operator. I guess this operator assumes properly formatted input, though, doesn't it?
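For what it's worth, the >> operator does not assume well-formed input, but it does not skip bad input either. When extracting an integer it skips leading whitespace, consumes as many characters as form a valid integer, and sets the stream's failbit if it finds none; the offending characters stay in the stream. A small sketch to see this (the helper name countLeadingInts is made up for illustration):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: counts how many integers operator>> can extract
// from the front of `text` before it hits something that is not an integer.
// On return, `rest` holds the first token that could not be parsed (if any).
int countLeadingInts(const std::string& text, std::string& rest)
{
    std::istringstream in(text);
    long value;
    int count = 0;

    // operator>> returns the stream, which tests false once failbit is set
    while (in >> value)
        ++count;

    rest.clear();
    if (!in.eof())          // stopped because of bad input, not end of data
    {
        in.clear();         // reset failbit so the stream is usable again
        in >> rest;         // the bad token is still there, unread
    }
    return count;
}
```

With the input "1 2 three 4" this stops after two values and the leftover token is "three". With "12abc" it reads 12 and leaves "abc" behind, which is the "100 line" case from the earlier post.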
By "harden" he means that he wants the program/function to properly handle both conforming and non-conforming input.

How exactly you do this depends on how you intend to handle invalid input, and what kind of input is acceptable.

Here's one method: simply ignore non integer data, with special handling for comments.

// Comments are just like C++ comments... but that isn't necessarily the case.
// You could use "#" or ";" if you like.
// You can also make commentary more complex if you like, but the method to
// handle it becomes more complex as well...
//
// This routine does not assume that whitespace leads commentary. That means that you
// can have comments starting (without intervening whitespace) after both valid and invalid
// input. This works because valid input will stop at the first non-integer character, and invalid
// input will have a "//" somewhere in it.
//
bool is_comment( const string& s )
  {
  return s.find( "//" ) != string::npos;
  }

// This thing lets us overload the extraction operator to read our ints with our special method
struct my_int
  {
  int& n;
  my_int( int& n ): n( n ) { }
  };

// This is our special method to read ints -- skipping bad integers and C++ commentary
istream& operator >> ( istream& ins, my_int& n )
  {
  // only continue if all OK
  if (ins && !ins.eof())
    // repeat while necessary to get an integer
    while (true)
      {
      // get an integer -- if OK then we're done
      if (ins >> n.n)
        break;

      // otherwise, get it as a string
      string s;
      ins.clear();
      ins >> s;

      // special handling for comments
      if (is_comment( s ))
        ins.ignore( numeric_limits <streamsize> ::max(), '\n' );
      }

  // return the stream
  return ins;
  }

Now, change lines 16 and 17 of your code (the for loop and the s >> a[i]; statement inside it) to:

		for (int i=0; i < nbrCells; i++)
			s >> my_int( a[i] );

This code assumes that your integer data is surrounded by whitespace. If that is not the case, we can make some modifications, but it is slightly more involved to read the string.
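If the numbers are not surrounded by whitespace, one possible approach is to scan character by character. This is only a sketch under some assumptions: the name extractAnywhere is invented here, only non-negative numbers are handled (signs are ignored), and "//" is the only comment marker:

```cpp
#include <cctype>
#include <istream>
#include <limits>
#include <sstream>   // only needed to try the function on an in-memory string

// Hypothetical variant: finds the next run of digits anywhere in the input,
// even inside a token such as "line100", and skips "//" comments.
// Returns false if the input ends before a number is found.
// Assumption: only non-negative numbers are needed; signs are ignored.
bool extractAnywhere(std::istream& ins, long& n)
{
    int c;
    while ((c = ins.peek()) != std::istream::traits_type::eof())
    {
        if (std::isdigit(c))
        {
            ins >> n;               // digits ahead: let >> do the conversion
            return !ins.fail();
        }

        ins.get();                  // discard the non-digit character
        if (c == '/' && ins.peek() == '/')   // start of a comment
            ins.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return false;
}
```

Note that this treats every digit run as data, so something like "x3" yields 3. Whether that is what you want depends on the file format.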

BTW... I haven't had time to test this code. (I could have made a mistake!) If it doesn't work right let us know and we'll fix it.

Hope this helps.
Wow...I wasn't expecting anyone to actually solve this for me. This forum is great!

I'll implement and test it in the morning, and will report back either way.

Thanks!
Well, I got a compiler error message that I don't really understand:

../simulator/src/utils/readcoeffs.cpp:78: error: no match for 'operator>>' in 's >> my_int(((long int&)(((long int*)(((long unsigned int)i) * 8ul)) + a)))'

(I did convert this routine to long ints, but I get a similar message when it just uses ints.)

Any idea what this means? Thanks.
closed account (zb0S216C)
Looking at the error message, I can say that the stream object, s, doesn't have an overloaded >> operator that accepts the operand you gave it (here, a temporary my_int). Here's an example of the same kind of error:

#include <fstream>
#include <string>

struct Simple
{
    Simple &operator >> ( std::ifstream & );
};

Simple &Simple::operator >> ( std::ifstream &Stream )
{
    // ...
    return *this;
}

int main( )
{
    Simple NewSimple;

    std::string Input;

    NewSimple >> Input;   // error: no operator >> here accepts a std::string
    // Huh? What's std::string when it's at home?
    // Is it a pencil? A measuring tape maybe?
    // Please describe it to me :)
}


Wazzak
Framework –

I'm sorry, but I don't understand what you're trying to tell me here.

I'm beginning to think that an overloaded operator is probably overkill, since I truly only use this operator in one place. But, I'm not sure that inline coding this will fix the issue.
closed account (zb0S216C)
What I'm basically saying is that if an operator doesn't support the type of the passed operand, the compiler will either attempt to convert the type, or throw an error at you. This is what's happening in my previous post.

In Simple, I overloaded the >> operator to accept an instantiation of std::ifstream. However, I passed it a std::string instantiation, which I didn't provide, so the compiler cried. Instead, I have to overload the >> operator again so it accepts a std::string instantiation. If I overload an operator to accept a given type, I'm giving the operator knowledge on how to handle a specific type.

Wazzak
I forgot something important, and I made two mistakes.

The first thing is that the temporary object must be passed as a const reference, while the long it refers to must stay modifiable.

The first mistake I made is that you are operating on long and not int.
The second mistake is I should have put the condition at the while loop.

(This is what I get for not testing before posting... But as I now have a moment to do that...)

#include <iostream>
#include <fstream>
#include <string>

#include <limits>

using namespace std;

// Comments are just like C++ comments... but that isn't necessarily the case.
// You could use "#" or ";" if you like.
// You can also make commentary more complex if you like, but the method to
// handle it becomes more complex as well...
//
// This routine does not assume that whitespace leads commentary. That means that you
// can have comments starting (without intervening whitespace) after both valid and invalid
// input. This works because valid input will stop at the first non-integer character, and invalid
// input will have a "//" somewhere in it.
//
bool is_comment( const string& s )
  {
  return s.find( "//" ) != string::npos;
  }

// This thing lets us overload the extraction operator to read our ints with our special method
struct my_int
  {
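  long& n;  // no 'mutable': standard C++ forbids it on reference members, and writing through n works on a const my_int anyway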
  my_int( long& n ): n( n ) { }
  };

// This is our special method to read ints -- skipping bad integers and C++ commentary
istream& operator >> ( istream& ins, const my_int& n )
  {
  // repeat while necessary to get an integer
  while (ins && !ins.eof())
    {
    // get an integer -- if OK then we're done
    if (ins >> n.n)
      break;

    // otherwise, get it as a string
    string s;
    ins.clear();
    ins >> s;

    // special handling for comments
    if (is_comment( s ))
      ins.ignore( numeric_limits <streamsize> ::max(), '\n' );
    }

  // return the stream
  return ins;
  }

long	getCoeff(string file, long	nbrCells, long* a)
{
	long		rc;
	ifstream    s(file.c_str());

	if (s)
	{
		for (int i=0; i < nbrCells; i++)
			s >> my_int( a[i] );
		rc = 0;
	}
	else
	{
		cout << "Cannot open " << file << "." << endl;
		rc = 1;
	}
		s.close();
	return rc;
}

int main( int argc, char** argv )
  {
  if (argc != 2)
    {
    cout << "usage:\n  " << argv[ 0 ] << " FILENAME\n";
    return 1;
    }

  long ns[ 12 ];

  long rc = getCoeff( argv[ 1 ], 12, ns );

  for (unsigned n = 0; n < 12; n++)
    cout << n << ":\t" << ns[ n ] << endl;

  cout << rc << endl;

  return 0;
  }

Tests:

Contents of foo.txt:
// This is a test

1 2 3//hello
four 4 five//world 5
6 7 8
9

10 12 13
D:\prog\cc\foo> a foo.txt
0:      1
1:      2
2:      3
3:      4
4:      6
5:      7
6:      8
7:      9
8:      10
9:      12
10:     13
11:     2293728
0

D:\prog\cc\foo> 
Second test: the input is the same source code as above, saved as a.cpp.

The first match is on line 64, since our special extraction operator will only find numbers that have leading whitespace.
D:\prog\cc\foo> a a.cpp
0:      0
1:      1
2:      2
3:      0
4:      1
5:      12
6:      1
7:      12
8:      0
9:      12
10:     0
11:     0
0

D:\prog\cc\foo> 

The extraction operator is convenient because it requires minimal changes to your existing code... However, you will notice that it really is just a function call. Instead of the fancy struct and extraction operator, you really only need to call a function, say:

istream& extract( istream& ins, long& n )
  {
  ...
  }
and call it from your loop:
  for (...)
    extract( s, a[i] );


Hope this helps.
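One more thing the first test run shows: index 11 prints a garbage value because the file only held 11 integers, yet the return code is still 0. A possible way to harden that part is to stop at the first failed read and report how many cells were filled. This is a sketch only; the name readCoeffs is made up, and it uses the plain >> operator so that it stands alone (the comment-skipping extraction could be used in its place):

```cpp
#include <istream>
#include <sstream>   // only needed to try the function on an in-memory string

// Hypothetical variant of the read loop: stops at the first failed read and
// returns how many cells were actually filled, so the caller can detect a
// short file instead of using uninitialised array elements.
long readCoeffs(std::istream& in, long nbrCells, long* a)
{
    long count = 0;
    while (count < nbrCells && in >> a[count])
        ++count;
    return count;
}
```

The caller can then compare the result with nbrCells and treat a smaller count as an error.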
Thanks, Duoas -

That does compile now. Storage modifiers aren't exactly my strong point in C++, so would you mind telling me what we've done with the const/mutable keywords?

Also, what does if (ins >> n.n) mean? (line 38)

Thanks for all the help.
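A temporary object can only be bound to a const reference parameter, not to a non-const one (a safety feature in the language to prevent unwanted side-effects), which is why the operator now takes a const my_int&. In this case we do want a side-effect: modifying the long that the wrapper refers to. That works because const applies to the my_int object itself, not to the object its reference member refers to. Strictly speaking, the mutable keyword is therefore not needed on the reference member, and standard C++ does not allow mutable on reference members at all, so drop it if your compiler complains.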
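To see the const part in isolation, here is a minimal sketch. The names long_ref, setTo42 and demo are invented for this example and are not part of the code above:

```cpp
// A const object with a reference member: the member itself cannot be
// re-seated, but the object it refers to is not made const.
struct long_ref
{
    long& n;
    explicit long_ref(long& target) : n(target) { }
};

// Takes the wrapper by const reference, so a temporary can be passed,
// and still writes through the reference member.
void setTo42(const long_ref& r)
{
    r.n = 42;
}

long demo()
{
    long x = 0;
    setTo42(long_ref(x));   // the temporary binds to the const reference
    return x;               // x has been changed through the wrapper
}
```

This compiles without any mutable keyword, which suggests the wrapper in the thread does not need one either.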

The if (f >> foo) is a common C++ idiom that means, "if the item (foo) was successfully obtained from the stream (f) [then do something]." You'll often find it in loops like:

// Get all the integers we can from the standard input
// (until there is no more input, or until something other
// than an integer is there).
int n;
vector <int> ns;
while (cin >> n)
  {
  ns.push_back( n );
  }

Again, you can dump all the fancy C++ stuff and just write it as a simple (normal) function, like:

void extract( istream& ins, long& n )
  {
  // repeat while necessary to get an integer
  while (ins && !ins.eof())
    {
    // get an integer -- if OK then we're done
    if (ins >> n)
      break;

    // otherwise, get it as a string
    string s;
    ins.clear();
    ins >> s;

    // special handling for comments
    if (is_comment( s ))
      ins.ignore( numeric_limits <streamsize> ::max(), '\n' );
    }
  }

Glad to be of help. :-)
I appreciate the explanation. What I didn't understand in line 38 was the "n.n" construct. What exactly is that?
How about just
//loop through stream
{
    stream >> var;
    if( stream.fail() ) {
        //whoops, non-conforming data
        stream.clear(); //ignore the problem or possibly do something else
    }
}

What's wrong with a simple solution like that?
(Or you could even use exception handling if you don't like the if-statment...)
What's wrong with a simple solution like that?

How is that different than the solution I suggested?

Oh wait... your solution doesn't remove the invalid input from the stream...
And it doesn't address the OP's distinction between invalid data and comments.
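To make the first point concrete: clear() only resets the error flags, so the bad token is still the next thing in the stream and the following read fails on it again. A sketch of a loop that also consumes the token (the helper name sumValidInts is made up for this example):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: adds up every whitespace-separated token that parses
// as an integer and skips the rest.
long sumValidInts(const std::string& text)
{
    std::istringstream in(text);
    long total = 0;
    long value;

    while (!in.eof())
    {
        if (in >> value)
        {
            total += value;
            continue;
        }
        if (in.eof())
            break;              // nothing left but whitespace

        in.clear();             // reset failbit...
        std::string junk;
        in >> junk;             // ...and consume the bad token, or the next read fails on it again
    }
    return total;
}
```

Without the in >> junk line, the loop would keep failing on the same token and never finish.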
Duoas, if you're still reading this, I need to make a change to this routine.

It turns out that the array I'm reading into isn't an array of longs, but an array of objects. I guess the right way to do this is to overload the >> operator for the object, but...how do I distinguish between this usage of ">>" and the bitshift usage? I've overloaded the bitshift operator as follows:

MyClass	operator>>  (MyClass &r, long i)           {
	MyClass  temp;
	temp.qCurrent = r.qCurrent >> i;
	return temp;
}


Thanks for any insight.
//global function
ostream& operator>>(ostream &out, const MyClass &obj) {
    ...
    return out;
}
You might need to make your class friends with that function...
Extraction operators don't take const references. Also, they work on input streams.

istream& operator >> ( istream& ins, MyClass& obj )
  {
  // This is where you extract the pieces of your object and assemble them
  // into the argument obj.

  // Coding extraction stuff is not too simple. Remember that you must leave the stream
  // in a reasonable state. If something goes wrong, make sure to set the failbit 
  // http://www.cplusplus.com/reference/iostream/ios/setstate/

  ...

  // Don't forget to return the input stream reference...
  return ins;
  }

Good luck!
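On the question of telling the two uses of >> apart: the compiler picks the overload from the operand types, so a bit shift (MyClass on the left, long on the right) and an extraction (input stream on the left, MyClass on the right) can coexist. A minimal sketch, assuming a stand-in MyClass whose only member is the qCurrent from the question and whose text form is a single integer:

```cpp
#include <istream>
#include <sstream>   // only needed to try the operators on an in-memory string

// Hypothetical stand-in for the class in the question; only qCurrent
// comes from the thread, the rest is assumed.
struct MyClass
{
    long qCurrent;
    MyClass() : qCurrent(0) { }
};

// Bit shift: left operand is a MyClass, right operand is a long.
MyClass operator>>(const MyClass& r, long i)
{
    MyClass temp;
    temp.qCurrent = r.qCurrent >> i;
    return temp;
}

// Extraction: left operand is an input stream, right operand is a MyClass.
// If the read fails, obj is left untouched and failbit stays set on the stream.
std::istream& operator>>(std::istream& ins, MyClass& obj)
{
    long value;
    if (ins >> value)
        obj.qCurrent = value;
    return ins;
}
```

Here the failbit is set by the inner integer extraction; an operator that does its own validation would need to call ins.setstate(std::ios::failbit) itself when it rejects the input.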