Clean a polluted file. Unexpected results

Hi.

I have a text file that has been polluted by another application with strange characters like Æ’à and control codes like '\0', '\x1' and '\x19'.

I've built an app that is supposed to clean those files but I get unexpected results.

Here is the part of my code that does that:

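#include <string>
using std::string;

// Characters allowed to remain in a cleaned line: printable ASCII only.
const string Stringbase = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

// Returns a copy of x containing only the characters listed in Stringbase.
string ReplaceAscii(const string& x)
{
	string tmp;

	// i < x.length() avoids the unsigned wrap-around of x.length()-1 on an empty line
	for (string::size_type i = 0; i < x.length(); i++)
	{
		const char c = x.at(i);

		if (Stringbase.find(c) != string::npos)
		{
			tmp += c;
		}
	}
	return tmp;
}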


That function receives a complete line of the file being cleaned and is supposed to return a cleaned string.

The problem is that if I read the original file, I get strange results: for example, the ™ character is treated as "= and those characters are copied to the returned string. But if I copy and paste the entire content into a new text file, everything works properly and the ™ character is removed from the string.

I'm working with VS2015, but I tried Geany and got the same results.
You may want to try working with the character's numeric values instead of the representation.

Perhaps something like:
for (auto it : x)
   if (static_cast<int>(it) > 31 && static_cast<int>(it) < 127)
      tmp += it;
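For reference, the numeric-value idea can be written as a complete function. This is only a minimal sketch: the name keepPrintableAscii is made up for illustration, and the cast to unsigned char is an addition that makes bytes above 0x7F compare as 128..255 rather than as negative numbers where plain char is signed.

```cpp
#include <string>

// Hypothetical helper (not from the original post): keep only printable
// ASCII bytes, i.e. the range 0x20..0x7E, and drop everything else.
std::string keepPrintableAscii(const std::string& line)
{
    std::string cleaned;
    cleaned.reserve(line.size());

    for (char c : line)
    {
        // Compare the raw byte value, independent of whether char is signed.
        const unsigned char byte = static_cast<unsigned char>(c);
        if (byte >= 0x20 && byte <= 0x7E)
            cleaned += c;
    }
    return cleaned;
}
```

Note that this still works byte by byte, so it can only drop bytes that fall outside the printable range. If one character is stored as several bytes that each look printable, they will all be kept.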
Nope. Same behavior. This must have something to do with file encoding but I can't pinpoint the problem.
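One way to narrow down an encoding problem is to look at the first bytes of the file, read in binary mode, and check for a byte-order mark (BOM). The sketch below is an assumption-laden starting point rather than a full detector: guessEncodingFromBom is a made-up name, and a file without a BOM can still be in any encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: guess the encoding from a byte-order mark.
// 'head' is expected to hold the first few bytes of the file, read with
// std::ios::binary so that no translation is applied to them.
std::string guessEncodingFromBom(const std::string& head)
{
    auto byteAt = [&head](std::size_t i) {
        return static_cast<unsigned char>(head[i]);
    };

    if (head.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return "UTF-8 (with BOM)";
    // Note: a UTF-32 little-endian BOM also starts with FF FE; this sketch
    // does not tell the two apart.
    if (head.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return "UTF-16 little-endian";
    if (head.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
        return "UTF-16 big-endian";
    return "unknown (no BOM)";
}
```

If the original file turned out to be UTF-16 (an assumption, the thread never confirms the encoding), that could explain the symptoms: in UTF-16 little-endian, ™ (U+2122) is stored as the two bytes 0x22 0x21, which are both printable ASCII, so a byte-wise filter keeps them. Pasting the text into a new file may have saved it in a different encoding, which would explain why the copy behaves differently.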