Removing White Spaces


Guys, I'm writing a scanner for a compiler, so I'm testing the basic read, delete and write steps in a separate file. This piece of code is meant to erase a white space whenever it finds one, but it's not working.
The input file is:
Hello World
and the output is
Hello rld

This is the code
while (!file.eof()) {

//Receives the string and put it in the vector
getline(file, temp);
line.push_back(temp);
c_char = 0;

for (char& c : line[i]) {
if (c == ' ') {

//Erases the character wanted
line[i].erase(c_char, 1);
c_char--;
continue;
}
//Writes the line char by char
outfile << c;

c_char++;
}
outfile << endl;
i++;
}
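Using erase on big strings is very inefficient, and erasing from a string while a range-based for loop is iterating over it breaks the loop.
Using eof() as the loop condition doesn't work as expected: it only becomes true after a read has already hit the end of the file.
Writing a string char by char is also inefficient.
Removing spaces from a string is a common task, so it's better to put it into a function that is easy to reuse in another project.
Have a look at a different approach.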

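#include <fstream>
#include <string>
#include <vector>

using namespace std;

string remove_spaces(const string& s)
{
  string retval;
  retval.reserve(s.size()); // one allocation up front instead of repeated regrowth

  for (char ch: s)
    if (ch != ' ')
      retval += ch;

  return retval;
}

int main()
{
  // no error checking for brevity
  ifstream src(__FILE__); // uses this source file as sample input
  ofstream dest("output.txt");
  string tmp;
  vector<string> lines;

  while(getline(src, tmp)) // the loop ends as soon as a read fails
  {
    string line = remove_spaces(tmp);
    lines.emplace_back(line);
    dest << line << '\n'; // getline strips the newline, so write it back
  }
}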
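On the eof() point above, here is a minimal sketch of the difference between the two loop forms. Both helper names are made up for illustration; the first mirrors the loop from the question, the second the loop from the reply.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads lines the way the loop in the question does.
std::vector<std::string> read_with_eof_test(std::istream& in)
{
    std::vector<std::string> lines;
    std::string temp;
    while (!in.eof()) {          // eof() turns true only after a read has hit the end
        std::getline(in, temp);  // this read may fail, yet its result is still stored
        lines.push_back(temp);
    }
    return lines;
}

// Hypothetical helper: tests the result of the read itself.
std::vector<std::string> read_with_getline_test(std::istream& in)
{
    std::vector<std::string> lines;
    std::string temp;
    while (std::getline(in, temp))  // the body runs only if a line was actually read
        lines.push_back(temp);
    return lines;
}
```

Fed the two-line text "Hello World\nsecond line\n" through a std::istringstream, the first helper returns three strings (the last one empty, from the failed read) and the second returns two.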
Would remove_if work here, or anything else built-in?
It seems like something we shouldn't have to write.
> Would remove_if work here, or anything else built-in?

Yes. A solution using std::copy_if() works too, with the potential advantage of being less constrained: the source iterators need only satisfy InputIterator, instead of ForwardIterator as for remove_if(). This means we can iterate the streams directly:

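std::ifstream input{__FILE__};
std::ofstream output{"output.txt"};

// note: istream_iterator<char> extracts with operator>>, which itself skips white space
// by default; the predicate only matters once std::noskipws is set on the stream
using it = std::istream_iterator<char>;
// bind a locale as appropriate
// unsigned char: passing a negative char value to std::isspace is undefined behaviour
auto const not_space = [] (unsigned char c) { return !std::isspace(c); };
// removes all white space, newlines included
std::copy_if(it{input}, it{}, std::ostream_iterator<char>{output}, not_space);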
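For the remove_if() variant that was asked about, a minimal sketch of the usual erase-remove idiom on a std::string follows. The helper name strip_whitespace is made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not from the thread: strips every white space character in place.
void strip_whitespace(std::string& s)
{
    // unsigned char: passing a negative char value to std::isspace is undefined behaviour
    const auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };

    // remove_if shifts the kept characters to the front and returns the new logical end;
    // erase then drops the leftover tail in a single call
    s.erase(std::remove_if(s.begin(), s.end(), is_space), s.end());
}
```

Calling strip_whitespace on a string holding "Hello World" leaves "HelloWorld". Since C++20, std::erase_if(s, is_space) does the same in one call.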
That is handy. Thanks!
> I'm doing an Scanner for a compiler

We can't just remove (or ignore) all white spaces from the input indiscriminately;
white spaces are an integral part of the grammar. For instance these two are completely different constructs:
static const int i = 23 ;
staticconstinti=23;

There are situations where one or more white spaces are optional and can (should) be skipped.
For instance these two mean the same:
for ( i = 0 ;    i < 10 ; ++ i )  { std::cout   << i << '\n' ; }
for(i=0;i<10; ++i ){std::cout<<i<<'\n';}


In general, we can just use formatted input (e.g. char c ; stm >> c ; which skips leading white spaces) in places where we know that the context is appropriate.
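To illustrate the difference, here is a small sketch contrasting formatted input (operator>>) with unformatted input (get). Both helper names are made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: collects characters using formatted input (operator>>),
// which skips leading white space while the skipws flag is set (the default).
std::string read_formatted(const std::string& text)
{
    std::istringstream stm(text);
    std::string out;
    char c;
    while (stm >> c) out += c;
    return out;
}

// Hypothetical helper: collects characters using unformatted input (get),
// which returns every character, white space included.
std::string read_unformatted(const std::string& text)
{
    std::istringstream stm(text);
    std::string out;
    char c;
    while (stm.get(c)) out += c;
    return out;
}
```

For the text "i = 23 ;", read_formatted yields "i=23;" while read_unformatted returns the text unchanged. A scanner would use the first form between tokens and the second inside a token, where a white space marks the end of the token.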

We need to tokenise the input.
Something along these lines, perhaps (this is just a trivialised outline to illustrate one possible approach):

#include <iostream>
#include <deque>
#include <string>
#include <cctype>
#include <sstream>
#include <algorithm>
#include <iomanip>

struct token {

    // TO DO: handle literal strings, chars etc.
    enum type_t { KEYWORD, IDENTIFIER, INTEGER, FLOAT, OPERATOR, TERMINAL };

    std::string value ;
    type_t type = IDENTIFIER ;

    explicit operator bool () const { return type != TERMINAL ; }
    bool operator! () const { return !bool(*this) ;}

    // TO DO: allow operators of more than one character eg <= or ++
    // ie. std::vector<std::string> operators ;
    static const std::string operators ;

    static const std::deque<std::string> keywords ;

    friend std::ostream& operator<< ( std::ostream& stm, const token& t ) { // for debug support

        switch( t.type )
        {
            case token::KEYWORD : return stm << t.value << " (KEYWORD)" ;
            case token::IDENTIFIER : return stm << std::quoted(t.value) << " (IDENTIFIER)" ;
            case token::INTEGER : return stm << t.value << " (INTEGER)" ;
            case token::FLOAT : return stm << t.value << " (FLOAT)" ;
            case token::OPERATOR : return stm << "'" << t.value << "' (OPERATOR)" ;
            default: return stm << "(TERMINAL)" ;
        }
    }
};

const std::string token::operators = "+-*/%()[]{},;.<>=!?:";
const std::deque<std::string> token::keywords = { "const", "auto", "if", "for", "double", "int", "throw" } ; // etc.

struct token_stream {

    explicit token_stream( std::istream& input_stm = std::cin )
        : stm(input_stm) { stm >> std::skipws ; }

    token get()
    {
        if( tokens.empty() ) underflow() ;

        const token t = tokens.front() ;
        tokens.pop_front() ;
        return t ;
    }

    void put_back( token t ) { tokens.push_front(t) ; }

    std::istream& stm ;
    std::deque<token> tokens ;

    void underflow() {

        char c ;
        if( !( stm >> c ) )  { // input failed

            tokens.push_back( { {}, token::TERMINAL } ) ; // no more tokens
            return ;
        }

        static const auto isoper = [] ( char c ) { return token::operators.find(c) != std::string::npos ; } ;
        static const auto iskeyword = [] ( const std::string& str )
        { return std::find( token::keywords.begin(), token::keywords.end(), str ) != token::keywords.end() ; } ;

        if( std::isdigit( static_cast<unsigned char>(c) ) ) { // number (a simplistic implementation)

            stm.putback(c) ;
            long double n ;
            stm >> n ;
            
            // TO DO: refine this (use regular expressions?)
            // this is terribly crude; for instance 3.0 would be parsed as an integer.
            // it doesn't take care of things like 23.4f or 7LL (the 'f' and "LL" are not consumed)
            // also, with the current code, the number "-3" is parsed as two tokens: "-" and "3" (this may be acceptable)
            if( n == int(n) ) tokens.push_back( { std::to_string( int(n) ), token::INTEGER } ) ;
            else tokens.push_back( { std::to_string(n), token::FLOAT } ) ;
        }

        else if( isoper(c) ) // operator
            tokens.push_back( { {c}, token::OPERATOR } ) ;

        else { // identifier or keyword

            std::string str = {c} ;
            while( stm.get(c) && !std::isspace( static_cast<unsigned char>(c) ) && !isoper(c) ) str += c ;
            if(stm) stm.putback(c) ;
            tokens.push_back( { str, iskeyword(str) ? token::KEYWORD : token::IDENTIFIER } ) ;
        }
    }
};

int main() {

    const std::string input = " const auto closure = [] ( double value )\n    { if( value > 7.3 ) array[4][5] = -1.234e+2 ; } ;" ;
    std::cout << input << "\n\n" ;

    std::istringstream input_stm(input) ;
    token_stream tok_stm(input_stm) ;

    while( token t = tok_stm.get() ) std::cout << t << '\n' ;
}

http://coliru.stacked-crooked.com/a/20e1dabc63439c9d
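One of the TO DO items in the outline, operators of more than one character, could be approached by peeking at the next character and preferring the longer match. This is only a sketch under assumptions: the helper read_operator and its table of two-character operators are made up for illustration and are not part of the code above, and longer operators such as <<= are not handled.

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: `first` is an operator character already extracted from `stm`.
// Returns the longest known operator starting with it, consuming one more character
// from the stream if that forms a two-character operator.
std::string read_operator(std::istream& stm, char first)
{
    // illustrative table, not a complete list
    static const std::array<std::string, 10> two_char_ops = {
        "<=", ">=", "==", "!=", "++", "--", "+=", "-=", "<<", ">>" };

    std::string op(1, first);
    const int next = stm.peek();  // look at the next character without consuming it
    if (next != std::char_traits<char>::eof()) {
        const std::string candidate = op + static_cast<char>(next);
        for (const std::string& known : two_char_ops) {
            if (candidate == known) {
                stm.get();  // consume the second character
                return candidate;
            }
        }
    }
    return op;
}
```

In underflow(), the operator branch could then push { read_operator(stm, c), token::OPERATOR } instead of the single character.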