Problem with tokenizer

Hey guys, I've been trying to tokenize a string using the Boost library:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main(){
   using namespace std;
   using namespace boost;
   string s = "This is,  a test";
   tokenizer<> tok(s);
   for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
       cout << *beg << "\n";
   }
}



The problem I'm facing is that I can't tokenize the string while also keeping the whitespace as tokens. I tried replacing the whitespace with another character, but then I don't know how to tokenize the result. For example, if I substitute every space with "|", the string above becomes "This|is,||a|test".

From here, how can I make sure the string is still split into words, but "|" is not treated as a character to be removed? Any help would be appreciated.
Look at this example from the Boost site:

// char_sep_example_1.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
#include <cstdlib>   // EXIT_SUCCESS

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer<boost::char_separator<char> > 
    tokenizer;
  boost::char_separator<char> sep("-;|");   // '-', ';' and '|' are dropped delimiters
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout << "<" << *tok_iter << "> ";
  std::cout << "\n";
  return EXIT_SUCCESS;
}
You need to define the separator as boost::char_separator<char> sep(" "); so that the space is used as the delimiter.

I don't think you need to substitute the spaces at all. (BTW: whitespace includes more than the space character, e.g. tab ('\t') and end of line ('\n').)