need help with my lexer.

closed account (Dy7SLyTq)
so im building a lexer for my language using regular expressions (i know i keep jumping around on stages of my compiler but thats because they havent really taught it as well as udacity has imo so im starting over and just use the purple dragon book for reference.). im new to regular expressions in general but picked them up pretty quickly in python. they seem to be a bit more tricky in c++ however. whats wrong with my code?

main.cpp
#include <iostream> //standard i/o streams
#include <fstream>  //standard file i/o stream
#include <string>   //basic_string(char) container
#include <vector>
#include <regex>

using std::ostream;
using std::cout;
using std::cerr;
using std::endl;
using std::cin;
using std::ifstream;
using std::string;
using std::vector;
using std::regex;
using std::regex_search;
using std::smatch;

class Token
{
     string Type, Name;
     int LineNo, LineCol;

     public:
          Token(string type, string name, int lineno, int linecol)
               : Type(type), Name(name), LineNo(lineno), LineCol(linecol) {}
          Token() {}

          void SetType   (string type) { Type    = type;    }
          void SetName   (string name) { Name    = name;    }
          void SetLineNo (int lineno)  { LineNo  = lineno;  }
          void SetLineCol(int linecol) { LineCol = linecol; }

          string GetType   () { return Type;    }
          string GetName   () { return Name;    }
          int    GetLineNo () { return LineNo;  }
          int    GetLineCol() { return LineCol; }
};

ostream& operator<<(ostream &out, Token token)
{
     out<<"("<< token.GetType() <<", '"<< token.GetName() <<"', "<< token.GetLineNo() <<", "<< token.GetLineCol() <<")";
     return out;
}

void          ReadInFile (ifstream&, vector<string>&);
vector<Token> Lex        (vector<string>&);

int main(int argc, char *argv[])
{
     ifstream File(argv[1]);
     vector<string> FileContents(1);

     ReadInFile(File, FileContents);

     vector<Token> TokenList = Lex(FileContents);

     for(auto &Counter : TokenList)
          cout<< Counter << endl;
}

void ReadInFile(ifstream &File, vector<string> &FileContents)
{
     int Counter = 0;

     while(getline(File, FileContents[Counter++]))
          FileContents.resize(FileContents.size() + 1);
}

vector<Token> Lex(vector<string> &FileContents)
{
     smatch Match;
     vector<Token> TokenList;
     int LineNo = 1;

     for(auto &Counter : FileContents)
     {
          if(regex_search(Counter, Match, regex("function")))
               TokenList.push_back(Token("FUNCTION", "function", LineNo, Match.position()));

          LineNo++; 
     }

     return TokenList;
}


hello.jd
//all of this is filler except for the function keyword
//thats all i want to focus on finding right now
import sysio;

function main(var args[])
{
     println "Hello, world!" & end;
}


with this being the output:

dtscode@dtscode-Latitude-E6410:~/Desktop/Jade$ g++ main.cpp -o jade -std=c++11
dtscode@dtscode-Latitude-E6410:~/Desktop/Jade$ ./jade hello.jd
dtscode@dtscode-Latitude-E6410:~/Desktop/Jade$


it should print each function keyword. but its not for some reason
Can't see anything obviously wrong.

And when I run your code I get

(FUNCTION, 'function', 1, 39)
(FUNCTION, 'function', 5, 0)


as expected??

Have you stepped through your code?

Andy
closed account (Dy7SLyTq)
how would i do that? with a debugger?
Yes

Or you could just add a bit of logging...

Andy

PS out of interest, does

./jade ./hello.jd

work any better?
closed account (Dy7SLyTq)
it doesnt actually :(. anyways... whats logging? sorry im new to all of these error finding techniques. i would just use cout<< tests.
closed account (N36fSL3A)
That's what logging is.
Well, by "logging" I did just mean couts in this case.

Your code just ran for me, but I ran it on Windows. That's why I wondered if you might need the ./

Andy
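
As a rough sketch of the cout-style logging mentioned above: the helper below prints one diagnostic line per input line, so you can see whether the file was actually read and whether the regex matched. The name CountKeywordLines and the log format are made up for this example; they are not part of the original program.

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not in the original program): searches each line for
// the keyword and writes one diagnostic line per input line to `log`.
// Returns the number of lines on which the keyword was found.
int CountKeywordLines(const std::vector<std::string>& lines, std::ostream& log)
{
    const std::regex keyword("function");
    std::smatch match;
    int found = 0;
    int lineNo = 1;

    for (const std::string& line : lines)
    {
        const bool hit = std::regex_search(line, match, keyword);

        // Show what was read and what the search reported for it.
        log << "line " << lineNo << " [" << line << "] -> "
            << (hit ? "match at column " + std::to_string(match.position())
                    : std::string("no match"))
            << '\n';

        if (hit)
            ++found;
        ++lineNo;
    }
    return found;
}
```

Passing std::cerr as `log` keeps the diagnostics separate from the token output on cout. If every line shows up in the log but none of them reports a match, the file reading is fine and the problem is in the regex call.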
closed account (N36fSL3A)
This isn't exactly on topic, but what's going to be the name of this language? Also, how many lines long is this so far?
I'd say it is a language he is creating since he is asking about a lexer. I'm assuming it is jade, just from his code chunks there.
I've repro-ed your prob.

I was running with Visual C++ before, ok. But GCC doesn't like the regex call for some reason...

Andy
AFAIK std::regex still isn't supported in GCC http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011

use boost::regex
Thanks, explains the problem.

Just tried a really simple test, which failed, and had spotted people in the newsgroups talking about lack of regex support in GCC. But I mistook it for out-of-date news...

Andy
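
For anyone who wants to check their own toolchain, a really simple test along those lines might look like the sketch below. RegexSearchWorks is a made-up name, and this is only one way to write such a check: it searches a string that is known to contain the pattern and treats a thrown std::regex_error as a failure.

```cpp
#include <regex>
#include <string>

// Hypothetical self-check: returns true only if std::regex_search finds a
// literal pattern at the expected position in a string known to contain it.
bool RegexSearchWorks()
{
    try
    {
        const std::string text = "function main()";
        std::smatch match;
        return std::regex_search(text, match, std::regex("function"))
            && match.position() == 0;
    }
    catch (const std::regex_error&)
    {
        // An implementation may reject a pattern by throwing instead of
        // returning false, so count that as "does not work" too.
        return false;
    }
}
```

With a standard library that fully implements <regex> this should return true. A false result points at the library rather than at the lexer code.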
closed account (Dy7SLyTq)
damn it... oh well. @lumpkin/bxspecter: yes its going to be jade (not JADE; i had already come up with jade before i saw that and i dont want to change it). and since im only on the lexer its not very long. only 85 lines.

i thought gcc had full 11 support and visual didnt fully support it. anyways... ill attempt to install boost and use it. how similar is boost regex to <regex>?
closed account (Dy7SLyTq)
so i downloaded the latest boost tar.gz, but where would i put it if i wanted it to be included on the standard search path ie i could do
#include <boost/any.hpp>

edit: nvr mnd. im just going to use /usr/local/include
closed account (Dy7SLyTq)
does anyone know a good tutorial to install boost on linux? the boost.org one isnt detailed enough
closed account (1yR4jE8b)
Are you using a Debian derivative?

sudo apt-get install libboost-all-dev

Done.
closed account (Dy7SLyTq)
thanks that worked
closed account (Dy7SLyTq)
how do i get column position? im looking through the boost.regex documentation and can't find it
DTSCode wrote:
how do i get column position?

Are you referring to Match.position() ? If so, that's the same.

Or something else?

Andy

match_results
http://www.boost.org/doc/libs/1_54_0/libs/regex/doc/html/boost_regex/ref/match_results.html
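
To illustrate that position() is used the same way in both libraries, here is a minimal sketch that compiles against either one. The USE_BOOST_REGEX macro, the rx namespace alias and the FindColumn helper are all invented for this example. The Boost build is assumed to need linking against the Boost.Regex library (for example with -lboost_regex).

```cpp
#include <string>

// Assumption: USE_BOOST_REGEX is a made-up macro for this sketch. Define it
// on the command line to build against Boost.Regex instead of <regex>.
#ifdef USE_BOOST_REGEX
#include <boost/regex.hpp>
namespace rx = boost;
#else
#include <regex>
namespace rx = std;
#endif

// Hypothetical helper: returns the zero-based column of the first match of
// `pattern` in `line`, or -1 if there is no match. `pattern` is treated as a
// regular expression, so metacharacters keep their special meaning.
long FindColumn(const std::string& line, const std::string& pattern)
{
    rx::smatch match;
    if (rx::regex_search(line, match, rx::regex(pattern)))
        return static_cast<long>(match.position());
    return -1;
}
```

Called on the first line of hello.jd with the pattern "function", this gives 39, the same column as in the Visual C++ output earlier in the thread. Only the include and the namespace change between the two builds.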
DTSCode wrote:
i thought gcc had full 11 support and visual didnt fully support it.

GCC is ahead in core C++11 language implementation, but Microsoft is (and has been for the last three years) well ahead as far as the C++11 library implementation is concerned.

GCC library status: http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011