need help with a lexing algorithm

closed account (Dy7SLyTq)
So after many months, I finally have a working symbol table that is tested for the important parts (i.e. storing data correctly; user-friendly access to elements will be implemented later) and a "working" lexer. I say it like that because it can only read strings. I tried to put in something to read identifiers (i.e. [_a-zA-Z][_a-zA-Z0-9]*), but it didn't work because my code chops off everything up to and including the match. If someone could explain, based on my code, how to fix this, that would be great. My end goal is to have it grab all strings first, so that a keyword inside a string won't get picked up, then keywords, then operators, then identifiers. Here is my code:

#include     <iostream>
#include       <string>
#include       <vector>
#include          <map>

#include <boost/regex.hpp>

#include "Token.hpp"
#include "Lexer.hpp"
#include "Debug.hpp"

using std::         cout;
using std::         endl;
using std::       string;
using std::       vector;
using std::          map;

using boost::           regex;
using boost::          smatch;
using boost:: match_flag_type;
using boost::   match_default;
using boost::match_prev_avail;
using boost::   match_not_bob;

Lexer::Lexer(map<string, vector<string>> &RawSource)
{
    for(auto &Counter : RawSource)
        for(auto &SubCounter : Counter.second)
            this->Source += SubCounter + " ";
}

Lexer::Lexer(vector<vector<string>> &RawSource)
{
    for(auto &Counter : RawSource)
        for(auto &SubCounter : Counter)
            this->Source += SubCounter + " ";
}

Lexer::Lexer(vector<string> &RawSource)
{
    for(auto &Counter : RawSource)
        this->Source += Counter + " ";
}

Lexer::Lexer(string &RawSource)
    : Source(RawSource) {}

Lexer::Lexer(Lexer &Temp)
{
    this->SymbolTable = Temp.SymbolTable;
    this->Source      = Temp.Source;
}

Lexer& Lexer::operator=(const Lexer &LHSLexer)
{
    this->SymbolTable = LHSLexer.SymbolTable;
    this->Source      = LHSLexer.Source;
    return *this; // a non-void function must return a value
}

Lexer::~Lexer() { delete this->SymbolTable; }

void Lexer::StartLex() { this->Lex(); }

void Lexer::Lex()
{
    string::const_iterator Start = this->Source.begin(),
                           End   = this->Source.end  (); 
    smatch                 Match;
    match_flag_type        Flags = match_default;

    while(!this->Source.empty())
    {
        if(regex_search(Start, End, Match, regex("\"[^\"]*\""), Flags))
            this->SymbolTable->PushBack("STRING", string(Match[0].first, Match[0].second), 0, -1);

        else if(regex_search(Start, End, Match, regex("import|function|println|end"), Flags))
            this->SymbolTable->PushBack("KEYWORD", string(Match[0].first, Match[0].second), 0, -1);

        this->SymbolTable->Next();
        Start = Match[0].second;
        Flags |= match_prev_avail;
        Flags |= match_not_bob;
        this->Source = string(Start, End);
    }
}

TokenList* Lexer::GetSymbolTable    () { return this->SymbolTable;   }
string     Lexer::GetFormattedSource() { return this->Source;        }
void       Lexer::PrintSource       () { cout<< Source << endl;      }
void       Lexer::PrintSymbolTable  () { this->SymbolTable->Print(); }


I apologize that it is only a snippet. It's split over multiple files, so if you need to know what something is, please ask.
Lexer::Lexer(Lexer &Temp) // why isn't this taking a const reference?
{
    this->SymbolTable = Temp.SymbolTable;
    this->Source      = Temp.Source;
}

Lexer& Lexer::operator=(const Lexer &LHSLexer)
{
    // do you have any other members?
    this->SymbolTable = LHSLexer.SymbolTable;
    this->Source    = LHSLexer.Source;
}

Lexer::~Lexer() {
    delete this->SymbolTable; //possible double delete
}
Your memory management is incorrect. Your copy constructor and assignment operator seem to be the ones the compiler would provide (except for that non-const reference), so every copy ends up deleting the same SymbolTable pointer.
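To make the double-delete point concrete, here is a minimal sketch of a copy constructor and assignment operator that clone the table instead of sharing the pointer. The definitions of Lexer and TokenList are not shown in the thread, so `LexerSketch` and `TokenListSketch` are hypothetical stand-ins, and this is only one possible fix under those assumptions.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the thread's TokenList, whose definition is not shown.
struct TokenListSketch {
    std::vector<std::string> Tokens;
};

// Hypothetical stand-in for Lexer, reduced to the two members seen in the thread.
class LexerSketch {
public:
    explicit LexerSketch(const std::string &RawSource)
        : SymbolTable(new TokenListSketch), Source(RawSource) {}

    // Copy constructor: clone the pointed-to table so each object owns its own.
    LexerSketch(const LexerSketch &Other)
        : SymbolTable(new TokenListSketch(*Other.SymbolTable)),
          Source(Other.Source) {}

    // Copy-and-swap assignment: the by-value parameter is already a deep copy,
    // and its destructor releases the table this object used to own.
    LexerSketch &operator=(LexerSketch Other)
    {
        std::swap(SymbolTable, Other.SymbolTable);
        std::swap(Source, Other.Source);
        return *this;
    }

    ~LexerSketch() { delete SymbolTable; }

    TokenListSketch *GetSymbolTable() { return SymbolTable; }

private:
    TokenListSketch *SymbolTable; // owned; deleted exactly once per object
    std::string      Source;
};
```

Holding the table by value or in a std::unique_ptr would probably be simpler still, since the hand-written destructor then disappears.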

Lexer::Lexer(string &RawSource) // why isn't this a const reference?
    : Source(RawSource) {}



> regex_search(Start, End, Match, regex("\"[^\"]*\""), Flags)
I guess that's supposed to be lazy, isn't it?
"hello" brave new "world": where should the match end?
Edit: never mind, I misread the regex.

> i tried to put in something to read identifiers (ie [_a-zA-Z][_a-zA-Z0-9]*)
I do not see such a thing in the code that you posted
closed account (Dy7SLyTq)
I'm sorry, I forgot to put in the const. The regex works fine; that's not the issue. I have all of the regexes I need, but I don't know where to put them in the code.
closed account (Dy7SLyTq)
here is the updated lexer function:
void Lexer::Lex()
{
    string::const_iterator Start = this->Source.begin(),
                           End   = this->Source.end  (); 
    smatch                 Match;
    match_flag_type        Flags = match_default;

    while(!this->Source.empty())
    {
        if(regex_search(Start, End, Match, regex("\"[^\"]*\""), Flags))
            this->SymbolTable->PushBack("STRING", string(Match[0].first, Match[0].second), 0, -1);

        else if(regex_search(Start, End, Match, regex("import|function|var|println|end"), Flags))
            this->SymbolTable->PushBack("KEYWORD", string(Match[0].first, Match[0].second), 0, -1);

        else if(regex_search(Start, End, Match, regex("[;()\\[\\]{}&]"), Flags)) // backslashes doubled so the regex engine sees \[ and \]
            this->SymbolTable->PushBack("OPERATOR", string(Match[0].first, Match[0].second), 0, -1);

        else if(regex_search(Start, End, Match, regex("[_a-zA-Z][_a-zA-Z0-9]*"), Flags))
            this->SymbolTable->PushBack("IDENTIFIER", string(Match[0].first, Match[0].second), 0, -1);

        this->SymbolTable->Next();
        Start = Match[0].second;
        Flags |= match_prev_avail;
        Flags |= match_not_bob;
        this->Source = string(Start, End);
    }
}

The only other change I made was to add const where appropriate.
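One possible way around the "chops off everything up to and including the match" problem, sketched under assumptions: regex_search scans forward to the first match anywhere in the remaining input, so whatever precedes that match is skipped. Passing the match_continuous flag forces each pattern to match exactly at the current position, which lets the rules be tried in priority order one token at a time. The sketch uses std::regex instead of boost::regex so that it compiles without Boost (Boost offers a flag of the same name). `Tok` and `LexSketch` are hypothetical names, since Token.hpp and TokenList are not shown in the thread.

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token record; the thread's Token/TokenList types are not shown.
struct Tok {
    std::string kind;
    std::string text;
};

// Tokenize by trying every rule at the current position only.
std::vector<Tok> LexSketch(const std::string &source)
{
    // Rules are tried in order, so strings win over keywords and keywords
    // win over identifiers. The trailing \b keeps "end" from matching the
    // front of an identifier such as "ending".
    static const std::vector<std::pair<std::string, std::regex>> rules = {
        {"STRING",     std::regex(R"("[^"]*")")},
        {"KEYWORD",    std::regex(R"((import|function|var|println|end)\b)")},
        {"OPERATOR",   std::regex(R"([;()\[\]{}&])")},
        {"IDENTIFIER", std::regex(R"([_a-zA-Z][_a-zA-Z0-9]*)")},
    };

    std::vector<Tok> tokens;
    auto pos = source.cbegin();
    const auto end = source.cend();

    while (pos != end) {
        // Whitespace separates tokens and is never a token itself.
        if (std::isspace(static_cast<unsigned char>(*pos))) {
            ++pos;
            continue;
        }

        bool matched = false;
        for (const auto &rule : rules) {
            std::smatch m;
            // match_continuous: the match must begin exactly at pos.
            if (std::regex_search(pos, end, m, rule.second,
                                  std::regex_constants::match_continuous)) {
                tokens.push_back({rule.first, m[0].str()});
                pos = m[0].second; // advance past this token only
                matched = true;
                break;
            }
        }

        if (!matched) {
            // No rule applies: record the character and move on so the
            // loop always makes progress.
            tokens.push_back({"UNKNOWN", std::string(1, *pos)});
            ++pos;
        }
    }
    return tokens;
}
```

Because the source string is never reassigned inside the loop, the iterators stay valid, and the loop ends when the position reaches the end of the input rather than when the string becomes empty.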