String tokenizing basics

I want to add some functionality to my program.

My program goes through a loop of taking input a line at a time.

 
while(std::getline(std::cin, line))


The input is a G++ compiler error, along the lines of:


ostreamoverloading.cpp:27:1: error: ‘friend’ used outside of class
 friend std::istream &operator>> (std::istream& is, Obj& o)
 ^
ostreamoverloading.cpp: In function ‘std::istream& operator>>(std::istream&, Obj&)’:
ostreamoverloading.cpp:9:17: error: ‘std::string Obj::output’ is private
     std::string output;
                 ^
ostreamoverloading.cpp:30:24: error: within this context
     std::getline(is, o.output;
                        ^
ostreamoverloading.cpp:30:30: error: expected ‘)’ before ‘;’ token
     std::getline(is, o.output;


As of now, to get the relevant data I need, I am using:

if(line.find("error:") != std::string::npos)

I want to start tokenizing the input, but I'm struggling to find good material on splitting with several different delimiters. My desired tokenizing is:


ostreamoverloading.cpp:27:1: error: ‘friend’ used outside of class
friend std::istream &operator>> (std::istream& is, Obj& o)
 ^
ostreamoverloading.cpp: In function ‘std::istream& operator>>(std::istream&, Obj&)’:

ostreamoverloading.cpp:9:17: error: ‘std::string Obj::output’ is private
     std::string output;
                 ^
ostreamoverloading.cpp:30:24: error: within this context
     std::getline(is, o.output;
                        ^
ostreamoverloading.cpp:30:30: error: expected ‘)’ before ‘;’ token
     std::getline(is, o.output;


---------------------------------------------------------------------
ostreamoverloading.cpp
27
error:
‘friend’ used outside of class
In function
‘std::istream& operator>>(std::istream&, Obj&)’
---------------------------------------------------------------------
ostreamoverloading.cpp
error:
‘std::string Obj::output’ is private
30
within this context
std::getline(is, o.output;
---------------------------------------------------------------------
ostreamoverloading.cpp
30
error:
expected ‘)’ before ‘;’ token
---------------------------------------------------------------------


The above tokenizes the output into 3 records with a variable number of tokens. As you can see, some of the tokens are similar, but they are not always in the same order, depending on how the compiler output is formatted.

Ideally after tokenization, I would like the above tokens to be formatted as:


ostreamoverloading.cpp ---------------------------------------------


error: ‘friend’ used outside of class In function ‘std::istream& operator>>(std::istream&, Obj&)’ on line 27

error: ‘std::string Obj::output’ is private within this context std::getline(is, o.output; on line 30

error: expected ‘)’ before ‘;’ token on line 30


NextFileErrors.h-----------------------------------------------------

error: ...


Is anyone willing to give me some examples of how you would attempt this?
I haven't paid close enough attention to know exactly how to parse g++ compiler output, but lines do typically follow the format:

    file ':' line ':' column ':' code ':' description
or
    file ':' continued-description
or
    ' '+ stuff-to-ignore


You can grab yourself a string-split algorithm (like one of these http://www.cplusplus.com/faq/sequences/strings/split/) to split the line on colons ':'.

If the fourth element exists and is "error" then you've found a line to begin getting information from.
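A minimal sketch of that check, assuming the line shapes listed above. The names split_limited, trim and is_error_line are made up for illustration. The split stops after the fourth colon so that a description containing "::" stays in one piece, and the fourth field is trimmed because g++ puts a space after each colon.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split `line` on `delim`, producing at most `max_fields` fields.
// The last field keeps any remaining delimiters, so a description such as
// "'std::string Obj::output' is private" is not cut at its "::".
std::vector<std::string> split_limited( const std::string& line, char delim, std::size_t max_fields )
{
    std::vector<std::string> fields ;
    std::size_t start = 0 ;
    while( fields.size() + 1 < max_fields )
    {
        const auto pos = line.find( delim, start ) ;
        if( pos == std::string::npos ) break ;
        fields.push_back( line.substr( start, pos - start ) ) ;
        start = pos + 1 ;
    }
    fields.push_back( line.substr(start) ) ; // whatever is left, delimiters included
    return fields ;
}

// Remove leading and trailing spaces and tabs.
std::string trim( const std::string& s )
{
    const auto first = s.find_first_not_of( " \t" ) ;
    if( first == std::string::npos ) return "" ;
    const auto last = s.find_last_not_of( " \t" ) ;
    return s.substr( first, last - first + 1 ) ;
}

// True if the line looks like "file:line:column: error: description".
bool is_error_line( const std::string& line )
{
    const auto fields = split_limited( line, ':', 5 ) ;
    return fields.size() == 5 && trim( fields[3] ) == "error" ;
}
```

Inside the existing while(std::getline(std::cin, line)) loop, is_error_line(line) could stand in for the line.find("error:") test, and the fields give you the file, line and description directly. A file name that itself contains a colon would confuse this split.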

Hope this helps.

Oh, before I forget, there are already programs that help you with compiler output in some sophisticated ways.

Perhaps the best is
http://www.bdsoft.com/tools/stlfilt.html

Others include just better coloring to help see the message better
http://www.mixtion.org/gccfilter/
http://schlueters.de/colorgcc.html

Hope this helps.
Something along these lines, perhaps:

test.cpp
#include <iostream>
#include <string>

struct A
{
    private: int v ;

};

friend void foo( A& a )
{
    a.v = 78 ;
}

int main()
{
    const int i = 0 ;
    ++i ;
    
    ( i + 3 ; 
}

http://coliru.stacked-crooked.com/a/4e15a6cbcbf5852d

filt.cpp
#include <iostream>
#include <string>

int main()
{
    std::string line ;
    while( std::getline( std::cin, line ) )
    {
        // "file: In function ...": print only the part from "In function" on, as a heading
        const auto pos = line.find( "In function" ) ;
        if( pos != std::string::npos ) std::cout << '\n'<< line.substr(pos) << '\n' ;
        else 
        {
            // "file:line:col: message": locate the first two colons
            const auto pos_ln = line.find(':') ;
            const auto pos_col = pos_ln != std::string::npos ? line.find( ':', pos_ln+1 ) : std::string::npos ;
            
            if( pos_col != std::string::npos )
            {
                std::cout << line.substr(0,pos_ln) << " line " << line.substr( pos_ln+1, pos_col - pos_ln  - 1 ) 
                          << " col " << line.substr( pos_col+1 ) << '\n' ;
            }
            
            // fewer than two colons: pass the line through unchanged
            else std::cout << line << '\n' ;
        }
    }
}

ln -s /Archive2/4e/15a6cbcbf5852d/main.cpp test.cpp
g++ -std=c++14 -O2 -Wall -Wextra -pedantic-errors -ofilt main.cpp
echo . && echo . && g++ -std=c++14 -O2 -Wall -Wextra -pedantic-errors -c -fdiagnostics-show-location=once -fno-diagnostics-show-caret -fno-diagnostics-show-option test.cpp 2>&1 | ./filt
.
.
test.cpp line 10 col 1: error: 'friend' used outside of class

In function 'void foo(A&)':
test.cpp line 6 col 18: error: 'int A::v' is private
test.cpp line 12 col 7: error: within this context

In function 'int main()':
test.cpp line 18 col 7: error: increment of read-only variable 'i'
test.cpp line 20 col 13: error: expected ')' before ';' token
test.cpp line 20 col 13: warning: statement has no effect

http://coliru.stacked-crooked.com/a/1e1df2667d7cf17a
Yes, I had figured I wasn't the first to come up with it. The thing is, I want to make some useful tools for myself for learning, and to have some material to help me get into university. My marks don't quite cut it, so I'm hoping some initiative will help.

Anyway, in response to your instruction:

I've been looking into this, but I haven't slept all night, so my mind isn't absorbing information as it should. I'm hoping that after some sleep I'll feel refreshed and ready to tackle it from a new perspective. I'll have to work out which algorithm suits this kind of work.

My main concern is splitting the text into tokens from a variable number of lines. If I knew that every error was on a single line delimited by colons, it would be a walk in the park.

The first of your line formats is common to all errors and warnings. My problem is that, depending on the error, there may be an extra 2, 3, or even 4 to 5 lines of information, which I can't know before the compiler is run and its output is directed to this program.

I've never attempted tokenizing before. What I have works okay, but I feel that being able to tokenize would let me format the output better.

This is my current code: https://github.com/megatron-/BetterErrors

If you can think of any improvements, no matter how big, small or pedantic I want to know.

Thanks for your reply.
I haven't looked at the code, but the trick is to understand that there is no fixed number of lines per error.

You have to simply handle each line as it comes. If it starts a new error message, then start deciphering a new error. Otherwise, continue with the last error.
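One way to sketch this line-at-a-time approach, assuming a new message always begins with file:line:column: kind: description and that everything else belongs to the message before it. Diagnostic, parse_header and handle_line are hypothetical names, not taken from the linked repository. Dropping blank lines and caret markers is a choice made for this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One message: the header fields plus any continuation lines that followed it.
struct Diagnostic
{
    std::string file ;
    std::string line ;   // kept as text; std::stoi can convert it if a number is needed
    std::string kind ;   // "error", "warning", "note", ...
    std::string text ;   // description, with continuation lines appended
};

static bool all_digits( const std::string& s )
{
    if( s.empty() ) return false ;
    for( unsigned char c : s ) if( !std::isdigit(c) ) return false ;
    return true ;
}

static std::string trim( const std::string& s )
{
    const auto first = s.find_first_not_of( " \t" ) ;
    if( first == std::string::npos ) return "" ;
    return s.substr( first, s.find_last_not_of( " \t" ) - first + 1 ) ;
}

// Try to read "file:line:column: kind: description" into d.
// Returns false, leaving d untouched, if the line does not have that shape.
bool parse_header( const std::string& text, Diagnostic& d )
{
    std::size_t colon[4] ;
    std::size_t from = 0 ;
    for( auto& c : colon )
    {
        c = text.find( ':', from ) ;
        if( c == std::string::npos ) return false ;
        from = c + 1 ;
    }
    const std::string line_no = text.substr( colon[0]+1, colon[1]-colon[0]-1 ) ;
    const std::string col_no = text.substr( colon[1]+1, colon[2]-colon[1]-1 ) ;
    if( !all_digits(line_no) || !all_digits(col_no) ) return false ;

    d.file = text.substr( 0, colon[0] ) ;
    d.line = line_no ;
    d.kind = trim( text.substr( colon[2]+1, colon[3]-colon[2]-1 ) ) ;
    d.text = trim( text.substr( colon[3]+1 ) ) ;
    return true ;
}

// Handle one line of compiler output as it arrives.
void handle_line( std::vector<Diagnostic>& seen, const std::string& text )
{
    Diagnostic d ;
    if( parse_header( text, d ) ) seen.push_back(d) ;   // a new message starts here
    else if( !seen.empty() )
    {
        // skip blank lines and caret markers
        if( text.find_first_not_of( " \t^" ) == std::string::npos ) return ;
        seen.back().text += ' ' + trim(text) ;          // continue the last message
    }
    // lines that arrive before the first header are ignored in this sketch
}
```

Calling handle_line(seen, line) from the getline loop leaves one Diagnostic per message once the input ends, however many extra lines each message had. Note that g++ reports "within this context" with its own file:line:column header, so it becomes a separate record here; merging it into the previous one would need an extra rule.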

Hope this helps.
-fmessage-length=n
Try to format error messages so that they fit on lines of about n characters. The default is 72 characters for g++ and 0 for the rest of the front ends supported by GCC.

If n is zero, then no line-wrapping is done; each error message appears on a single line.

-fno-diagnostics-show-caret
By default, each diagnostic emitted includes the original source line and a caret '^' indicating the column. This option suppresses this information.
https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Language-Independent-Options.html#Language-Independent-Options
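With -fmessage-length=0 and -fno-diagnostics-show-caret set, each diagnostic should arrive as a single line, so a regular expression is enough to pick it apart and group the messages by file. The sketch below assumes that shape; Grouped, add_if_diagnostic and render are illustrative names, and the heading width of 40 dashes is arbitrary.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Messages grouped by file name; std::map keeps the file names sorted.
using Grouped = std::map< std::string, std::vector<std::string> > ;

// If `text` has the shape "file:line:column: kind: description",
// store "kind: description on line N" under its file and return true.
bool add_if_diagnostic( Grouped& grouped, const std::string& text )
{
    static const std::regex header( R"(([^:]+):(\d+):(\d+): ([^:]+): (.*))" ) ;
    std::smatch m ;
    if( !std::regex_match( text, m, header ) ) return false ;
    grouped[ m[1].str() ].push_back( m[4].str() + ": " + m[5].str() + " on line " + m[2].str() ) ;
    return true ;
}

// Render one heading per file, followed by that file's messages.
std::string render( const Grouped& grouped )
{
    std::ostringstream out ;
    for( const auto& entry : grouped )
    {
        out << entry.first << ' ' << std::string( 40, '-' ) << "\n\n" ;
        for( const auto& message : entry.second ) out << message << "\n\n" ;
    }
    return out.str() ;
}
```

Lines that do not match, such as the "In function" context lines, make add_if_diagnostic return false, so the caller can decide whether to drop them or attach them to the previous message.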