Deleting Trickily Placed Comments in a Source File

Hello,
I am new to these parts but have used this website and others for a long time to find answers to my questions. I am trying to delete comments from a source file.

Granted, there are many topics on this and I have researched them, but these comments are placed in tricky ways, so that some get erased while others stay.

Here is part of the .txt file that the comments are in. If you run my function you will see that it erases all the comments like it should, but it also erases part of the code:

/******************************************************************

                         Computer Assignment x

Written by: John Doe                                 Date: Unknown


********//****************************//**************************/

#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>

#include <assert.h>
#include <string.h>

using namespace std;

#define DATA_FILE "file.txt"

const unsigned MAXWORDS = 1000; // maximum number of words /*in input stream
const unsigned MAXLEN   = 20;   // number of letters */in longest word
const unsigned LINESIZE = 5;    // number of cleaned words in a single line

unsigned num_words = 0;         /* number of words //in output list */
char word_name [ MAXWORDS ] [ MAXLEN + 1 ]; /* name of a word */
char prog_name1 [ ] = "// Begin Program Data //";
char prog_name2 [ ] = "/* End Program Data */";





This is my function. It reads the whole input stream into a string, erases the comments from that string, and writes the result to a file through the output stream.

void process_data(std::ifstream& inputStream, std::ofstream& outputStream)
{
	std::string dataFromInputFile;

	std::getline(inputStream, dataFromInputFile, '\0');

	while (dataFromInputFile.find("/*") != std::string::npos)
	{
		size_t Beginning = dataFromInputFile.find("/*");
		dataFromInputFile.erase(Beginning, (dataFromInputFile.find("*/", Beginning) - Beginning) + 2);
	}

	while (dataFromInputFile.find("//") != std::string::npos)
	{
		size_t Beginning = dataFromInputFile.find("//");
		dataFromInputFile.erase(Beginning, dataFromInputFile.find("\n", Beginning) - Beginning);
	}

	outputStream << dataFromInputFile;

	close_files(inputStream, outputStream);
}


Thank you for looking!

boosie
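You can't eliminate the comments by removing each kind of comment syntax one at a time, because the marker of one kind may sit inside a comment of the other kind (as in your MAXWORDS and MAXLEN lines). You have to scan the input from beginning to end using a state machine to progressively parse it. Something like: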
state = none
foreach char c in input
    if state == none
        if c == '/'
            state = seen_slash
        else
            send_to_output(c)
    else if state == seen_slash
        if c == '*'
            state = in_multiline_comment
        else if c == '/'
            state = in_single_line_comment
(etc)
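
To fill in the "(etc)", here is a minimal sketch of the same state machine in C++. The function name strip_comments and the extra seen_star state are my own choices for illustration, and the sketch assumes that string literals and backslash-continued lines do not need special handling.

```cpp
#include <string>

// Hypothetical helper: returns `input` with /* ... */ and // ... comments removed.
// Assumption: string literals and backslash line continuations are NOT handled.
std::string strip_comments(const std::string& input)
{
    enum class State { none, seen_slash, in_multiline_comment,
                       seen_star, in_single_line_comment };
    State state = State::none;
    std::string output;

    for (char c : input)
    {
        switch (state)
        {
        case State::none:
            if (c == '/') state = State::seen_slash;   // might start a comment
            else output += c;
            break;

        case State::seen_slash:
            if (c == '*') state = State::in_multiline_comment;
            else if (c == '/') state = State::in_single_line_comment;
            else
            {
                // The slash was ordinary code (e.g. division): emit it now.
                output += '/';
                output += c;
                state = State::none;
            }
            break;

        case State::in_multiline_comment:
            if (c == '*') state = State::seen_star;    // might be the closing */
            break;

        case State::seen_star:
            if (c == '/') state = State::none;         // comment closed
            else if (c != '*') state = State::in_multiline_comment;
            break;                                     // on another '*', stay here

        case State::in_single_line_comment:
            if (c == '\n')
            {
                output += c;                           // keep the line break
                state = State::none;
            }
            break;
        }
    }

    if (state == State::seen_slash) output += '/';     // input ended on a lone slash
    return output;
}
```

Because the input is consumed in a single pass, a /* that appears after // is never seen as a comment opener, and a // inside /* ... */ is skipped along with the rest of that comment.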

Also, note that single-line comments can actually occupy multiple lines:
// This is a comment. There are no characters following the backslash \
abort(); This is also part of the comment.
this_is_not_commented();
// This is a comment. There is a space following the backslash: \ 
this_is_not_commented();
(The syntax highlighter of this site doesn't work quite right.)
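
One possible way to account for this, sketched under the assumption that lines end in a plain '\n' (no "\r\n"): when looking for the end of a // comment, skip any newline that is directly preceded by a backslash. The helper name end_of_line_comment is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the position `start` of a "//" in `text`,
// returns the index of the newline that really ends the comment
// (or text.size() if the comment runs to the end of the input).
std::size_t end_of_line_comment(const std::string& text, std::size_t start)
{
    std::size_t pos = start;
    while ((pos = text.find('\n', pos)) != std::string::npos)
    {
        // A backslash immediately before the newline splices the next
        // line onto this one, so the comment continues past it.
        if (pos > start && text[pos - 1] == '\\')
            ++pos;          // keep scanning after the spliced newline
        else
            return pos;     // an ordinary newline: the comment ends here
    }
    return text.size();
}
```

A backslash followed by a space does not count, because the check only looks at the single character before the newline.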
Keep track of the current comment state. Something along these lines:
Note: this does not give special treatment to comment markers that appear within quoted strings.

#include <iostream>
#include <fstream>

enum class comment_state { C, CPP, NEITHER };

comment_state begin_comment( std::istream& stm, char first )
{
    if( first == '/' )
    {
        if( stm.peek() == '*' ) return comment_state::C ; // start of C comment with /*
        else if( stm.peek() == '/' ) return comment_state::CPP ; /* start of C++ comment with // */
    }

    return comment_state::NEITHER ;
}

bool end_comment( std::istream& stm, char first, comment_state curr_state )
{
    if( curr_state == comment_state::C ) return first == '*' && stm.peek() == '/' ; // */
    else if( curr_state == comment_state::CPP ) return first == '\n' ; /* end of line */
    return false ;
}

int main()
{
    std::ifstream file( __FILE__ ) ; // open the file /* this file */ for input
    comment_state curr_state = comment_state::NEITHER ; // current comment state

    char c ;
    while( file.get(c) ) /* for each character including white space characters */
    {
        if( curr_state != comment_state::NEITHER ) /* in a comment right now either
                                                     either /* ... */// or // ...\n
        {
            if( end_comment( file, c, curr_state ) ) //* if the comment has ended *//
            {
                curr_state = comment_state::NEITHER ;
                if( c == '\n' ) std::cout << '\n' ; /* end of C++ comment // ; print the new line */
                else file.ignore(1) ; // end of C comment */ ; extract and discard the /
            }
        }

        else // not in either /* or // */ comment right now
        {
            curr_state = begin_comment( file, c ) ;
            if( curr_state == comment_state::NEITHER ) std::cout << c ;
        }
    }
}

http://rextester.com/ZIZNT70071
Hey Helios,

Thanks for the reply.

That makes sense. In this way it will know whether it is in a comment or not, and will not output while in a comment. I think this will work for the comment that has been giving me the issue too!

I will try your solution.

Thank you.
There are only really four states you need to care about:

  • in a double-quoted string "..."
  • in a single-quoted string '...'
  • in a multi-line comment
  • neither

It is entirely possible to design a source file that will trip that up, but you might consider that unlikely enough that you can ignore it for personal use.
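
As a rough illustration of those states, here is a sketch that copies quoted text through untouched and drops comments everywhere else. It adds a separate state for single-line comments to the list above. The function name is hypothetical, and the sketch assumes there are no raw string literals, digit separators such as 1'000, or backslash-continued // comments in the input; those are exactly the kind of thing that would trip it up.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips comments but leaves "..." and '...' literals alone.
// Assumption: no raw string literals, digit separators, or line continuations.
std::string strip_comments_keep_literals(const std::string& in)
{
    enum class State { normal, double_quoted, single_quoted,
                       single_line_comment, multi_line_comment };
    State state = State::normal;
    std::string out;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const char c = in[i];
        const char next = (i + 1 < in.size()) ? in[i + 1] : '\0';

        switch (state)
        {
        case State::normal:
            if (c == '/' && next == '*') { state = State::multi_line_comment; ++i; }
            else if (c == '/' && next == '/') { state = State::single_line_comment; ++i; }
            else
            {
                if (c == '"') state = State::double_quoted;
                else if (c == '\'') state = State::single_quoted;
                out += c;
            }
            break;

        case State::double_quoted:
        case State::single_quoted:
        {
            const char closing = (state == State::double_quoted) ? '"' : '\'';
            out += c;
            if (c == '\\' && i + 1 < in.size()) out += in[++i];  // copy escaped char as-is
            else if (c == closing) state = State::normal;        // literal ends
            break;
        }

        case State::single_line_comment:
            if (c == '\n') { out += c; state = State::normal; }  // keep the line break
            break;

        case State::multi_line_comment:
            if (c == '*' && next == '/') { state = State::normal; ++i; }
            break;
        }
    }
    return out;
}
```

With this, lines like the prog_name1 and prog_name2 definitions from the original file keep their string contents even though they look like comments.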

[edit]
Heh, this was a fun project.

My automaton stores the current state in 'state', a pointer to the member function that handles that state:

 
  std::istream& (uncommentator::*state)();

Valid states:
  std::istream& normal();
  std::istream& double_quoted();
  std::istream& single_quoted();
  std::istream& single_line_comment();
  std::istream& multi_line_comment();

And the constructor/function body:
  uncommentator( std::istream& ins, std::ostream& outs ): 
    ins(ins), outs(outs), index(0), state(&uncommentator::normal)
  {
    while ((*this.*state)())
      ;
  }

I like piped utilities, so:
int main()
{
  uncommentator( std::cin, std::cout );
}
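
For anyone curious how the pieces might fit together, here is a cut-down sketch of the same idea. It is not the actual class from this post: mini_uncommentator, its three states, and the uncomment helper are all invented for illustration, and it leaves out the quoted-string states and the backslash edge case.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, reduced reconstruction of a state-method automaton.
// Assumption: only comment states are modelled (no quoted strings).
class mini_uncommentator
{
public:
    mini_uncommentator(std::istream& in, std::ostream& out)
        : ins(in), outs(out), state(&mini_uncommentator::normal)
    {
        // Each state method consumes some input and may switch `state`.
        // The loop ends when a read fails, i.e. at end of input.
        while ((this->*state)())
            ;
    }

private:
    std::istream& ins;
    std::ostream& outs;
    std::istream& (mini_uncommentator::*state)();   // current state method

    std::istream& normal()
    {
        char c;
        if (!ins.get(c)) return ins;
        if (c == '/' && ins.peek() == '*')
        {
            ins.get();                               // discard the '*'
            state = &mini_uncommentator::multi_line_comment;
        }
        else if (c == '/' && ins.peek() == '/')
        {
            ins.get();                               // discard the second '/'
            state = &mini_uncommentator::single_line_comment;
        }
        else outs.put(c);
        return ins;
    }

    std::istream& single_line_comment()
    {
        char c;
        if (ins.get(c) && c == '\n')
        {
            outs.put(c);                             // keep the line break
            state = &mini_uncommentator::normal;
        }
        return ins;
    }

    std::istream& multi_line_comment()
    {
        char c;
        if (ins.get(c) && c == '*' && ins.peek() == '/')
        {
            ins.get();                               // discard the closing '/'
            state = &mini_uncommentator::normal;
        }
        return ins;
    }
};

// Hypothetical convenience wrapper for trying the class on a string.
std::string uncomment(const std::string& source)
{
    std::istringstream in(source);
    std::ostringstream out;
    mini_uncommentator run(in, out);                 // all work happens in the constructor
    return out.str();
}
```

Doing everything in the constructor is what lets a one-line main() drive the whole filter; wrapping it in a function like uncomment just makes it easier to try on small strings.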

LOL. Thanks to helios for a nice edge case that I would have forgotten otherwise.
Thank you @JLBorges & @Duthomhas, both of your solutions are also very helpful. I am working on this today and will let all of you know how it goes. Thank you for the help!

boosie

[edit]
I have it working now. The only thing left is to keep comment-like text that sits inside quotes in the source file. I am thinking of adding another flag to track whether the scanner is currently between quotes.