Extract comments from a C file.

As the title suggests, I need to extract comments from a C file, which are usually written like " /* This is a comment */ ". My idea is to first find the position of a / and check whether the very next character is a *. If it is, I then need to find the next * and check whether a / immediately follows it. Finally, I need to take everything between the first and the second *.
But I don't know how to write that in code.

I forgot to mention: this is supposed to be done in C++.
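A minimal C++ sketch of the approach described above, using std::string::find instead of checking positions one character at a time. It assumes the whole file has already been read into a std::string, and `extract_block_comments` is a hypothetical name. Like the description, it only handles /* ... */ comments and does not skip string literals, so a "/*" inside a string would be picked up as a comment.

```cpp
#include <string>
#include <vector>

// Finds each "/*", then the next "*/", and keeps the text strictly between
// them. Assumption: `source` holds the whole file. Does not skip string
// literals or handle // comments.
std::vector<std::string> extract_block_comments(const std::string &source)
{
    std::vector<std::string> comments;
    std::string::size_type pos = 0;

    while ((pos = source.find("/*", pos)) != std::string::npos) {
        std::string::size_type end = source.find("*/", pos + 2);
        if (end == std::string::npos)
            break;                               // unterminated comment: stop
        comments.push_back(source.substr(pos + 2, end - (pos + 2)));
        pos = end + 2;                           // continue after the "*/"
    }
    return comments;
}
```

To load `source`, one option is `std::string source((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());` with an open std::ifstream `file` (the extra parentheses avoid the "most vexing parse").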
Here is some C code I wrote back in high school that removes all comments. Be warned: it is not an example of good style and may contain bugs (escaped characters and quotes inside character literals are certainly mishandled).

You can look at how comment detection is implemented and reverse the process, so that it removes everything except the comments.
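#include <stdio.h>

/* Scanner states */
#define  DEFAULT        0
#define  BLOCK_COMMENT  1
#define  LINE_COMMENT   2
#define  STRING         3

int main(void)
{
	int c;
	int mode = DEFAULT;

	while((c=getchar())!=EOF)
	{
		switch(mode)
		{
		case DEFAULT:
			if (c == '\"')                  /* start of a string literal */
			{
				mode = STRING;
				putchar(c);
			}
			else if (c == '/')              /* possible start of a comment */
			{
				c = getchar();
				if (c == '/') mode = LINE_COMMENT;
				else if (c == '*') mode = BLOCK_COMMENT;
				else
				{
					putchar('/');           /* not a comment: keep the slash */
					if (c != EOF) ungetc(c, stdin);  /* re-examine the next char */
				}
			}
			else putchar(c);
			break;
		case BLOCK_COMMENT:
			/* loop over a run of stars so a comment ending in two or more stars still closes */
			while (c == '*')
			{
				c = getchar();
				if (c == '/') { mode = DEFAULT; break; }
			}
			break;
		case LINE_COMMENT:
			if (c == '\n') mode = DEFAULT, putchar('\n');
			break;
		case STRING:
			if (c == '\\')                  /* escape: copy it and the next char */
			{
				putchar(c);
				if ((c = getchar()) != EOF) putchar(c);
			}
			else if (c == '\"') mode = DEFAULT, putchar(c);
			else putchar(c);
			break;
		}
	}
	return 0;
}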
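Reversing the process as suggested, a minimal C++ sketch could keep the same states as the C program but collect the comment text instead of discarding it. `extract_comments` and `Mode` are names made up for this example, and the `Char` state for character literals is an addition the C version does not have. Assumptions: backslash-newline line splicing and trigraphs are not handled, and the /* */ and // markers are not included in the results.

```cpp
#include <cstdio>
#include <istream>
#include <string>
#include <vector>

// States mirroring DEFAULT / BLOCK_COMMENT / LINE_COMMENT / STRING in the
// C program, plus Char for character literals such as '"'.
enum class Mode { Default, BlockComment, LineComment, String, Char };

// Hypothetical helper: reads C source from `in` and returns the text of each
// comment. A comment still open at end of input is returned as-is.
std::vector<std::string> extract_comments(std::istream &in)
{
    std::vector<std::string> comments;
    std::string current;
    Mode mode = Mode::Default;
    int c;

    while ((c = in.get()) != EOF) {
        switch (mode) {
        case Mode::Default:
            if (c == '"') mode = Mode::String;
            else if (c == '\'') mode = Mode::Char;
            else if (c == '/' && in.peek() == '*') { in.get(); mode = Mode::BlockComment; }
            else if (c == '/' && in.peek() == '/') { in.get(); mode = Mode::LineComment; }
            break;
        case Mode::BlockComment:
            if (c == '*' && in.peek() == '/') {
                in.get();                        // consume the closing '/'
                comments.push_back(current);
                current.clear();
                mode = Mode::Default;
            } else {
                current += static_cast<char>(c);
            }
            break;
        case Mode::LineComment:
            if (c == '\n') {
                comments.push_back(current);
                current.clear();
                mode = Mode::Default;
            } else {
                current += static_cast<char>(c);
            }
            break;
        case Mode::String:
        case Mode::Char:
            if (c == '\\') in.get();             // skip the escaped character
            else if ((mode == Mode::String && c == '"') ||
                     (mode == Mode::Char && c == '\''))
                mode = Mode::Default;
            break;
        }
    }
    if (mode == Mode::BlockComment || mode == Mode::LineComment)
        comments.push_back(current);             // comment ran to end of input
    return comments;
}
```

Passing `std::cin` or an open std::ifstream and printing each returned string lists the comments of a file. Because string and character literals are skipped, text like "/* not a comment */" inside quotes is not reported.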
I haven't tested this, and it's not perfect (there are a few holes; for example, it doesn't check whether a closing */ is missing), but it should be a good start:

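#include <cstdio>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

int main(int argc, char *argv[])
{
    std::istream *Stream;

    switch(argc)
    {
        case 1:
            Stream = &std::cin;                  // no file argument: read standard input
            break;

        default:
            Stream = new std::ifstream(argv[1]); // never deleted: this leaks
            break;
    }

    std::vector<std::string> Comments;
    int Current;                                 // int, not char, so EOF can be detected

    while((Current = Stream->get()) != EOF)
    {
        switch(Current)
        {
            case '"':
                // skip the string literal so "/*" inside it is ignored
                while((Current = Stream->get()) != EOF && Current != '"')
                {
                    if(Current == '\\')
                        Stream->get();           // skip the escaped character
                }
                break;

            case '/':
                if(Stream->peek() == '*')
                {
                    std::string CommentValue = "/";
                    CommentValue += static_cast<char>(Stream->get());  // the '*'

                    // copy everything up to and including the closing "*/"
                    while((Current = Stream->get()) != EOF)
                    {
                        CommentValue += static_cast<char>(Current);
                        if(Current == '*' && Stream->peek() == '/')
                        {
                            CommentValue += static_cast<char>(Stream->get());
                            break;
                        }
                    }

                    Comments.push_back(CommentValue);
                }
                break;
        }
    }

    for(const std::string &Comment : Comments)
        std::cout << Comment << '\n';
}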


Edit: there's a memory leak in there too (the std::ifstream created with new is never deleted).
@Little Bobby Tables
Use references:
std::ifstream file;
std::istream& Stream = (argc == 1 ? std::cin : (file.open(argv[1]) , file ));
Why, though?
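One likely reason for the suggestion: in the pointer version the std::ifstream is created with new and never deleted, while in the reference version `file` is an ordinary local object, so its destructor closes the file automatically and there is nothing to delete. A minimal sketch of the same idea as a function follows; `choose_input` is a hypothetical name, not something from the thread. The returned reference is only valid while the caller's `file` object is alive.

```cpp
#include <fstream>
#include <iostream>

// Hypothetical helper: picks the input stream without using new.
// `file` is a local std::ifstream owned by the caller, so it is closed and
// destroyed automatically when the caller's scope ends.
std::istream &choose_input(int argc, char *argv[], std::ifstream &file)
{
    if (argc > 1) {
        file.open(argv[1]);
        return file;          // std::ifstream& converts to std::istream&
    }
    return std::cin;
}
```

In main you would write `std::ifstream file; std::istream &in = choose_input(argc, argv, file);` and then read from `in` exactly as the loop above reads from `*Stream`.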