Reading text file segments

I have a text file which contains this exactly:
int Bob, 2you, john, Adam.Smith;
float taxYear=2013, taxRate=29.2.3;


I need to analyze this text file piece by piece, looking at "int", "Bob", "2you", "Adam.Smith;" and so on separately, not as one long line.


How could I do this? I use
 
while (f >> ws, getline(f, line, ' ')) 


but this doesn't break the text file into the segments I need.
There are a number of threads on this topic. Perhaps this is the most informative:
http://www.cplusplus.com/forum/general/187742/

The more information the better.

From that post, the required output is stated as follows:

So the correct output for my code should be:

int RESERVED WORD
Bob IDENTIFIER
, SPECIAL CHARACTER
2you INVALID
Adam.Smith INVALID
float RESERVED WORD
2013 INTEGER NUMBER
taxYear IDENTIFIER
taxRate IDENTIFIER
29.2.3 INVALID
; SPECIAL CHARACTER



What I don't understand in this output is why the comma after the word "Bob" is categorised as a SPECIAL CHARACTER, while the other three commas are ignored. Similarly there are two semicolons, but only one of them is flagged as a special character.

It is pretty much impossible to suggest anything with such a wavering set of requirements. I think you need to clearly state what it is you are trying to do.


As an aside, the loop while (f >> ws, getline(f, line, ' ')) could be replaced with while (f >> line), which has much the same effect. The only difference is that >> also stops at tabs and newlines, not just at spaces. But that doesn't move any closer to a solution.
Here's a suggestion. Probably best put into a separate function.

Start with an empty string which will hold the word.
Read a single character at a time.
If it is any of these special characters, "\n\t ,;=" it marks the end of the word.
Otherwise add the character to the end of the current word.

You now have two items, the word, and the special character which marked its end.

Do something with those two items and repeat the whole process.

This is one way to write such a function:
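#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>

// Reads the next word from is. On success, word holds the characters read
// and ch holds the delimiter that ended the word.
bool getWord(std::istream & is, std::string & word, char & ch)
{
    const std::string ending = "\n\t ,;=";   // characters that end a word
    word = "";
    ch = ' ';

    is >> std::ws;   // skip leading whitespace

    // Append characters until a delimiter is read or input fails.
    while (is.get(ch) && (ending.find(ch) == std::string::npos))
        word += ch;

    // Since C++11 a stream does not convert to bool implicitly in a
    // return statement, so the conversion is written out.
    return static_cast<bool>(is);
}

int main()
{
    std::ifstream fin("data.txt");

    std::string word;
    char delimiter;

    std::cout << std::left;
    // Print each word in a 20-character column, followed by its delimiter.
    while (getWord(fin, word, delimiter))
        std::cout << std::setw(20) << word << delimiter << '\n';
}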

Output:
int
Bob                 ,
2you                ,
john                ,
Adam.Smith          ;
float
taxYear             =
2013                ,
taxRate             =
29.2.3              ;
Ty for your input!


Basically, my program has to mimic a lexical analyzer: I have to go through each segment of the text file and determine whether what I'm reading in is a reserved word, an identifier, an integer, or a special character.

I'm having trouble understanding your getWord function but I'll figure it out, ty so much!
So far my code looks like:

#include <iostream>
#include <fstream>
#include <string>
#include <stack>
#include <algorithm>

using namespace std;

const string text = "Data.txt";

void readFile(string text, stack<string> &resv, stack<string> &iden, stack<int> &intg, stack<float> &real, stack<char> &spec);
void printFile(stack<string> resv, stack<string> iden, stack<int> intg, stack<float> real, stack<char> spec);

int main()
{
	stack<string> RESV;
	stack<string> IDEN;
	stack<int> INTG;
	stack<float> REAL;
	stack<char> SPEC;
	cout << "Opening text file to store data" << endl;
	readFile(text, RESV, IDEN, INTG, REAL, SPEC);

system("PAUSE");
return 0;
}
void readFile(string text, stack<string> &resv, stack<string> &iden, stack<int> &intg, stack<float> &real, stack<char> &spec)
{
	string line;
	char buffer[500];
	ifstream f(text);
	if (f.is_open())
	{
		while (f >> ws, getline(f, line)) //eliminate leading white space, read line per line until hitting comma
		{
			for (int i = 0; i <= line.length(); i++)
			{
				if (isalpha(line[i])) //if letter 
				{
					if (isalnum(line[i])) //if letter or digit
					{
						buffer[i] = line[i];
						string test(buffer);
						if (test == "int" || test == "float")
						{
							cout << test << "RESERVED" << endl;
							resv.push(test);
						}
						else if (line[i + 1] == ' ' || line[i + 1] == ',' || line[i + 1] == ';' || line[i + 1] == '=') //if we look ahead and hit white space or special
						{
							cout << test << "IDENTIFIER" << endl;
							iden.push(test);
						}
					}
					else
					{
						cout << "INVALID" << endl;
					}
				}
				if (line[i] == ',' || line[i] == '=' || line[i] == ';')
				{
					cout << line[i] << "SPECIAL CHARACTER" << endl;
					spec.push(line[i]);
				}
				if (isdigit(line[i]))
				{
					buffer[i] = line[i];
					if (line[i + 1] == ' ' || line[i + 1] == ',' || line[i + 1] == ';' || line[i + 1] == '=')
					{
						int buff = atoi(buffer);
						cout << buff << "INTEGER" << endl;
						intg.push(buff);
					}
				}
			}	
		}
		
	}
	f.close();

}



But when I run it, a lot of garbage gets output (I think because of my buffer array) and it says everything is an identifier.
I'm having trouble understanding your getWord function


Basically it follows the brief description I gave earlier:
Chervil wrote:
Start with an empty string which will hold the word.
Read a single character at a time.
If it is any of these special characters, "\n\t ,;=" it marks the end of the word.
Otherwise add the character to the end of the current word.

You now have two items, the word, and the special character which marked its end.


Initially I wrote the same function in a longer, more verbose style, and then rewrote it more concisely. It could be the concise nature of the code which makes it hard to read at first, though I tend to prefer shorter code. If there is anything you need explained further, please ask. (Also, I think there is a bug in my code. It doesn't show up with the test file in use, but different test data could expose it.)

I should also recommend that you consider the work done by Thomas1965 in this post, http://www.cplusplus.com/forum/general/187742/#msg913396

The approach is somewhat different to mine, but there is something to be learned from studying different approaches to a problem.

As for your own latest code, I only just started to look at it - not had time to say anything yet.

Edit: the character buffer char buffer[500]; declared at line 30 (inside readFile) is misused. Its contents are interpreted as though it were a C-string, but no null terminator has been stored when it is used in string test(buffer); at line 43 or in atoi(buffer) at line 70. You must either store a zero after the last character or, perhaps safer, use a std::string instead and append characters to it.