Criticise/Improve - Some kind of parser/interpreter

Haven't been on much lately due to a death in the family.

I've been coding to distract myself, and have been coming up with projects I can work through to become proficient in C++.


I have a... well, I've called it a "tokenizer", but to be frank I don't know what it is. It does what I expect it to do, though: I give it commands and arguments, and it executes them. The commands can also be written in a file and run with the command "Run filename".
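The `if`/`else if` chain in `Call()` grows with every new command. One common alternative is a table that maps each command name to a handler function. The sketch below assumes C++11 `std::function`; `Handler`, `Dispatch`, `MakeTable` and the `Echo` command are hypothetical names, not part of the original program.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical handler signature: receives the arguments, returns false to stop.
using Handler = std::function<bool(const std::vector<std::string>&)>;

// Looks up `command` in `table` and calls its handler.
// Unknown commands are reported and execution continues.
bool Dispatch(const std::map<std::string, Handler>& table,
              const std::string& command,
              const std::vector<std::string>& args)
{
    auto it = table.find(command);
    if (it == table.end())
    {
        std::cout << "Unknown command: " << command << "\n";
        return true;
    }
    return it->second(args);
}

// Builds a table with a hypothetical "Echo" command (records its arguments
// in `echoed`) and a "Quit" command that asks the caller to stop.
std::map<std::string, Handler> MakeTable(std::vector<std::string>& echoed)
{
    std::map<std::string, Handler> table;
    table["Echo"] = [&echoed](const std::vector<std::string>& args) {
        for (const auto& a : args) echoed.push_back(a);
        return true;
    };
    table["Quit"] = [](const std::vector<std::string>&) { return false; };
    return table;
}
```

With this layout, adding a command means adding one table entry, and an unknown command is reported instead of being silently ignored.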


I feel it's awfully long-winded for what it does, and I'm not sure about it, so I thought I'd leave it here overnight and try to get some feedback. Coming back to revamped core logic would be nice, if you have the time of course.


The instruction syntax is like running a program through bash:

program arg1 arg2 ... argn

except instructions are separated by a semicolon (I think bash is the same?)

So you could write:

program1 arg; program2 arg1 arg2; program3; ...

A script file uses the same syntax as the command line; the instructions can all go on one line or each on a new line. Just make sure to end every instruction with a semicolon.
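The syntax above can be parsed in one pass with string streams: `std::getline` with `';'` as its delimiter yields each instruction, and `operator>>` then pulls out the whitespace-separated words. This is a minimal sketch that reuses the post's `instruction` struct; `Parse` is a hypothetical name, and it assumes arguments never contain spaces or semicolons (there is no quoting).

```cpp
#include <sstream>
#include <string>
#include <vector>

struct instruction
{
    std::string command;
    std::vector<std::string> args;
};

// Parses "program1 arg; program2 arg1 arg2; program3;" into instructions.
// Newlines count as ordinary whitespace, so a script file can hold one
// instruction per line as long as each one ends with ';'.
std::vector<instruction> Parse(const std::string& text)
{
    std::vector<instruction> result;
    std::istringstream input(text);
    std::string piece;
    // getline with ';' yields the text between semicolons
    while (std::getline(input, piece, ';'))
    {
        std::istringstream words(piece);
        instruction ins;
        // operator>> skips any run of spaces, tabs and newlines;
        // a piece with no words (e.g. "  ") is skipped
        if (!(words >> ins.command)) continue;
        std::string arg;
        while (words >> arg) ins.args.push_back(arg);
        result.push_back(ins);
    }
    return result;
}
```

Because `operator>>` treats newlines as whitespace, the same function handles a whole script file read into one string.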



So here it is....


```cpp
#include <iostream>
#include <string>
#include <vector>
#include <fstream>

// I thought it easier to dump information into a struct
// rather than std::vector< std::vector<std::string> >
struct instruction
{
	std::string command;
	std::vector<std::string> args;
};

// Splits a string into chunks at any character found in `delimiter`.
// Empty chunks are discarded; when StripWS is true, a single space or
// newline directly after a delimiter is skipped, i.e. "command; command2".
std::vector<std::string> Split(const std::string& str, const std::string& delimiter, bool StripWS = true)
{
	std::vector<std::string> result;

	// Remember positions of delimiters
	size_t prev = 0;
	size_t pos = str.find_first_of(delimiter, prev);

	// While a delimiter is found
	while(pos != std::string::npos)
	{
		// Copy the token and store it (skip empty ones, e.g. from "a  b")
		std::string token = str.substr(prev, pos-prev);
		if(!token.empty()) result.push_back(token);

		// If whitespace follows the delimiter, skip it (bounds-checked)
		if(StripWS && pos + 1 < str.size() && (str[pos+1] == ' ' || str[pos+1] == '\n')) pos++;

		// Move past the delimiter; the next token starts here
		pos++;
		prev = pos;
		// Attempt to find the next delimiter
		pos = str.find_first_of(delimiter, pos);
	}
	// Cut off the last token; if it holds no information, discard it
	std::string token = str.substr(prev);
	if(token != "" && token != " " && token != "\n") result.push_back(token);

	return result;
}

// Takes a vector of instruction strings and splits each into an instruction struct
std::vector<instruction> Tokenize(const std::vector<std::string>& instructions, const std::string& delimiter)
{
	std::vector<instruction> result;

	for(const auto& i : instructions)
	{
		// Split the instruction into words
		std::vector<std::string> words = Split(i, delimiter);
		// Skip a blank instruction instead of dropping all the ones after it
		if(words.empty()) continue;

		instruction ins;
		// The first word is the command; anything else is an argument
		ins.command = words[0];
		ins.args.assign(words.begin() + 1, words.end());

		result.push_back(ins);
	}

	return result;
}

// Processes one instruction; returns 0 to stop, 1 to continue
int Call(const instruction& ins);

int Interpret(const std::string& s)
{
	// Split the input into instructions at ';'
	std::vector<std::string> tokens = Split(s, ";");
	// Split each instruction into a command and arguments separated by spaces.
	// An empty line yields no instructions and returns 1, so it no longer quits.
	std::vector<instruction> instructions = Tokenize(tokens, " ");
	for(const auto& ins : instructions)
	{
		std::cout << "Command = " << ins.command << "\t";
		for(size_t j = 0; j < ins.args.size(); j++)
			std::cout << "Arg(" << j << ") = " << ins.args[j] << "\t";
		std::cout << "\n";
		if(!Call(ins)) return 0;
	}
	return 1;
}

// Read a file and display it
void Read(const std::string& filename);

// Run a file as a script
void Run(const std::string& filename);

int Call(const instruction& ins)
{
	if(ins.command == "Read" || ins.command == "Run")
	{
		// Both commands need exactly one argument; args[0] is unsafe otherwise
		if(ins.args.size() != 1)
			std::cout << ins.command << ": Expected 1 argument (filename) received " << ins.args.size() << "\n";
		else if(ins.command == "Read") Read(ins.args[0]);
		else Run(ins.args[0]);
	}
	// Quit the program, or stop a script from running (like return)
	else if(ins.command == "Quit") return 0;
	return 1;
}

void Read(const std::string& filename)
{
	std::cout << "Attempting to open file: " << filename << "...\n\n";
	std::ifstream ifs(filename);
	if(!ifs.is_open())
	{
		std::cout << "Could not open file " << filename << "\n\n";
		return;
	}
	// Testing getline's result avoids appending an extra line at end of file
	std::string line;
	while(std::getline(ifs, line))
		std::cout << line << "\n";
	std::cout << "\n";
}

void Run(const std::string& filename)
{
	std::cout << "Attempting to open file: " << filename << "...\n\n";
	std::ifstream ifs(filename);
	if(!ifs.is_open())
	{
		// Report the error instead of interpreting the message as a script
		std::cout << "Could not open file " << filename << "\n\n";
		return;
	}
	std::cout << "Reading file...\n\n";
	std::string file, line;
	while(std::getline(ifs, line))
		file += line + " "; // a space keeps words on adjacent lines apart
	std::cout << "Translating...\n\n";
	// A Quit inside the script stops only the script, like return
	Interpret(file);
}

int main()
{
	std::cout << "STSI: Script To Screen Interpreter - Version 0.0.1\n\n";
	std::string command;
	std::cout << "STSI> ";
	// Stop at end of input (Ctrl+D / Ctrl+Z) instead of looping forever
	while(std::getline(std::cin, command))
	{
		if(!Interpret(command)) return 0;
		std::cout << "STSI> ";
	}
	return 0;
}
```

Why do you have both Split and Tokenize?

There are simpler approaches to tokenizing strings in C++:
http://stackoverflow.com/a/10051869/1959975 - hand-coded
http://stackoverflow.com/a/53863/1959975 - boost
http://stackoverflow.com/a/237280/1959975 - my favorite
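For the word-splitting step specifically, the standard library can do the work directly with `std::istringstream` and `std::istream_iterator`. A sketch, not necessarily the code in the linked answers; `SplitWords` is a hypothetical name:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits on any run of whitespace (spaces, tabs, newlines);
// leading and trailing whitespace produce no empty tokens.
std::vector<std::string> SplitWords(const std::string& s)
{
    std::istringstream iss(s);
    std::istream_iterator<std::string> begin(iss), end;
    return std::vector<std::string>(begin, end);
}
```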
Thanks for the feedback, LB...

I used Split() to split a string on a common delimiter (";"), producing a vector holding a variable number of "instruction" strings; Tokenize() then took that vector and processed it into a vector of instruction structs.

My plan was to split strings by their start and end, so that Tokenize() could process the actual syntax. My original plan was not to split strings at all on the first pass, but rather to find the start and end locations, work out any math, build a string with the evaluated results, and then split that, if that makes sense. I'm struggling to wrap my head around this as it is, though.
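The "find the start and end locations first" idea can be sketched by storing index ranges into the original string instead of copied substrings. `Span` and `FindInstructions` are hypothetical names; this only locates `;`-separated instructions and does not attempt the math-evaluation step.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Half-open range [begin, end) of characters inside the source string.
struct Span
{
    std::size_t begin;
    std::size_t end;
};

// Records where each ';'-separated instruction starts and ends without
// copying any characters. Whitespace is trimmed from both ends of each
// span, and spans that are empty after trimming are skipped.
std::vector<Span> FindInstructions(const std::string& src)
{
    std::vector<Span> spans;
    std::size_t start = 0;
    while (start < src.size())
    {
        std::size_t stop = src.find(';', start);
        if (stop == std::string::npos) stop = src.size();

        // First non-whitespace character; it must lie before the ';'
        std::size_t b = src.find_first_not_of(" \t\n", start);
        if (b != std::string::npos && b < stop)
        {
            // Last non-whitespace character before the ';' (exists, since b < stop)
            std::size_t e = src.find_last_not_of(" \t\n", stop - 1) + 1;
            spans.push_back({b, e});
        }
        start = stop + 1; // continue after the ';'
    }
    return spans;
}
```

A later pass can examine the characters from `src[span.begin]` up to `src[span.end - 1]` in place and only call `substr` when a copy is actually needed. In C++17, `std::string_view` expresses the same idea with less bookkeeping.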

I've been reviewing your links; they look interesting, and I've recently installed Boost too.

Something I am wondering about, though: the examples don't seem able to find matched brackets such as (), {} and []. If I put them all together in a delimiter set, something like find_first_of() would be useless, right?
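`find_first_of()` can locate the next bracket character, but it cannot say which opening bracket a closing one belongs to. The usual way to check matching is a stack of currently open brackets. A minimal sketch; `BracketsBalanced` is a hypothetical name, and it does not skip brackets inside quoted strings:

```cpp
#include <stack>
#include <string>

// Returns true if every (, [ and { in `s` is closed by the matching
// bracket in the right order. Other characters are ignored.
bool BracketsBalanced(const std::string& s)
{
    std::stack<char> open;
    for (char c : s)
    {
        if (c == '(' || c == '[' || c == '{')
        {
            open.push(c);
        }
        else if (c == ')' || c == ']' || c == '}')
        {
            // The opener this closer needs on top of the stack
            char want = (c == ')') ? '(' : (c == ']') ? '[' : '{';
            if (open.empty() || open.top() != want) return false;
            open.pop();
        }
    }
    return open.empty(); // anything still open is unbalanced
}
```

The same stack idea works on tokens instead of raw characters, which is where it usually lives in a parser.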

Would it be easier for me to parse character by character and introduce a lookahead to figure out which token I am parsing?
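Parsing character by character with a lookahead is a common way to structure a hand-written lexer: the current character decides what kind of token starts, and peeking at the next one decides whether the token continues. A sketch assuming three hypothetical token kinds (`Word`, `Number`, `Symbol`) and only the two-character operators `==`, `!=`, `<=` and `>=`; `Token` and `Lex` are also hypothetical names:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Word, Number, Symbol };

struct Token
{
    Kind kind;
    std::string text;
};

// The first character of each token picks its kind; the loop then keeps
// consuming while the next character still belongs to that token.
std::vector<Token> Lex(const std::string& src)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size())
    {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; } // whitespace separates tokens

        std::size_t start = i;
        if (std::isalpha(c) || c == '_')
        {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back({Kind::Word, src.substr(start, i - start)});
        }
        else if (std::isdigit(c))
        {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            tokens.push_back({Kind::Number, src.substr(start, i - start)});
        }
        else
        {
            // One character of lookahead: "=", "!", "<", ">" followed by '='
            // form a single two-character operator
            if (i + 1 < src.size() && src[i + 1] == '=' &&
                std::string("=!<>").find(src[i]) != std::string::npos)
                i += 2;
            else
                ++i;
            tokens.push_back({Kind::Symbol, src.substr(start, i - start)});
        }
    }
    return tokens;
}
```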

Thanks in advance.
"Tokenize" and "Split", to me anyway, mean the same thing in this context, so it doesn't make sense to have them do two different things. One or the other needs to be renamed or removed.

Generally, you would want to split a string such as "hello(goodbye)" into tokens "hello", "(", "goodbye", and ")". In this case you want the delimiters in your output (except maybe spaces).
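`find_first_of()` remains useful for this kind of tokenizing; the change is to emit each delimiter as a token of its own instead of discarding it, and to leave bracket matching to a later pass over the tokens. A minimal sketch; `SplitKeep` is a hypothetical name, and spaces always separate tokens but are never emitted:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `s` at any character in `delims`, keeping each delimiter as its
// own one-character token. Spaces also split tokens but are dropped.
std::vector<std::string> SplitKeep(const std::string& s, const std::string& delims)
{
    const std::string all = delims + " ";
    std::vector<std::string> out;
    std::size_t prev = 0;
    std::size_t pos = s.find_first_of(all);
    while (pos != std::string::npos)
    {
        if (pos > prev) out.push_back(s.substr(prev, pos - prev)); // text before it
        if (s[pos] != ' ') out.push_back(std::string(1, s[pos]));  // the delimiter itself
        prev = pos + 1;
        pos = s.find_first_of(all, prev);
    }
    if (prev < s.size()) out.push_back(s.substr(prev)); // trailing text
    return out;
}
```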