Optimization?

Hi all,
I'm taking programming lessons and was tasked to write a program which splits a string based on several rules. The rules are the following each part of the text consists of two parts itself (eg. XX-XX or Y-YY) the first subpart has to exist in a list we were given (I won't attach it here as there are ~680 possible combinations for the first subpart ranging from "A" to "ZW"). The second subpart can consist of 1 - 2 letters, so all possible combinations are
1
2
3
4
5
6
X-X, 
X-XX,
XX-X, 
XX-XX,
XXX-X 
and XXX-XX
.
I implemented this using a recursive function and backtracking, this however is not optimal for long words, as the runtime increases rather fast...
Do you have any idea on how to speed this up?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
std::vector<std::pair<std::string, std::string> > allParts;

void solve(std::string word, std::vector<std::string> firstPart, int partNumber, int pos)
{

	for (int i = 0; i < allParts.size(); i++)
	{
		if (allParts[i].first.empty() && allParts[i].first.empty())
		{
			allParts.erase(allParts.begin() + i);
		}
	}

	if (pos > word.length())
		return;

	if (partNumber > 0) {
		if (!allParts[allParts.size()-1].second.empty() && pos == word.length()) {
			for (int i = 0; i < allParts.size(); i++)
			{
				std::cout << allParts[i].first << "-" << allParts[i].second << std::endl;
			}
			std::cin.ignore();
			std::cin.get();

			exit(0);
		}
	}

	
	for (int i = 0; i < firstPart.size(); i++)
	{
		if (word[pos] == 223)
			return;

		if (firstPart[i] == std::string(1, word[pos]))
		{
			
			if (pos + 1 < word.size()) {
				allParts.push_back(std::make_pair("", ""));
				allParts[partNumber].first = word[pos];

				if (pos + 1 < word.length())
					allParts[partNumber].second = word[pos + 1];
				else
					return;

				solve(word, firstPart, partNumber + 1, pos + 2);
				allParts[partNumber].second = "";

				if (pos + 2 < word.length())
					allParts[partNumber].second = word.substr(pos + 1, 2);
				else
					return;

				solve(word, firstPart, partNumber + 1, pos + 3);

				allParts[partNumber].first = "";
				allParts[partNumber].second = "";

			}
		}
	else if (firstPart[i] == word.substr(pos, 2))
		{
			if (pos + 2 < word.length()) {
				allParts.push_back(std::make_pair("", ""));
				allParts[partNumber].first = word.substr(pos, 2);

				if (pos + 2 < word.length())
					allParts[partNumber].second = word[pos + 2];
				else
					return;

				solve(word, firstPart, partNumber + 1, pos + 3);
				allParts[partNumber].second = "";

				if (pos + 3 < word.length())
					allParts[partNumber].second = word.substr(pos + 2, 2);
				else
					return;

				solve(word, firstPart, partNumber + 1, pos + 3);

				allParts[partNumber].first = "";
				allParts[partNumber].second = "";

			}
		}
	else if (firstPart[i] == word.substr(pos, 3))
		{
			if (pos + 3 < word.length()) {
				allParts.push_back(std::make_pair("", ""));
				allParts[partNumber].first = word.substr(pos, 3);

				if (pos + 3 < word.length())
					allParts[partNumber].second = word[pos + 3];
				else
					return;

				solve(word, firstPart, partNumber + 1, pos + 3);
				allParts[partNumber].second = "";

				if (pos + 4 < word.length())
					allParts[partNumber].second = word.substr(pos + 3, 2);
				else
					return;

				solve(word, firstPart, partNumber + 1, pos + 4);

				allParts[partNumber].first = "";
				allParts[partNumber].second = "";
			}
		}
	}

	
	return;

}

Output for eg. "Ethernet" would be:
1
2
3
4
E-t
h-e
r-n
e-t

and to show the other rules: Computing would be
1
2
3
C-OM
P-UT
IN-G

I did not include main() as it is not relevant to my question ;)
Last edited on
you should be able to do this kind of work 'directly'

split out the first letter, check a lookup table of the patterns for all the X- patterns and handle what you found. If you did not find any, check the next letter and the XX- patterns. Then the XXX- patterns. each of these checks, being a lookup against the table, should be O(1). Apply the rule for the determined pattern, and move to next letter in the string that hasn't been handled yet.

Am I missing some complexity? This may be over-simplified, as I am looking more at your description and output than the actual code.
split out the first letter, check a lookup table of the patterns for all the X- patterns and handle what you found. If you did not find any, check the next letter and the XX- patterns. Then the XXX- patterns. each of these checks, being a lookup against the table, should be O(1).


Just to clarify it for me: you think I should split the list of all the predetermined patterns into 3 smaller lists (one for X, one for XX and one for XXX) and use these lists in my recursive function instead of one big list?

Edit: Or are you saying I should get rid of the recursive function as a whole?

EDIT 2 : Nevermeind, I found the problem. As I am from Germany, the list of words we had to check contained the umlauts "ä""ü""ö", which cannot be placed in the second part of the word. This in return meant that my program tried to fit these umlauts in as good as it could, which resulted in a enormous runtime. I now check whether there actually is a fixed part containing the umlaut and the surrounding letters and the program terminates in miliseconds.
@jonnin Thank you very much for your help, your solution will probably decrease runtime even more, so I'm still going to try that :)
Last edited on
Topic archived. No new replies allowed.