Q-Learning Example isn't running

Hello everyone, I have this Q-learning example taken from the following website: http://mnemstudio.org/path-finding-q-learning-example-1.htm. The problem is that when I run the program, there are no results and no errors either. Can anybody help, please?


// Author:		John McCullock
// Date:		11-05-05
// Description:	Q-Learning Example 1.

#include <iostream>
#include <iomanip>
#include <ctime>

using namespace std;

const int qSize = 6;
const double gamma = 0.8;
const int iterations = 10;
int initialStates[qSize] = {1, 3, 5, 2, 4, 0};

int R[qSize][qSize] =  {{-1, -1, -1, -1, 0, -1},
			{-1, -1, -1, 0, -1, 100},
			{-1, -1, -1, 0, -1, -1},
			{-1, 0, 0, -1, 0, -1},
			{0, -1, -1, 0, -1, 100},
			{-1, 0, -1, -1, 0, 100}};

int Q[qSize][qSize];
int currentState;

void episode(int initialState);
void chooseAnAction();
int getRandomAction(int upperBound, int lowerBound);
void initialize();
int maximum(int state, bool returnIndexOnly);
int reward(int action);

int main(){

	int newState;

	initialize();

    //Perform learning trials starting at all initial states.
    for(int j = 0; j <= (iterations - 1); j++){
        for(int i = 0; i <= (qSize - 1); i++){
            episode(initialStates[i]);
		} // i
	} // j

    //Print out Q matrix.
    for(int i = 0; i <= (qSize - 1); i++){
        for(int j = 0; j <= (qSize - 1); j++){
            cout << setw(5) << Q[i][j];
			if(j < qSize - 1){
				cout << ",";
			}
		} // j
        cout << "\n";
	} // i
    cout << "\n";

	//Perform tests, starting at all initial states.
	for(int i = 0; i <= (qSize - 1); i++){
        currentState = initialStates[i];
        newState = 0;
		do {
            newState = maximum(currentState, true);
            cout << currentState << ", ";
            currentState = newState;
        } while(currentState < 5);
        cout << "5" << endl;
	} // i

	return 0;
}

void episode(int initialState){

    currentState = initialState;

    //Travel from state to state until goal state is reached.
	do {
        chooseAnAction();
	} while(currentState == 5);

    //When currentState = 5, run through the set once more to
    //for convergence.
    for(int i = 0; i <= (qSize - 1); i++){
        chooseAnAction();
	} // i
}

void chooseAnAction(){

	int possibleAction;

    //Randomly choose a possible action connected to the current state.
    possibleAction = getRandomAction(qSize, 0);

	if(R[currentState][possibleAction] >= 0){
        Q[currentState][possibleAction] = reward(possibleAction);
        currentState = possibleAction;
	}
}

int getRandomAction(int upperBound, int lowerBound){

	int action;
	bool choiceIsValid = false;
	int range = (upperBound - lowerBound) + 1;

    //Randomly choose a possible action connected to the current state.
    do {
        //Get a random value between 0 and 6.
        action = lowerBound + int(range * rand() / (RAND_MAX + 1.0));
		if(R[currentState][action] > -1){
            choiceIsValid = true;
		}
    } while(choiceIsValid == false);

    return action;
}

void initialize(){

	srand((unsigned)time(0));

    for(int i = 0; i <= (qSize - 1); i++){
        for(int j = 0; j <= (qSize - 1); j++){
            Q[i][j] = 0;
		} // j
	} // i
}

int maximum(int state, bool returnIndexOnly){
// if returnIndexOnly = true, a Q matrix index is returned.
// if returnIndexOnly = false, a Q matrix element is returned.

	int winner;
	bool foundNewWinner;
	bool done = false;

    winner = 0;
    
	do {
        foundNewWinner = false;
        for(int i = 0; i <= (qSize - 1); i++){
			if((i < winner) || (i > winner)){     //Avoid self-comparison.
				if(Q[state][i] > Q[state][winner]){
                    winner = i;
                    foundNewWinner = true;
				}
			}
		} // i

		if(foundNewWinner == false){
            done = true;
		}

    } while(done = false);

	if(returnIndexOnly == true){
		return winner;
	}else{
		return Q[state][winner];
	}
}

int reward(int action){
				
    return static_cast<int>(R[currentState][action] + (gamma * maximum(action, false)));
}
Hello, and welcome to the forums! Please use code tags in the future by clicking the <> button to the right of a text box.

I compiled the code from the link you posted, but with warnings enabled:
http://coliru.stacked-crooked.com/a/85ae819e4cbcf2a4

Around line 156, there is a do-while condition: } while(done = false);. Because = is assignment rather than comparison, this condition always evaluates to false. There may be other problems with the code, but at least that is one you can eliminate.

Note that if you do the naive fix: } while (done == false);, you will break the code. It works fine (or seems to -- I haven't bothered validating the output) as it is, but removing the do-while loop wrapper around the for loop completely is the "fix" to do. Does your console window close before you see the output?
Hello kevinkjt2000, thank you for your comments and the welcome. I fixed the formatting of the posted code. And yes, I believe the problem is in line 156; however, I was wondering how they published the code without checking it. Thank you again, kevinkjt2000 :).
There is no problem with line 156. It works as it should. It is an unusual construct that can be removed without ill effect, but it doesn't cause any issues.

Does your console window close before you see the output?
cire, thank you. Yes, my console window closed before I could see any output. Now I'm working on removing the do-while loop. Thank you guys, you really helped me a lot.