InternetReadFile faulty..

function taken from: http://www.cplusplus.com/forum/windows/109799/

My function:
void download_words(string file_name = "words_file", const string& URL = 
	"https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt",
	const string& direc = "Resource_folder_Hangman", const string& exten = ".txt") {

	if (direc.empty())
		file_name += exten;
	else
		file_name = direc + '\\' + file_name + exten;

	ofstream file(file_name);

	HINTERNET connect = InternetOpen("MyBrowser", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);

	if (!connect) {
		cout << "Connection Failed or Syntax error\n";
		exit_program();
	}

	HINTERNET OpenAddress = InternetOpenUrl(connect, URL.c_str(), NULL, 0, INTERNET_FLAG_PRAGMA_NOCACHE | INTERNET_FLAG_KEEP_CONNECTION, 0);

	if (!OpenAddress)
	{
		DWORD ErrorNum = GetLastError();
		cout << "Failed to open URL \nError No: " << ErrorNum << '\n';
		InternetCloseHandle(connect);
		exit_program();
	}

	char DataReceived[4096];
	DWORD NumberOfBytesRead = 0;
	while (InternetReadFile(OpenAddress, DataReceived, 4096, &NumberOfBytesRead) && NumberOfBytesRead)
	{
		file << DataReceived;
	}

	InternetCloseHandle(OpenAddress);
	InternetCloseHandle(connect);
}


I'm not able to parse the html of the page: https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt PROPERLY.

I don't know what's happening.

In my word file, most of the words are correct but I get invalid stuff like:
zoÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ , zoÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ ogamy , zimmisÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ

I get a list of words like I'm supposed to but some words are faulty and at the end there is a quirk.

At the end (zwitterionic is the last word) I get 'zwitterioniczonuroid' (zonuroid already appears above). Then, for some reason, it writes the words from zonuroid through zogamete again and finishes with zoÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ



My mind is blown

Seems to be an encoding issue. The website uses UTF-8 but you use ASCII with your char type
What should I do? I have absolutely no clue in this stuff.. Can C++ read UTF-8? But I want to eventually store it in char because the rest of the program is in char itself.

But what explains that it continues to write to the file after it has finished writing all the words? And why does it start again from a point close to the end (the word zonuroid) and then write only as far as zogamete? That couldn't be because of encoding, right? Should I change the while loop? Maybe use a for-loop that iterates once per word.

And also all characters in that page are latin characters right? So why is it an issue with encoding?
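One explanation that fits both the repeated tail and the Ì runs, though nobody in the thread confirms it: InternetReadFile does not null-terminate the buffer, and `file << DataReceived` treats the buffer as a C string, so it keeps writing until it happens to reach a zero byte. After a short final read, the rest of the buffer still holds bytes from the previous read. The sketch below reproduces the situation with a reused 8-byte buffer and writes by explicit byte count instead. `write_chunk` and `simulate_two_reads` are hypothetical names, and no WinINet calls are used, so it is only an illustration of the idea.

```cpp
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: append exactly `count` bytes of `buffer` to `out`.
// Unlike `out << buffer`, this never searches for a terminating '\0'.
inline void write_chunk(std::ostream& out, const char* buffer, std::size_t count) {
    out.write(buffer, static_cast<std::streamsize>(count));
}

// Simulates two reads into one reused buffer: a full chunk, then a shorter
// final chunk, which is the pattern a download loop produces at end of file.
inline std::string simulate_two_reads() {
    char buffer[8];
    std::ostringstream out;

    std::memcpy(buffer, "zonuroid", 8);  // first read fills all 8 bytes, no '\0'
    write_chunk(out, buffer, 8);

    std::memcpy(buffer, "zoo", 3);       // final read overwrites only 3 bytes
    write_chunk(out, buffer, 3);         // stale bytes 3..7 are not written

    return out.str();
}
```

In the posted loop the equivalent change would be `file.write(DataReceived, NumberOfBytesRead);`, ideally with the ofstream opened using `std::ios::binary` so that line endings are not translated.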
-> But what explains that it continues to write to the file after finishing writing all the words? And that it starts writing from almost the end.
I don't understand your issue. When I run your code it writes them from the beginning but misses the last 7 words.

-> Should I change the while loop? Maybe a for-loop that iterates the number equivalent to number of words.
First you need to solve the encoding problem. Maybe ask Google about downloading UTF-8 file from the internet.

Thomas, press Ctrl+F and search for zwitter, which is the last word. The program hasn't missed the last 7 words; it apparently continues to write after them.

Well that is what happens for me.
If it did miss the last 7 words, why would it miss the last 7 words?

-> Maybe ask Google about downloading UTF-8 file from the internet.

I can't read UTF-8 in C++ and then convert it to char? How would I download UTF-8 file from the internet using C++ program? I'll try googling too.

Latin characters have the same codes in ASCII and UTF-8, right? But the web page only has Latin characters, so this is confusing.
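On the ASCII question: UTF-8 encodes code points 0 to 127 as the same single bytes as ASCII, so a file that contains only unaccented Latin letters and newlines is byte-for-byte identical in both. A minimal sketch for checking whether a downloaded buffer is pure ASCII follows; `is_pure_ascii` is a hypothetical helper, not something used in the thread.

```cpp
#include <string>

// Hypothetical helper: returns true when every byte is in the ASCII range
// (0x00..0x7F). Such data is already valid UTF-8 and needs no conversion
// before being stored in plain char strings.
inline bool is_pure_ascii(const std::string& data) {
    for (unsigned char byte : data) {
        if (byte >= 0x80) {
            return false;  // lead or continuation byte of a multi-byte sequence
        }
    }
    return true;
}
```

If this returns true for the whole download, the encoding of the page is unlikely to be the cause of the stray characters.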
What else can I use to just read html as an alternative? Libcurl? What and where can I get that?
-> What else can I use to just read html as an alternative?

Using C# maybe.
System.Net.WebClient client = new System.Net.WebClient();
string html = client.DownloadString("Your Url");
System.IO.File.WriteAllText("Filename", html, System.Text.Encoding.UTF8);

Can't be easier.

-> Libcurl? What and where can I get that?
Google will tell you.

Why don't you just download the file manually ?
I've only ever used C++, and I only started recently.

Any article suggestions for including small bits of C# in C++?

-> Why don't you just download the file manually ?
I'm making a game where the computer guesses the user's word. The computer refers to a text file for the words. If this text file has been deleted or altered by the user, I want the program to be able to repair itself automatically by downloading the text again. On first run, that is also how the program gets the text file: by downloading it.

This way I can share just one piece of code with anybody instead of having to worry about them not properly including the text file.
-> Any article suggestions for including small bits of C# in C++?

Not really, I saw some but they used C++ CLR as well and that would complicate it.

One option is to create a separate C# console app that downloads the text file, and to call it from your game via system, ShellExecute or CreateProcess.

Another option would be to write the game in C#, so you could also make a nice GUI.
https://www.startpage.com/do/dsearch?query=hangman+WPF+C%23&cat=web&pl=ext-ff&language=english

Finally, if you want to stick to C++, research whether libcurl can download the file as UTF-8 and convert it to ASCII.