Download A Web Page

I would like to download a web page from the internet and save it as an HTML file.
You could use Winsock (Windows Sockets), which is a native component of the Windows SDK and therefore has no dependencies on third-party DLLs (Dynamic Link Libraries) or other third-party objects, so it is a good choice. However, WinINet (Windows Internet library) is also a native component of the Windows SDK, and it does the job more quickly than Winsock with half the trouble.

There are a lot of articles on how to get the source code of a website, but the code below should do the job. I will leave the easy part to you: saving the buffer to a file, using either the Windows API or the standard library's file input/output (IO).

Code:
Be aware that I am using Visual C++ in Visual Studio, so I can use #pragma to link the library inline, whereas with other compilers such as GCC you may need to link it manually.
#pragma comment(lib, "wininet.lib") // remove if not using VC++
#include <iostream>
#include <Windows.h>
#include <wininet.h>
using namespace std;

int main()
{
   // Open an internet session using the proxy settings stored on the system.
   HINTERNET connect = InternetOpen("MyBrowser", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);

   if (!connect)
   {
      cout << "Connection Failed or Syntax error";
      return 1;
   }

   // Request the page, asking for a fresh copy rather than a cached one.
   HINTERNET OpenAddress = InternetOpenUrl(connect, "http://www.google.com", NULL, 0, INTERNET_FLAG_PRAGMA_NOCACHE | INTERNET_FLAG_KEEP_CONNECTION, 0);

   if (!OpenAddress)
   {
      DWORD ErrorNum = GetLastError();
      cout << "Failed to open URL \nError No: " << ErrorNum;
      InternetCloseHandle(connect);
      return 1;
   }

   char DataReceived[4096];
   DWORD NumberOfBytesRead = 0;
   // Keep reading until the call fails or reports zero bytes (end of data).
   while (InternetReadFile(OpenAddress, DataReceived, sizeof(DataReceived), &NumberOfBytesRead) && NumberOfBytesRead)
   {
      // The buffer is not null-terminated, so output only the bytes received.
      cout.write(DataReceived, NumberOfBytesRead);
   }

   InternetCloseHandle(OpenAddress);
   InternetCloseHandle(connect);

   cin.get();
   return 0;
}



You can write the data to the file either byte by byte or one complete buffer at a time. I recommend the second option: each chunk is written with a single call, so there are far fewer calls to check, which makes error checking easier and the code more reliable.
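As a rough sketch of the "whole buffer" approach using the standard library, something like the following could work. The helper names writeChunk and openOutput are made up for illustration and are not part of WinINet or of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ostream>

// Hypothetical helper: writes one received chunk to an already-open stream.
// Returns false if the stream is in a failed state after the write.
bool writeChunk(std::ostream& out, const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0); // precondition: a valid buffer
    out.write(data, static_cast<std::streamsize>(size));
    return static_cast<bool>(out);
}

// Hypothetical helper: opens the output file in binary mode so the bytes
// are stored exactly as received, with no newline translation.
std::ofstream openOutput(const char* path)
{
    return std::ofstream(path, std::ios::binary | std::ios::trunc);
}
```

Inside the read loop you would then call writeChunk(file, DataReceived, NumberOfBytesRead) instead of printing to cout, and stop if it returns false. Passing the byte count explicitly matters because the receive buffer is not null-terminated.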
Modern websites are partly rendered by JavaScript. Just fetching the HTML doesn't render active parts of the page.
Which version of VC++ do you have?
I tried to run your provided code but I got these errors:

1>c:\users\r's\documents\visual studio 2010\projects\html download\html download\html download.cpp(8): error C2664: 'InternetOpenW' : cannot convert parameter 1 from 'const char [10]' to 'LPCWSTR'
1>          Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
1>c:\users\r's\documents\visual studio 2010\projects\html download\html download\html download.cpp(15): error C2664: 'InternetOpenUrlW' : cannot convert parameter 2 from 'const char [22]' to 'LPCWSTR'
1>          Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast
I found an article that explained it to me. Thank you for your help!
Those errors could be because you do not have the Windows SDK or you forgot to link Wininet.lib.
You're trying to pass ANSI strings to the Unicode entrypoints.

The safest thing to do is alter your code to use the ANSI entrypoints directly. That is, use

InternetOpenA rather than InternetOpen
InternetOpenUrlA rather than InternetOpenUrl

(this rule applies to (almost) all WinAPI functions which take one or more string parameters.)

Andy

PS WinAPI "functions" which take strings are almost always actually macros that evaluate to the ANSI or Unicode (or Wide, hence the W) version of the function. Whether or not it evaluates to the -A or -W version of the function is controlled by the define UNICODE, which is set in Visual Studio via a project's "Character Set" property.

If you just plan to use ANSI chars the whole time, the easiest thing is to just use the -A forms of the functions directly and be done with it.

Otherwise, you need to read up on TCHAR and the related macros and typedefs, which switch between ANSI and Unicode.
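To illustrate the macro mechanism without needing the Windows headers, here is a small portable sketch. NameLengthA, NameLengthW, NameLength and TEXT_LIT are invented names that only imitate the pattern; the real headers do the equivalent for names such as InternetOpen.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for a WinAPI function pair: one narrow (ANSI)
// entry point and one wide (Unicode) entry point.
std::size_t NameLengthA(const char* name)    { return std::strlen(name); }
std::size_t NameLengthW(const wchar_t* name) { return std::wcslen(name); }

// The generic name is a macro that resolves to one of the two, depending
// on whether UNICODE is defined.
#ifdef UNICODE
#define NameLength NameLengthW
#define TEXT_LIT(s) L##s
#else
#define NameLength NameLengthA
#define TEXT_LIT(s) s
#endif

// Calling the -A form explicitly accepts a narrow literal no matter how
// UNICODE is set; the generic name needs a literal of the matching width.
inline void checkCalls()
{
    assert(NameLengthA("MyBrowser") == 9);
    assert(NameLength(TEXT_LIT("MyBrowser")) == 9);
}
```

With UNICODE defined, writing NameLength("MyBrowser") would fail to compile, because a narrow literal cannot convert to const wchar_t*. That is the same kind of mismatch as the C2664 errors reported above.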
Regarding "Zin Byte": how do I link the library? I am really new to C++ and that would help in the future.
Hi,

Are you using VS (Visual Studio)? If so, just copy the above code and it should work.

Or else you need to link the libraries manually: https://www.youtube.com/watch?v=blZNVKYxyIY

(Dev C++)

The same rule applies to other GCC-based compilers such as MinGW, where you would typically add -lwininet to the linker options.