Taking information from a webpage

Hi!
I want to be able to read a webpage as if it were an ordinary text file (probably using some kind of stream).

The page in question: http://api.eve-central.com/api/marketstat?typeid=34&typeid=35&regionlimit=10000002
(I of course want several pages like this, but once I have a working example I can probably replicate it easily for everything else :) )
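The query string above repeats typeid= once per item and ends with regionlimit=, so the URLs for several pages could be generated instead of typed by hand. Below is a minimal sketch. The helper name marketstat_url is hypothetical, and it assumes every request uses the same base address and parameter names as the URL above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds a marketstat URL of the same shape as the example
// above, with one typeid parameter per item followed by a single regionlimit.
std::string marketstat_url(const std::vector<int>& type_ids, long region) {
    std::string url = "http://api.eve-central.com/api/marketstat?";
    for (int id : type_ids) {
        url += "typeid=" + std::to_string(id) + "&";
    }
    url += "regionlimit=" + std::to_string(region);
    return url;
}
```

For example, marketstat_url({34, 35}, 10000002) reproduces the URL quoted above.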

I can process the XML structure into structs and so on. I just need to get hold of the page, either as a file or as a text stream.

I would be very glad of a working example, since I've been trying to find a library that does this and has documentation I can understand.

// Jonas Wingren
You haven't said what platform or operating system you're working with. Anyway...

The best-known library for this kind of thing is cURL:
http://curl.haxx.se/

You should find that their "easy" interface (their quotes!) is enough for your needs.
http://curl.haxx.se/libcurl/c/libcurl-easy.html

If you download the appropriate development package for your platform/operating system, you will find it comes with docs and samples, including the one I customized to download this thread (see below).
http://curl.haxx.se/download.html

I have libcurl-7.19.3-win32-ssl-msvc, which is the latest available pre-built development package listed under Visual C++. If you need a newer version for Windows, you will probably have to build it yourself.

If you're using Linux, I would expect libcurl to be available via your system's package manager.

If you are using Windows, another possibility is WinINet. See this thread from last year for details.

Get data from a Internet file into a char
http://m.cplusplus.com/forum/windows/62128/

Note that the program discussed in that thread reads the downloaded content into a set of char buffers. If you're writing the data back to a file immediately, you can modify the code to reuse a single buffer rather than allocating a new one on each iteration of the loop.
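Here is a minimal sketch of the single-buffer idea. It assumes the source can be read in chunks. A std::istream stands in for the WinINet handle so the loop can be shown portably. In the WinINet program, the read step would instead be an InternetReadFile call filling the same buffer, and the loop would stop when it reports zero bytes read. The function name copy_with_single_buffer is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream make handy test sources and sinks
#include <string>

// Copies everything from `in` to `out` through ONE fixed-size buffer that is
// reused on every pass of the loop, rather than allocating a new buffer per chunk.
// Stand-in only: in the WinINet program the read step would be
// InternetReadFile(hUrl, buffer, sizeof buffer, &bytesRead) instead of in.read()
// (hUrl and bytesRead are hypothetical names).
std::size_t copy_with_single_buffer(std::istream& in, std::ostream& out) {
    char buffer[4096];  // allocated once, outside the loop
    std::size_t total = 0;
    while (in) {
        in.read(buffer, sizeof buffer);
        const std::streamsize got = in.gcount();  // the last chunk may be short
        if (got <= 0) {
            break;  // nothing left to read
        }
        out.write(buffer, got);  // write this chunk out before the buffer is reused
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Because each chunk is written out before the next read, memory use stays at one buffer no matter how large the page is.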

Andy

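#include <iostream>
#include <cstdio>
#include <curl/curl.h>  // also pulls in curl/easy.h; the old curl/types.h header was
                        // removed from newer libcurl releases and is not needed
// Link against libcurl, e.g. with -lcurl when using GCC/MinGW.

// libcurl calls this for each chunk of received data and expects the number of
// bytes handled in return. It always passes size == 1, so fwrite's item count
// equals that byte count.
size_t write_stream(char* ptr, size_t size, size_t nmemb, void* stream) {
    return fwrite(ptr, size, nmemb, static_cast<FILE*>(stream));
}

int main() {
    const char* url       = "http://www.cplusplus.com/forum/beginner/105882/";
    const char* file_name = "thread-beginner-105882.html";
    std::cout << "download : " << url       << std::endl;
    std::cout << "to file  : " << file_name << std::endl;

    // Initialise libcurl once per program, before any other libcurl call.
    if (curl_global_init(CURL_GLOBAL_DEFAULT) != CURLE_OK) {
        std::cout << "curl_global_init failed" << std::endl;
        return 1;
    }

    int exit_code = 1;
    CURL* curl = curl_easy_init();
    if (0 != curl) {
        FILE* fp = fopen(file_name, "wb");
        if (0 != fp) {
            curl_easy_setopt(curl, CURLOPT_URL, url);
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_stream);  // who receives the data
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);                // passed to write_stream
            CURLcode res = curl_easy_perform(curl);                       // blocks until done
            if (CURLE_OK == res) {
                std::cout << "curl_easy_perform succeeded" << std::endl;
                exit_code = 0;
            } else {
                // curl_easy_strerror turns the error code into a readable message
                std::cout << "curl_easy_perform failed: " << curl_easy_strerror(res) << std::endl;
            }
            fclose(fp);
        } else {
            std::cout << "could not open " << file_name << std::endl;
        }
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return exit_code;
}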


Contents of file thread-beginner-105882.html

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Taking information from a webpage - C++ Forum</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico">
<link rel="stylesheet" type="text/css" href="/v315/main.css">
<script src="/v315/main.js" type="text/javascript"></script>
</head>
<body>

...

</div><div id="I_content"><h3><div class="C_ico question" title="question">&nbsp;</div>
 Taking information from a webpage</h3><span id="CH_edttl"></span>
<span class="rootdatPost" title="105882,root,0,-1,2,0"></span><div id="CH_PostList">
<div class="C_forPost" id="msg572250"><span title="572250,64449,1023,106,1"></span>
<div class="box">
<div class="boxtop">
<div class="dwhen"><a href="#msg572250" title="Link to this post">
<img src="/img/link.png" width="16" height="8"></a> ... 
<div class="dwho"><a href="/user/Jonas_Wingren/"><b>Jonas Wingren</b> (106)</a></div>
</div>
<div class="dwhat" colspan="2" id="CH_i572250">
Hi!<br>
I want to be able to read a webpage as an ordinary text file. (probably using some kind of stream)<br>
<br>

...

<div class="C_forPost" id="msg572309"><span title="572309,62677,1023,2514,0"></span>
<div class="box">
<div class="boxtop">
<div class="dwhen"><a href="#msg572309" title="Link to this post">
<img src="/img/link.png" width="16" height="8"></a> ... 
<div class="dwho"><a href="/user/andywestken/"><b>andywestken</b> (2514)</a></div>
</div>
<div class="dwhat" colspan="2" id="CH_i572309">
You haven't said what platform or operating system you're working with? Anyway...<br>
<br>
The best known library for this kind of thing is cURL<br>
<a href="http://curl.haxx.se/">http://curl.haxx.se/</a><br>
<br>

...
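If you would rather read the page as a stream than save it to a file first, libcurl can hand the data to a callback that collects it in memory. Below is a minimal sketch. The names write_to_string and lines_of are hypothetical. The callback only assumes the CURLOPT_WRITEFUNCTION signature already used by write_stream above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Write callback with the same signature as write_stream, but instead of writing
// to a FILE* it appends each chunk to the std::string registered via CURLOPT_WRITEDATA.
// Returning a count other than size * nmemb signals an error, and libcurl aborts.
std::size_t write_to_string(char* ptr, std::size_t size, std::size_t nmemb, void* userdata) {
    std::string* out = static_cast<std::string*>(userdata);
    const std::size_t bytes = size * nmemb;
    out->append(ptr, bytes);
    return bytes;
}

// Reads the collected text line by line through a std::istringstream, exactly
// as std::getline would read an ordinary text file.
std::vector<std::string> lines_of(const std::string& body) {
    std::istringstream in(body);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // Pages served with CRLF line endings would otherwise keep a trailing '\r'.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}
```

To use it with the program above, declare std::string body; and pass write_to_string to CURLOPT_WRITEFUNCTION and static_cast<void*>(&body) to CURLOPT_WRITEDATA instead of write_stream and fp. Once curl_easy_perform returns CURLE_OK, body holds the whole page.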
That should solve it; I'll take a look at it tomorrow. I'm using Code::Blocks, though, and I believe I was having issues linking the curl library. If I have any further problems I'll reply to this post. It will probably take me 1-2 days to find the time to sit down and do some programming in my spare time... too bad :(
I forgot to mention another, even easier, Windows-specific approach earlier:

URLDownloadToFile function
http://msdn.microsoft.com/en-us/library/ms775123%28v=vs.85%29.aspx

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <tchar.h>
#include <urlmon.h>
#include <iostream>
#include <iomanip>
#pragma comment(lib, "urlmon.lib")  // MSVC only; with MinGW/Code::Blocks, link -lurlmon instead
using namespace std;

int main()
{
	const TCHAR url[] = _T("http://www.cplusplus.com/forum/windows/58696/");
	const TCHAR filePath[] = _T("C:\\Test\\windows-58696.txt");

	// Downloads the resource at url and saves it to filePath in one call.
	HRESULT hr = URLDownloadToFile(
		NULL,   // A pointer to the controlling IUnknown interface (not needed here)
		url,
		filePath,
		0,      // Reserved. Must be set to 0.
		NULL ); // A pointer to an IBindStatusCallback for progress reports (optional)
	if(SUCCEEDED(hr))
	{
		cout << "Downloaded OK" << endl;
	}
	else
	{
		// Print the HRESULT as unsigned so the hex value matches the documented error codes
		cout << "An error occurred: Error code = 0x" << hex << static_cast<unsigned long>(hr) << endl;
		return 1;
	}

	return 0;
}
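Once URLDownloadToFile has saved the page, it can be read back like any ordinary text file. Here is a minimal sketch that loads the whole file into one string, ready for your XML-to-struct code. The name read_whole_file is hypothetical, and it assumes a narrow-character path, so in a Unicode build you would pass the narrow equivalent of filePath.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Reads an entire downloaded file into `contents`.
// Returns false (leaving `contents` untouched) if the file could not be opened.
bool read_whole_file(const std::string& path, std::string& contents) {
    std::ifstream in(path, std::ios::binary);  // binary: keep the bytes exactly as downloaded
    if (!in) {
        return false;
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the whole file through the stream buffer
    contents = buffer.str();
    return true;
}
```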


Andy
I fixed it by building libcurl from source. There was no types.h, but I believe it's working: I can access the information on this page, for example. Thanks for the help anyway.