I need libcurl to download the right things!

So I have an example program which downloads a webpage to a file.
http://www.cplusplus.com/forum/beginner/194073/

But it just downloads the source file. I want to download the whole text that I get when I open Inspect Element and copy everything from the HTML tag. Sorry for my bad English, please help.
Bumping. Please forgive me if this is a stupid question; this is the first library I am learning.
I am not sure what the problem is. When you request a website, the server sends you the source, consisting of HTML, CSS and JS. You need to search it for what you are looking for, either with string::find or regular expressions. Here is an excerpt of the source from the URL above:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Need help with libcurl! - C++ Forum</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico">
<link rel="stylesheet" type="text/css" href="/v321/main.css">
<script src="/v321/main.js" type="text/javascript"></script>
<script type='text/javascript'>
var googletag = googletag || {};
googletag.cmd = googletag.cmd || [];
(function() {
var gads = document.createElement('script');
gads.async = true;
gads.type = 'text/javascript';
var useSSL = 'https:' == document.location.protocol;
gads.src = (useSSL ? 'https:' : 'http:') + 
'//www.googletagservices.com/tag/js/gpt.js';
var node = document.getElementsByTagName('script')[0];
node.parentNode.insertBefore(gads, node);
})();
</script>

<script type='text/javascript'>
googletag.cmd.push(function() {
googletag.defineSlot('/32882001/L', [728, 90], 'div-gpt-ad-1427191279638-0').addService(googletag.pubads());
googletag.enableServices();
});
</script>
</head>
<body>
<div id="I_top">
<div id="I_header">
<div id="I_logo"><a href="/" title="cplusplus.com"><div></div></a></div>
<div id="I_search">
<form id="search" action="/search.do" method="get">
Search: <input name="q" size="20" class="txt"> <input type="submit" value="Go" class="btn">
</form>
</div>
<div id="I_bar">
<ul>
<li><a href="/forum/">Forum</a></li>
<li><a href="/forum/beginner/">Beginners</a></li>
<li class="here">Need help with libcurl!</li>
</ul>
</div>
<div id="I_user" class="C_LoginBox"><span title="ajax"></span></div>
</div>
</div>
<div id="I_mid">
<div id="I_wrap">
<div id="I_minheight"></div>
<div id="I_main">
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
</div></div>
<div id="I_midclear"></div>
</div>
</div>
<div id="I_bottom">
<div id="I_footer">
	<a href="/">Home page</a> | <a href="/privacy.do">Privacy policy</a><br>&copy; cplusplus.com, 2000-2016 - All rights reserved - <i>v3.1</i><br><a href="/contact.do?referrer=www.cplusplus.com%2Fforum%2Fbeginner%2F194073%2F">Spotted an error? contact us</a>
</div>
</div>

<script type="text/javascript">
<!--
function NavFor(us) {document.getElementById('I_subnav').innerHTML=us.ok?'<div class="sect"><h3><b><a href="/user/">'+us.user+'</a></b></h3><ul><li><a href="/forum/myposts.cgi">My topics</a></li></ul></div>':'';}onSession(NavFor);ready();
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-521783-1']);
_gaq.push(['_trackPageview']);

(function() {
  var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
  var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

//-->
</script>

</body>
</html>
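
To illustrate the string::find approach on source like the excerpt above, here is a minimal sketch. It assumes the page source is already in a std::string (for example, collected by the libcurl write callback); the helper name text_between is hypothetical and not part of libcurl or the standard library.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not from the thread):
// copies the text between the first occurrence of `open` and the next
// occurrence of `close` into `out`. Returns false if either marker is missing.
bool text_between(const std::string& source,
                  const std::string& open,
                  const std::string& close,
                  std::string& out) {
    const std::string::size_type start = source.find(open);
    if (start == std::string::npos) return false;

    // Skip past the opening marker so only the enclosed text is returned.
    const std::string::size_type content = start + open.size();
    const std::string::size_type end = source.find(close, content);
    if (end == std::string::npos) return false;

    out = source.substr(content, end - content);
    return true;
}
```

Called with "<title>" and "</title>" on the excerpt above, this would yield the page title. It only finds text that is literally present in the downloaded source.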
Thanks for the reply. Here is the problem: from this link I want to take the Experience Points value, 151,713, but when I search for it, it is not found in the source code! It is found in the Inspect Element box, though.
https://www.americasarmy.com/soldier/Pvt.Phushi
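
A search like the one described could be sketched with std::regex as below. This is only an illustration: the helper name number_after_label is hypothetical, and the markup around the value on that page is an assumption, since the real source is not shown here. If the value is inserted by JavaScript after the page loads, the downloaded source will not contain it and the function simply returns false.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (an assumption for illustration): looks for `label`,
// skips any non-digit characters after it (tags, whitespace), and captures
// the first number, allowing thousands separators such as 151,713.
// Note: `label` is placed into the pattern verbatim, so it must not contain
// regex metacharacters.
bool number_after_label(const std::string& source,
                        const std::string& label,
                        std::string& number) {
    const std::regex pattern(label + R"re([^0-9]*([0-9]+(?:,[0-9]{3})*))re");
    std::smatch match;
    if (!std::regex_search(source, match, pattern)) return false;
    number = match[1].str();
    return true;
}
```

The return value makes the difference visible: true when the label and number are in the static HTML, false when only an empty placeholder and a script tag were delivered.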
OK, so I read up on it, and it seems JavaScript adds something.
Stack Overflow - When you say "view source", I'm assuming you're talking about the editor, not the actual "View Source". When you "view source" from the browser, you get the HTML as it was delivered by the server, not after javascript does its thing.

So how do I download the code after JavaScript does its thing, rather than the HTML that the server delivered before JavaScript ran?

I am afraid you would need a framework for that, such as .NET, MFC or Qt, which give you a kind of internal browser that can run the page's JavaScript.
Ah OK, thanks for that, Mr. Thomas. So I will look into what you mentioned. Do you recommend that I learn libcurl then, or not? Is it useful to know in general (not just for this project)?
I think you need to decide first what you want to do and what to use. The frameworks mentioned above all provide their own classes for internet access, and if you use one of them for the GUI then there won't be much use for libcurl.

However, be aware that all these frameworks take quite some time to learn. If you just need a little app for this website, you might be better off asking in the job section. It's a rather easy thing to do and should not cost more than a few dollars.
OK, thanks. Well, the reason I'm doing this website thing is to learn, so I guess now is the best time to start learning!
What OS and IDE do you use?
Ubuntu 14.04, and vim to write C++. I downloaded Qt to check it out. Yeah, I know it is going to take a long time to learn, and I don't get much free time because I am in my senior year, but I am trying to learn :D
Qt seems to be a good choice. If you find some time, have a look at these videos:
https://www.youtube.com/watch?v=6KtOzh0StTc&list=PL2D1942A4688E9D63
Thank you for all the help, Mr. Thomas. I will check out the link.