Access url sourcecode

Hi!

I am looking for a way to access the source code of several websites using C++ (Visual Studio 2010).
I have to go through thousands of websites to extract the information I need, and obviously I'm not too eager to do this manually.
I don't want Visual Studio to open each individual website in IE or Chrome. As I said, I just want access to the source of each URL.
I've found a way to export results to Excel (where I will analyze them), but I have not found a clever way to access the source code of the websites.

I am quite new to programming. My first idea was to temporarily store the source code in a variable and then extract the bits I need. I then realized that a variable (such as a string) may not be able to hold that amount of information.
My current idea is to temporarily store the source code in a file, for instance "temp.txt", and then extract the information I need from that file. I would then delete the temporary file (or at least overwrite it) and repeat the process in a while loop.
I don't want thousands of websites to open up, and I don't want thousands of text files containing the source code.

I appreciate all the help I can get :)

modoran (2077)

A string (a memory buffer) is exactly what you need; it will be much faster than using a temporary file.
You can use WinINet or WinHTTP (both ship with Windows), or you can use libcurl (a more elegant solution).

powelKim (3)

But how do I access the source of the URL?

modoran (2077)

You always get the HTML source when you request a URL; a browser just goes one step further and parses and renders it.
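
Since what comes back is plain HTML text, pulling out "the bits you need" can start as ordinary string searching. Here is a minimal sketch, assuming the page is already in a std::string. The helper name textBetween is hypothetical, and simple marker matching like this is not a real HTML parser, so it only suits pages with a predictable layout.

```cpp
#include <string>

// Hypothetical helper: returns the text between the first occurrence of
// `open` and the next occurrence of `close` after it. Returns an empty
// string if either marker is missing.
std::string textBetween(const std::string& html,
                        const std::string& open,
                        const std::string& close)
{
    std::string::size_type start = html.find(open);
    if (start == std::string::npos)
        return std::string();
    start += open.size();  // skip past the opening marker itself

    std::string::size_type end = html.find(close, start);
    if (end == std::string::npos)
        return std::string();

    return html.substr(start, end - start);
}
```

For example, textBetween(page, "<title>", "</title>") would return the page title when page contains a title element written exactly that way.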

powelKim (3)

I guess I've got a bit of reading to do :p
Thanks for the help.
