I am looking for a way to access the source code of several websites using C++ (Visual Studio 2010).
I have to go through thousands of websites to extract the information I need, and obviously I'm not too eager to do this manually.
I obviously don't want Visual Studio to open each individual website in IE or Chrome. As I said, I just want access to the source of each URL.
I've found a way to export the results to Excel (where I will analyze them), but I have not found a clever way to access the source code of the websites.
I am quite new to programming. My first idea was to temporarily store the source code in some variable and then extract the bits I need, but I then realized that a variable (such as a string, which was my first thought) might not be able to hold that much information.
My current idea is to temporarily store the source code in a file, for instance "temp.txt", and then extract the information I need from that file. I would then delete the temporary file (or at least overwrite it) and repeat the process in a while loop.
I don't want thousands of websites to open up, and I don't want thousands of text files containing the source code.
A string (an in-memory buffer) is exactly what you need; it will be much faster than using a temporary file.
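To illustrate that a `std::string` easily holds an entire page in memory: the sketch below builds a multi-megabyte string and pulls a field out of it with ordinary string searching. `extract_title` is a hypothetical helper I wrote for this example, not part of any library.

```cpp
#include <string>

// Hypothetical helper: pull the contents of the first <title> tag
// out of an HTML page held entirely in a std::string.
std::string extract_title(const std::string& html) {
    std::size_t start = html.find("<title>");
    if (start == std::string::npos) return "";
    start += 7;  // skip past "<title>"
    std::size_t end = html.find("</title>", start);
    if (end == std::string::npos) return "";
    return html.substr(start, end - start);
}
```

A `std::string` grows dynamically, so its only real limit is available memory; a typical HTML page is a few hundred kilobytes at most.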
You can use WinINet or WinHTTP (both are standard Windows libraries), or you can use libcurl (a more elegant, cross-platform solution).