Simple Web Crawler

Hi all,

I'm new to C++ but not to programming. I'm trying to create a console application where I would enter the URL of a specific stock/financial message board. The program would then monitor that message board for mentions of stocks and keep a tally of each time a stock is mentioned. My question is how to open a connection and get the HTML so it can be parsed. I have read up as best I could, and it seems the best option is to use the Windows API, but I have no idea where to start. I should mention this is not a school project.
To open a connection look into WinSock or libraries such as curl. I'm not very good with either (yet) so I can't give you much help, but those are what you need to use to make connections.
Why are you making this?
Going into WinSock might be a little tricky. I'd do what gsingh2011 suggests and use curl with either PHP or Perl. Perl would be my preference because of the regular expressions you might need to cut and chop your results.

You can try http://codediaries.blogspot.com/2009/12/c-winsock-example-using-client-server.html for a winsock tutorial.
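If you would rather stay in C++, the regular-expression step can also be done with the standard <regex> header (C++11 and later). The sketch below tallies whole-word mentions of each ticker in a block of text. `count_mentions` is a hypothetical helper name, and it assumes the tickers are plain alphanumeric symbols, since they are pasted into the pattern without escaping.

```cpp
#include <iterator>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: counts whole-word occurrences of each ticker in `text`.
// Assumes tickers contain only letters and digits (no regex metacharacters).
std::map<std::string, int> count_mentions(const std::string& text,
                                          const std::vector<std::string>& tickers) {
    std::map<std::string, int> tally;
    for (const std::string& ticker : tickers) {
        // \b anchors the match at word boundaries, so "AAPL" does not match "AAPLX".
        const std::regex pattern("\\b" + ticker + "\\b");
        auto first = std::sregex_iterator(text.begin(), text.end(), pattern);
        auto last = std::sregex_iterator();
        tally[ticker] = static_cast<int>(std::distance(first, last));
    }
    return tally;
}
```

You would pass in the downloaded page text plus your list of symbols and add the returned counts to a running total each time you poll the board. The match is case-sensitive, which is probably what you want for ticker symbols, but short tickers that are also ordinary words will still produce false hits.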
You should start out simple and learn to write web services using C++. There is a lot of info out there.
I built a C++ class that makes it easy to download a file or a page over HTTP...
There is just one problem: the instructions are in Italian ;p

Anyway, just reading the code could be useful for you... or maybe you can just try using the class (it's under construction, but it will work for what you need to do ;) )

http://mamo139.altervista.org/code_viewer.php?id=pastebin/http_download_v0.01.04.h&lang=c++
http://mamo139.altervista.org/code_viewer.php?id=pastebin/http_download_v0.01.04.cpp&lang=c++

Example of how to use it in the simplest way (this example does not handle errors that may occur during the download; if you need, I'll show you how to do that):
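#include <cstdio>     // printf, getchar
#include <windows.h>  // Sleep
#include "http_download.h"

int main() {
    http_download a;

    // Arguments are exactly as in the original post.
    a.initialize("http://neacm.fe.up.pt/pub/videolan/vlc/1.0.3/win32/vlc-1.0.3-win32.exe", "vlc.exe", 0);
    a.start();

    // Poll once per second until the class reports that the download is complete.
    while (a.get_status() != HD_STATUS_DOWNLOAD_COMPLETED) {
        Sleep(1000);
        printf("status: %d\n", a.get_status());
    }

    printf("main(): download completed!\n\n");
    getchar();  // wait for a key press so the console window stays open
    return 0;
}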


bye
Unfortunately, that code doesn't compile. It's missing "matrici.h".