h9uest (155)

Hi all, I need to write a C++ program that can search Google and other search engines. For example, it takes the user input "cat", queries Google Image Search, and downloads the first 10 images of the results into a folder on the local machine. Can anyone shed light on this? I looked at the Google Image Search API, but it requires JavaScript interaction, and I'm a bit confused about what to do next. If you could explain some details of the Google Image Search API, I'd be very thankful.
|
|
|
screw (145)

Hi, you should use socket programming. Search Google for "socket programming", e.g. Beej's guide: http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html

You can also use the Boost library (Boost.Asio): http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio.html

Check the HTTP client in the examples section: http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/examples.html
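Whether the bytes travel over raw BSD sockets (as in Beej's guide) or through Boost.Asio, fetching a page over plain HTTP comes down to sending a short text request and reading the reply. Below is a minimal sketch of just that text handling, assuming an unencrypted http:// URL (an https:// URL would additionally need a TLS layer, which plain sockets do not provide). The helper names `build_get_request` and `response_body` are hypothetical, and the actual connect/send/recv calls are left out.

```cpp
#include <string>

// Build the text of a minimal HTTP/1.1 GET request (hypothetical helper).
// "host" is e.g. "example.com", "path" is e.g. "/index.html".
std::string build_get_request(const std::string& host, const std::string& path)
{
    std::string req;
    req += "GET " + (path.empty() ? std::string("/") : path) + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";   // the Host header is mandatory in HTTP/1.1
    req += "Connection: close\r\n";    // server closes the connection after replying,
                                       // so "read until end of stream" gets everything
    req += "\r\n";                     // an empty line ends the header block
    return req;
}

// Given the complete raw reply, return what follows the header block
// (everything after the first "\r\n\r\n"), or "" if there is no such separator.
std::string response_body(const std::string& raw)
{
    std::string::size_type pos = raw.find("\r\n\r\n");
    return pos == std::string::npos ? std::string() : raw.substr(pos + 4);
}
```

An HTTP/1.1 server may still send the body with chunked transfer encoding, which this sketch does not decode; handling details like that is a good reason to let libcurl or the Boost.Asio HTTP client example do the work.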
|
|
|
Galik (2228)

You can use libcurl to send the search request to Google: http://curl.haxx.se/libcurl/
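Whichever library sends the request, the search term has to be percent-encoded before it goes into the query string (a space becomes `%20`). Below is a minimal sketch in plain C++. The `image_search_url` helper and the `tbm=isch` image-search parameter are assumptions about Google's URL scheme, not a documented API, and may change at any time. libcurl also provides `curl_easy_escape()` for the encoding step.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encode a query term. Unreserved characters from RFC 3986
// (ASCII letters, digits, - _ . ~) pass through; every other byte becomes %XX.
// Assumes the default "C" locale, where std::isalnum only accepts ASCII.
std::string url_encode(const std::string& s)
{
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];                           // '%', two hex digits, '\0'
            std::snprintf(buf, sizeof buf, "%%%02X", c);
            out += buf;
        }
    }
    return out;
}

// Hypothetical search URL: "tbm=isch" is an assumption about Google's
// image-search parameter, not a stable, documented interface.
std::string image_search_url(const std::string& term)
{
    return "https://www.google.com/search?tbm=isch&q=" + url_encode(term);
}
```

For the question's example, `image_search_url("cat")` yields the URL that libcurl would then be asked to fetch.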
|
|
|
h9uest (155)

@screw: Beautiful! I'll definitely check it out. I think I'm going to try libcurl first, and I'll try Boost too even if libcurl works. Thank you for the links.

@Galik: Thank you, Galik! I've installed everything and run a few sample programs without any problems. I've successfully downloaded a page, but I don't know how to download all the images in that page (or how to specify conditions for which images should be downloaded). There are examples and API tutorials on the site, but the relevant ones seem quite long and will take me some time to work through. So please feel free to point me to short, quick examples or tutorials on downloading images with libcurl! Many thanks again.
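One way to "specify the conditions" for which images get downloaded is to filter the candidate URLs before fetching anything. Below is a minimal sketch, assuming the candidate URLs have already been pulled out of the page. The helper names and the jpg/jpeg/png/gif whitelist are hypothetical choices, and judging by the file extension is only a heuristic: many image URLs have no extension at all, and the server's Content-Type header is the more reliable signal.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-cased file extension of a URL's path ("" if there is none).
// Anything after '?' or '#' (query string, fragment) is ignored.
std::string url_extension(const std::string& url)
{
    // Skip "scheme://host" if present; otherwise treat the whole string as a path.
    std::string::size_type scheme = url.find("://");
    std::string::size_type path_begin =
        (scheme == std::string::npos) ? 0 : url.find('/', scheme + 3);
    if (path_begin == std::string::npos) return "";   // e.g. "http://example.com"

    std::string::size_type path_end = url.find_first_of("?#", path_begin);
    std::string path = url.substr(path_begin, path_end == std::string::npos
                                                  ? std::string::npos
                                                  : path_end - path_begin);

    std::string::size_type dot = path.rfind('.');
    std::string::size_type slash = path.rfind('/');
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return "";                                     // no dot, or it is in a directory name

    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Keep at most `limit` URLs whose extension is on a (hypothetical) whitelist,
// preserving their original order.
std::vector<std::string> pick_images(const std::vector<std::string>& urls, std::size_t limit)
{
    static const std::vector<std::string> allowed = {"jpg", "jpeg", "png", "gif"};
    std::vector<std::string> picked;
    for (const std::string& u : urls) {
        if (picked.size() >= limit) break;
        if (std::find(allowed.begin(), allowed.end(), url_extension(u)) != allowed.end())
            picked.push_back(u);
    }
    return picked;
}
```

For the "first 10 images" in the original question, the call would be `pick_images(candidates, 10)`.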
|
|
|
Galik (2228)

@h9uest Use libcurl to download the images the same way you download a web page: put in the URL of the image, write the data to a file with the correct extension in its name, and that should be fine. Here is an adaptation of an example of using libcurl that I posted here recently:
The original example is here: http://cplusplus.com/forum/unices/45878/#msg249287
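Below is a minimal sketch of the two pieces this advice needs, assuming libcurl does the transfer. It is a stand-in, not Galik's adapted example. The first piece is a function with the signature libcurl expects for `CURLOPT_WRITEFUNCTION`, which collects the downloaded bytes into a `std::string`. The second is a helper that writes those bytes to disk in binary mode. The sketch deliberately avoids including `<curl/curl.h>` so it compiles on its own, and `append_to_string` and `save_file` are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Same signature libcurl expects for CURLOPT_WRITEFUNCTION:
// each call delivers size * nmemb bytes, and `userdata` is whatever pointer
// was passed with CURLOPT_WRITEDATA (here: a std::string used as a buffer).
std::size_t append_to_string(char* data, std::size_t size, std::size_t nmemb, void* userdata)
{
    std::string* buffer = static_cast<std::string*>(userdata);
    buffer->append(data, size * nmemb);
    return size * nmemb;   // returning any other count makes libcurl abort the transfer
}

// Write the downloaded bytes to disk unchanged. std::ios::binary matters for
// images: in text mode, Windows would turn every '\n' byte into "\r\n".
bool save_file(const std::string& filename, const std::string& bytes)
{
    std::ofstream out(filename, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

With libcurl, the callback would be registered via `curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append_to_string)` and `curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer)`, the image URL set with `CURLOPT_URL`, and the transfer run with `curl_easy_perform(curl)`. Afterwards, something like `save_file("cat_01.jpg", buffer)` stores the image, with the extension matching the image type.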
|
|
h9uest (155)

@Galik: Thank you! I think my wording confused you. The images I want to download are the results returned by a Google page. For example, if I send a query to Google, the result page will contain lots of images embedded in HTML tags, so it seems some XML/HTML parsing is needed to retrieve the image URLs. There is an example on the libcurl site (http://curl.haxx.se/libcurl/c/example.html, see "HTML parsing"), but its main.c alone contains 6200 lines of code! I did some reading and found a pretty good XML parsing tool, libxml, but again it looks a bit overwhelming... I understand that's what CS is like: you keep learning new stuff. But given my current situation it's a bit awkward, because I don't have that much time. So if you have suggestions that could help me avoid the nasty HTML parsing, or good and handy tools for the task of "downloading images from a Google Image Search result page", please let me know! I guess I'll have to do it the hard way if no shortcuts are available. Thanks again! :)
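Here is a rough idea of what a shortcut around full HTML parsing can look like: a minimal sketch that pulls the `src` values out of `<img ...>` tags with nothing but `std::string::find`. It is not an HTML parser. It assumes lower-case tags and double-quoted attribute values, does not decode entities such as `&amp;`, and would also pick up attributes that merely end in `src` (e.g. `data-src`). Whether a search result page serves its thumbnails as plain `<img>` tags at all is an assumption worth checking against the HTML you actually download. `img_srcs_naive` is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Collect the values of src="..." inside <img ...> tags using plain string search.
// Deliberately naive: lower-case "<img" and "src=", double-quoted values only,
// no entity decoding, and "data-src=" would also match.
std::vector<std::string> img_srcs_naive(const std::string& html)
{
    std::vector<std::string> urls;
    std::string::size_type pos = 0;
    while ((pos = html.find("<img", pos)) != std::string::npos) {
        std::string::size_type tag_end = html.find('>', pos);
        if (tag_end == std::string::npos) break;            // unterminated tag

        std::string::size_type src = html.find("src=\"", pos);
        if (src != std::string::npos && src < tag_end) {     // attribute inside this tag
            std::string::size_type start = src + 5;          // skip past src="
            std::string::size_type stop = html.find('"', start);
            if (stop != std::string::npos && stop < tag_end)
                urls.push_back(html.substr(start, stop - start));
        }
        pos = tag_end + 1;                                   // continue after this tag
    }
    return urls;
}
```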
|
|
|
Galik (2228)

If you know regular expressions, you could use them to extract the image URLs from the returned web page. Boost has a good regular expression library (http://www.cs.brown.edu/~jwicks/boost/libs/regex/doc/introduction.html), and you probably need regex_search() (http://www.cs.brown.edu/~jwicks/boost/libs/regex/doc/regex_search.html).
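Below is Galik's suggestion, sketched with `std::regex_search` from C++11's `<regex>`. This is the standard-library counterpart of `boost::regex_search`, swapped in here so the example needs no extra library; the Boost version should work in much the same way. The pattern is a heuristic, not an HTML parser. It takes the `src` attribute of each `<img>` tag, accepts single or double quotes in any letter case, and skips `data-src`-style attributes by requiring whitespace before `src`. `img_srcs_regex` is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Extract the src attribute values of <img> tags with std::regex_search.
// Group 1 captures the opening quote, group 2 the URL, and \1 requires the
// same quote character to close it. Case-insensitive; still only a heuristic.
std::vector<std::string> img_srcs_regex(const std::string& html)
{
    static const std::regex img_src(
        R"re(<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1)re",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> urls;
    std::smatch m;
    std::string::const_iterator begin = html.begin();
    while (std::regex_search(begin, html.end(), m, img_src)) {
        urls.push_back(m[2].str());   // text between the quotes
        begin = m[0].second;          // resume searching after this match
    }
    return urls;
}
```

Restarting each search at `m[0].second`, the end of the previous match, is the usual way to walk through all matches with `regex_search`; `std::sregex_iterator` performs the same iteration for you. URLs taken from HTML can still contain entities such as `&amp;`, which need decoding before the URL is handed to libcurl.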
|
|
|
h9uest (155)

@Galik: Many thanks! I've decided to move on to the main part of my project for now and will come back to this issue later.
|
|
|