How do I build this program?

There is a website that has one audio file on each page, and there are hundreds of such pages. The audio files are named with arbitrary numbers, while their proper names appear as text on the web page. How can I write a program that downloads all the audio files and renames them to their proper names?

I've learned the basics of C++ up to classes. What other programming knowledge do I need in order to do things like getting data from websites?
closed account (o3hC5Di1)
Hi there,

You should look into using wget for this.
It can (recursively) download all links (of a certain filetype if you like).

There may still be a problem renaming the actual files, but there's probably some way to fix that if you google for it.

If you want to go C++, I think you would need:

* libcurl - for making the HTTP requests
* some HTML / DOM parsing library that allows you to find the audio's filename and URL. Regex might work too, but I don't think C++11 regex is implemented by most compilers yet, so you would have to get boost::regex anyway, in which case a DOM parser would be easier.
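
As a rough illustration of the parsing and renaming steps only, here is a minimal sketch using std::regex. It assumes a compiler whose standard library ships a working <regex>, that the proper name sits in the page's <title> tag, and that the file is referenced by a src attribute ending in .mp3. The AudioInfo struct and both function names are made up for this example; the real site's markup may well differ. Fetching the page and the file (e.g. with libcurl) is left out.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for what we want from one page.
struct AudioInfo {
    std::string url;    // where the audio file lives, e.g. "/files/48213.mp3"
    std::string title;  // the human-readable name shown on the page
};

// Search the page's HTML for the first src="....mp3" attribute and the
// <title> text. Returns false if either one is missing.
// Assumption: the proper name is in <title>; adjust the pattern otherwise.
bool extract_audio_info(const std::string& html, AudioInfo& out) {
    static const std::regex src_re(R"re(src\s*=\s*"([^"]+\.mp3)")re",
                                   std::regex::icase);
    static const std::regex title_re(R"re(<title>\s*([^<]*?)\s*</title>)re",
                                     std::regex::icase);
    std::smatch src_m, title_m;
    if (!std::regex_search(html, src_m, src_re)) return false;
    if (!std::regex_search(html, title_m, title_re)) return false;
    out.url = src_m[1].str();
    out.title = title_m[1].str();
    return true;
}

// Turn a page title into a file name by replacing control characters and
// characters that common file systems reject with '_'.
std::string to_filename(const std::string& title,
                        const std::string& ext = ".mp3") {
    static const std::string forbidden = "\\/:*?\"<>|";
    std::string name;
    for (unsigned char c : title) {
        const char ch = static_cast<char>(c);
        const bool bad = c < 0x20 || forbidden.find(ch) != std::string::npos;
        name += bad ? '_' : ch;
    }
    if (name.empty()) name = "untitled";
    return name + ext;
}
```

You would call extract_audio_info on each downloaded page, fetch the URL it reports, and save the result under to_filename(info.title). A proper DOM parser is likely to be more robust than these patterns if the markup is irregular.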

For small tasks you wish to automate, such as this one, it's often easier to use a scripting language. PHP for instance has built-in libcurl and DOM parsing support and it would surprise me if Python or Perl didn't either. Those might offer an easier way to do this particular task.

On a side note, I'm assuming you have permission to do this and are not going to infringe any copyrights here?

All the best,
NwN
NwN wrote:
PHP for instance has built-in libcurl and DOM parsing support and it would surprise me if Python or Perl didn't either.


PHP does NOT have built-in support for cURL, although many web hosts have the curl extension enabled.

Perl does not use curl, and I'm not sure it even has a curl module. LWP is what is usually used in the Perl world.
closed account (o3hC5Di1)
Ah right, sorry - PHP only supports it with the module installed, which I do believe is installed by default on a lot of Linux distros.

All the best,
NwN