I'm writing a program that downloads the HTML of a website so that I can extract some information from it.
I can download the HTML with libcurl, but I'm really struggling to parse it. I tried converting the HTML to XML using HTML Tidy (as recommended here: http://www.mostthingsweb.com/2013/02/parsing-html-with-c/), but it just gives:
line 621 column 13 - Error: <time> is not recognized!
line 3156 column 9 - Error: <time> is not recognized!
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
Aborted
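
The last three lines look like what libstdc++ reports when a std::string is constructed from a null char pointer. My guess, and it is only an assumption, is that after the two <time> errors Tidy leaves its output buffer empty, and the code then builds a string from that null pointer. A minimal sketch of a guard for that case is below; `to_string_or_empty` is a hypothetical helper name, not something from Tidy or from the linked article:

```cpp
#include <string>

// Hypothetical helper: build a std::string from a C string that may be null.
// Constructing std::string directly from a null pointer is undefined
// behaviour; libstdc++ reports it by throwing std::logic_error.
std::string to_string_or_empty(const char* text) {
    // Fall back to an empty string instead of passing null to std::string.
    return text != nullptr ? std::string(text) : std::string();
}
```

With a guard like this the program would presumably get an empty string back instead of aborting, so the remaining problem would be the <time> errors themselves.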