libxml2 is a pretty standard choice for HTML parsing. http://xmlsoft.org/
Keep in mind, this is literally just an HTML parser. If a website contains JS that manipulates the DOM, a parser will not execute that code, so you will not be able to see computed contents. You need something closer to a full-fledged web browser for that.
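To make the point above concrete, here is a minimal sketch. It uses Python's standard-library `html.parser` purely as a stand-in for any plain HTML parser; it is not libxml2. The sample page, the `DivTextCollector` class, and the `parse` helper are all made up for illustration. The page has an empty `<div>` that a script would fill in when run in a browser, but the parser only ever sees the markup as written.

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> is empty in the source and is only
# filled in when a browser executes the script.
PAGE = """<html><body>
<div id="greeting"></div>
<script>document.getElementById("greeting").textContent = "Hello";</script>
</body></html>"""


class DivTextCollector(HTMLParser):
    """Collects text found inside <div> and <script> elements, as written."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # how many <div> elements are currently open
        self.in_script = False  # True while between <script> and </script>
        self.div_text = []
        self.script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_depth += 1
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies are delivered as plain text; nothing is executed.
        if self.in_script:
            self.script_text.append(data)
        elif self.div_depth > 0:
            self.div_text.append(data)


def parse(page):
    """Return (text inside divs, raw script source) for the given HTML."""
    collector = DivTextCollector()
    collector.feed(page)
    collector.close()
    return "".join(collector.div_text), "".join(collector.script_text)


if __name__ == "__main__":
    div_text, script_text = parse(PAGE)
    print(repr(div_text))
    print(repr(script_text))
```

The `<div>` comes back empty, and the script shows up only as a string of source text. Getting the "Hello" that a user would actually see requires something that runs the JavaScript, such as a headless browser.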
Thank you for your reply. I did some research on libxml2 and read somewhere that it does not support HTML5 tags. Apparently, to parse an HTML document with libxml2, you must first convert the document to XML and then parse that.
Google's Gumbo is written in C99 rather than C++, but its claims are impressive:
"Passes all html5lib tests, including the template tag" and "Tested on over 2.5 billion pages from Google's index." https://github.com/google/gumbo-parser