Detect if a webpage changed

Hi! I'd like to make a program that tells me if a webpage has changed since its last "visit". Since I'm quite a beginner, I have no idea how to proceed. I thought I could get the HTML code from the page with libcurl, save it, then compare it with what the program gets on its next visit. But I don't know if that's possible, or how to do it. Any hints? Thanks!
You just described everything you have to do:
1. Get libcurl working.
2. Download the page.
3. Compare it with the previous download.
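
Step 3 could look roughly like the sketch below. It assumes the page has already been downloaded into a std::string (for example by a libcurl write callback, which is not shown here). The names read_file and page_changed, and the idea of keeping the previous download in a snapshot file, are made up for this illustration.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Reads a whole file into a string. Returns an empty string if the file
// cannot be opened (for example, on the very first run).
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Compares freshly downloaded content with the snapshot stored at
// snapshot_path. If they differ, the snapshot is overwritten with the
// new content. Returns true if the page changed.
// Assumption: a missing snapshot counts as "changed".
bool page_changed(const std::string& new_content,
                  const std::string& snapshot_path) {
    assert(!snapshot_path.empty());

    std::ifstream probe(snapshot_path, std::ios::binary);
    const bool had_snapshot = probe.is_open();
    probe.close();

    const bool changed =
        !had_snapshot || read_file(snapshot_path) != new_content;

    if (changed) {
        // Remember the new version for the next visit.
        std::ofstream out(snapshot_path, std::ios::binary | std::ios::trunc);
        out << new_content;
    }
    return changed;
}
```

Calling page_changed(html, "last_visit.html") after each download would then tell you whether anything differs from the last visit. Note that a byte-for-byte comparison also reports pages that only changed in something trivial, such as an embedded timestamp or an ad.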

Instead of comparing the whole page, you could check whether the page's code contains a revision number or a date string, and then read and compare only that.
Request the header (with HEAD) and compare the timestamps.
Just use some kind of network interface (like socket) and send:
GET /index.html HTTP/1.1
Host: www.example.com


You should receive something like:
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
ETag: "3f80f-1b6-3e1cb03b"
Content-Type: text/html; charset=UTF-8
Content-Length: 131
Accept-Ranges: bytes
Connection: close

<html>
<head>
<title>An Example Page</title>
</head>
<body>
Hello World, this is a very simple HTML document.
</body>
</html>

source: http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

The Last-Modified line is the item that should interest you.
As you can see, the date is in the header.
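
To pull a date like that out of the raw response text, something along these lines might work. This is only a sketch: header_value and to_lower are names invented for this example, and it assumes the whole response is already in a std::string. It does not handle every corner of HTTP (trailing whitespace and repeated headers are ignored, for instance).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Lowercases an ASCII string (HTTP header names are case-insensitive).
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Returns the value of the named header from a raw HTTP response,
// or an empty string if the header is not present.
// Only the header block is searched; the body is never inspected.
std::string header_value(const std::string& response,
                         const std::string& name) {
    assert(!name.empty());
    const std::string wanted = to_lower(name);

    std::istringstream in(response);
    std::string line;
    std::getline(in, line);  // skip the status line ("HTTP/1.1 200 OK")

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) break;  // blank line: end of the headers

        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        if (to_lower(line.substr(0, colon)) != wanted) continue;

        // Skip the optional whitespace after the colon.
        const std::string::size_type start =
            line.find_first_not_of(" \t", colon + 1);
        return start == std::string::npos ? std::string()
                                          : line.substr(start);
    }
    return std::string();
}
```

You would store the result of header_value(response, "Last-Modified") (or "ETag") and compare it with the value saved on the previous visit. Keep in mind that not every server sends these headers, so an empty result needs a fallback such as comparing the page content itself.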

Why request the whole page (with GET) when you can request the header alone (with HEAD)? You'll find that most real pages are larger than your 131-byte example.
Why request the whole page (with GET) when you can request the header alone (with HEAD)?


overlooked it
Request the header (with HEAD) and compare the timestamps.

That looks like what I need in a fairly simple solution. But what is HEAD? How do I use it? I didn't find much on Google. Thanks everyone!
http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods

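HEAD is one of the request methods you can send to a server. It asks for the same response headers that GET would produce, but without the page body.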
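
As a rough illustration, a HEAD request is the same text as the GET request shown earlier in the thread, with only the method name swapped. The helper below just builds that text; build_head_request is a made-up name, and sending the string over a socket is left out. If you use libcurl instead, setting the CURLOPT_NOBODY option to 1 makes it issue a HEAD request for you.

```cpp
#include <cassert>
#include <string>

// Builds the text of an HTTP/1.1 HEAD request for the given host and path.
// An empty path is treated as "/" (assumption for this sketch).
// "Connection: close" asks the server to close the connection afterwards,
// so the reader can simply read until end of stream.
std::string build_head_request(const std::string& host,
                               const std::string& path) {
    assert(!host.empty());
    const std::string target = path.empty() ? "/" : path;
    return "HEAD " + target + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}
```

For example, build_head_request("www.example.com", "/index.html") gives the request you would write to the socket. The reply then contains only the status line and headers, so you can read Last-Modified without downloading the page.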