WinSock - getting source code from website

*EDIT* - sorry, what you have should suffice, i was writing this while you posted, so i didn't see your last post!

actually it would look something more like this:
while( (n = recv( socket, buff, sizeof(buff), 0 ) ) > 0 )
{
     // do something with data in 'buff'
}

The idea is that 'recv()' returns the number of bytes received. If it returns 0, the connection was closed (and there is no more data to receive). The '> 0' test also ends the loop on failure, because a failed 'recv()' returns 'SOCKET_ERROR'. So after the loop ends, it is a good idea to check 'n' for 'SOCKET_ERROR' and call WSAGetLastError() if it is.
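Here is a rough sketch of that loop with the check afterwards. To keep it compilable without a live connection, 'recv_fn' is a made-up stand-in for a call such as recv(sock, buf, len, 0), and 'read_all' is just an illustrative name, not a WinSock function. It assumes the stand-in returns the byte count, 0 on close, or a negative value on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collects everything the peer sends until the connection closes or fails.
// 'recv_fn' is a hypothetical stand-in for recv(sock, buf, len, 0).
template <typename RecvFn>
int read_all(RecvFn recv_fn, std::string& out)
{
    char buff[512];
    int n = 0;
    while ((n = recv_fn(buff, static_cast<int>(sizeof(buff)))) > 0)
    {
        // Sanity check: the callee must not report more than the buffer holds.
        assert(n <= static_cast<int>(sizeof(buff)));
        // Append exactly n bytes; the buffer is NOT null-terminated.
        out.append(buff, static_cast<std::size_t>(n));
    }
    // 0 means the connection was closed; a negative value means failure
    // (with WinSock, compare against SOCKET_ERROR and call WSAGetLastError()).
    return n;
}
```

With a real socket you would pass a lambda that forwards to recv(), then look at the return value of 'read_all' before trusting the data. Note that only the first 'n' bytes of 'buff' are valid after each call, so treating 'buff' as a C string without adding a terminator can print garbage or stop early.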
Alright, thanks. It seems to be working now. However, one more problem just came up. If I put the host as www.google.com it seems to work out fine, the output starts like this,
HTTP/1.1 200 OK
However, I tried other sites like, www.yahoo.com and www.msn.com and I got this,
HTTP/1.1 400 Bad Request
For msn it even said, Invalid Hostname. Why is it only working for google?
keep in mind that many websites can be hosted on a single server... i'm not entirely sure if it is what's causing your problem, but it is always a good idea to include the 'Host:' header in an http request. This tells the server which website you're actually trying to access. When you resolve the address of the actual server machine and send it a request, it can't assume you're trying to connect to a given website. So an updated request for you would look something like:
GET / HTTP/1.1<crlf>
Host: www.msn.com<crlf>
Connection: close<crlf>
<crlf>


EDIT: it's been a while since i've done this stuff, but i believe the request line can also carry an absolute URI, scheme included (mostly used when talking to a proxy). HTTP/1.1 still requires the 'Host:' header in that case:
GET http://www.msn.com/ HTTP/1.1<crlf>
...
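For what it's worth, a request like the one with the 'Host:' header could be assembled with a small helper along these lines. 'build_get_request' is a hypothetical name, and the sketch assumes the path always starts with '/'.

```cpp
#include <cassert>
#include <string>

// Builds a minimal HTTP/1.1 GET request for 'path' on 'host'.
// Hypothetical helper for illustration; not part of WinSock.
std::string build_get_request(const std::string& host, const std::string& path)
{
    assert(!host.empty());
    assert(!path.empty() && path[0] == '/');

    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";  // request line
    req += "Host: " + host + "\r\n";         // tells the server which site is wanted
    req += "Connection: close\r\n";          // ask the server to close when done
    req += "\r\n";                           // blank line ends the headers
    return req;
}
```

The result can then be handed to send() using req.c_str() and req.size(), which avoids typing the same string literal twice.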
Haha, you're a genius. That fixed the problem. However, the part where the code gets cut off is still a problem for some reason. For example, try using www.cplusplus.com as the host. Look at the source code of the main page and you'll see the output stops after the lines,
<body onLoad="InitJS()">

<div id="header">

when there is clearly more in the source code.

Finally, what if I wanted to see another page on the site? For example, what if I wanted to see www.cplusplus.com/forums/? I tried plugging it in to where I originally had www.cplusplus.com and it either gives a bad request or it gives the code of a search page that searched for the site, depending on where you plugged it in.
Any ideas for both of these problems? It shouldn't be cutting it off anymore since I have that loop, unless the condition is reached too early... And when viewing a page on the site, do I change the Host: part or the URL part? I tried different combinations and it's not working. Thanks.
sorry it has taken so long to reply; i am home visiting the folks and they don't have a wireless home network (yay for the neighbors' unsecured one and my laptop!). like i said, i haven't done any of this in a while so i'm just going by memory, but the 'Host:' value is always the host name you're connecting to (e.g. www.cplusplus.com), and the URI in the first line is either the path on that host (e.g. /forums/) or an absolute URI including the scheme (e.g. http://www.cplusplus.com/forums/). I'm a bit puzzled about it still cutting you off; are you checking any error possibilities? like, do you check the last value returned by recv and call WSAGetLastError() if it indicates an error? if you are still having problems with it, feel free to post/message your code and i'll see if i can't get it working. take care and merry christmas!
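To make the host/path split concrete, here is a small sketch that cuts an address like www.cplusplus.com/forums/ into the two pieces. 'split_url' is a hypothetical helper; it only understands an optional "http://" prefix and does not handle ports or other schemes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits "host/path" (optionally prefixed with "http://") into {host, path}.
// Hypothetical helper for illustration only.
std::pair<std::string, std::string> split_url(std::string url)
{
    assert(!url.empty());

    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) == 0)
        url.erase(0, scheme.size());             // drop the scheme if present

    const std::string::size_type slash = url.find('/');
    if (slash == std::string::npos)
        return std::make_pair(url, std::string("/"));  // no path: use the root
    return std::make_pair(url.substr(0, slash), url.substr(slash));
}
```

The first piece is what would go to gethostbyname() and into the 'Host:' header; the second piece is what would go right after GET in the request line.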
So this is what I'm doing,
host = gethostbyname("www.cplusplus.com");
and here I'm using the full name,
send(Socket,"GET www.cplusplus.com/forums HTTP/1.1\r\nHost: www.cplusplus.com\r\nConnection: close\r\n\r\n", strlen("GET www.cplusplus.com/forums HTTP/1.1\r\nHost: www.cplusplus.com\r\nConnection: close\r\n\r\n"),0);
Is that what you were saying in your last post? It didn't work, it said it was a bad request.

And I added the WSAGetLastError() after the recv and it returns 0, so I don't know what the problem could be...