Trying to download a binary file from HTTP server

I am trying to grab the binary file, but I don't see much of a difference using code like yours, and using mine with strcat...

I've just tried this with a 2.5k byte text file, though, and the script copies the whole thing perfectly. However, when I do it with an image/application, it only grabs a portion of the file. Really confusing me here :S
I'm fairly certain the str* family of functions won't work if the data is not text, because they stop at the first null byte. memcpy doesn't care.
Ah, alrighty :)

I still don't understand why it's not copying everything from the binary file, though.

The content-length is equal to 336 (the size of the image), while the length of the entire string is 278 (body and headers).
Is there some control character or something at the end of the binary that marks the end? If so, look for that to control when the recv loop stops. If you haven't received any data, sleep and then try again until you get the end. I'm not sure where you're reading data from, but quite often there are delays over the internet and/or from a slow machine. Your recv loop may be exiting prematurely because the data got held up.
Mm, checking right now. Thanks for the idea :)
Mm, exe's don't have EOF characters, do they?
Ok.
    char *buffer = (char*)malloc(56);
    memset(buffer, NULL, 56);

    char *response = (char*)malloc(512);
    memset(response, NULL, 512);

    while(recv(sockfd, buffer, 56, 0) > 0)
    {
        strcat(response, buffer);
        memset(buffer, NULL, 56);
    }


You need to clear response as well, because strcat relies on the destination already being null terminated. Ideally, make response something large like 50000 bytes, just so you don't end up with unexpected behaviour from buffer overflows that don't crash the program.

recv(sockfd, buffer, 56, 0): make that 55, not 56. Otherwise a full read leaves no room for the null terminator. Because you are using these C functions, the code is going to be very prone to buffer overflows. While an overflow may not crash your code, it may have unexpected consequences.

I would look at something like this.

char response[50000] = {0};

while () {
 sprintf(response, "%s%s", response, buffer);
}

While still prone to memory overrun, you don't have to worry about dynamically allocated memory from malloc. This also initializes the response buffer to all zeros.
I'd still recommend using memcpy(). Don't forget you're dealing with binary data so you don't want to be inserting null terminators and sprintf() and strcat() need them to identify the ends of character strings.

I do agree with Zaita's comment about clearing response first and making it very big.

You may also still have an issue with timing, so if there is no EOF marker then use a timeout. If you read zero bytes, sleep and then try again; if it's still zero, exit the loop.
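To make the memcpy() suggestion a bit more concrete, here is a minimal sketch of appending received chunks while keeping track of the length yourself instead of relying on a null terminator. The byte_buf struct and byte_buf_append() are hypothetical names made up for this illustration; they are not part of the posted script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: a buffer that remembers how many bytes it
   holds, so null bytes inside the data are never taken as the end. */
struct byte_buf {
    char   *data;  /* caller-provided storage */
    size_t  cap;   /* capacity of data, in bytes */
    size_t  len;   /* bytes stored so far */
};

/* Append n bytes from src. Returns 0 on success, or -1 (leaving the
   buffer unchanged) if the bytes would not fit. */
static int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (n > b->cap - b->len)
        return -1;                      /* refuse rather than overflow */
    memcpy(b->data + b->len, src, n);   /* copies null bytes like any other */
    b->len += n;
    return 0;
}
```

Inside the recv loop you would call byte_buf_append(&b, buffer, r), where r is the positive value recv() returned, and afterwards use b.len rather than strlen(response) as the response size.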
Alrighty...

I don't know what you mean by setting 56 to 55... Can you please explain?

I did set the response size to 50,000, and I did memset both the buffer and response variables after declaring them.

Current script:

#include <netdb.h>
#include <string.h>
#include <fstream.h>
#include <arpa/inet.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main()
{
    int sockfd, c;
    struct sockaddr_in addr;
    struct hostent *host = gethostbyname("localhost");

    sockfd = socket(PF_INET, SOCK_STREAM, 0);

    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    addr.sin_addr = *((struct in_addr *)host->h_addr);
    memset(addr.sin_zero, '\0', sizeof addr.sin_zero);

    c = connect(sockfd, (struct sockaddr *)&addr, sizeof addr);

    while(c == -1)
    {
        sleep(5);
        c = connect(sockfd, (struct sockaddr *)&addr, sizeof addr);
    }

    char *packet = "GET /File HTTP/1.0\r\n\r\n";

    send(sockfd, packet, strlen(packet), 0);

    char *buffer = (char*)malloc(128);
    memset(buffer, NULL, 128);

    char *response = (char*)malloc(500000);
    memset(response, NULL, 500000);

    while(recv(sockfd, buffer, 127, 0) > 0)
    {
        sprintf(response, "%s%s", response, buffer);
        printf("Buffer size: %d, Total response size: %d\r\n", strlen(buffer), strlen(response));
        memset(buffer, NULL, 127);
    }

    return 0;
}


Output of a 15 byte text file:

Buffer size: 128, Total response size: 128
Buffer size: 128, Total response size: 256
Buffer size: 27, Total response size: 283


Output of a 2,469 byte text file:

Buffer size: 128, Total response size: 128
Buffer size: 128, Total response size: 256
Buffer size: 128, Total response size: 384
Buffer size: 128, Total response size: 512
Buffer size: 128, Total response size: 640
Buffer size: 128, Total response size: 768
Buffer size: 128, Total response size: 896
Buffer size: 128, Total response size: 1024
Buffer size: 128, Total response size: 1152
Buffer size: 128, Total response size: 1280
Buffer size: 128, Total response size: 1408
Buffer size: 128, Total response size: 1536
Buffer size: 128, Total response size: 1664
Buffer size: 128, Total response size: 1792
Buffer size: 128, Total response size: 1920
Buffer size: 128, Total response size: 2048
Buffer size: 128, Total response size: 2176
Buffer size: 128, Total response size: 2304
Buffer size: 128, Total response size: 2432
Buffer size: 128, Total response size: 2560
Buffer size: 128, Total response size: 2688
Buffer size: 128, Total response size: 2816
Buffer size: 25, Total response size: 2841


Both of those work.

Output of a 336 byte PNG file:

Buffer size: 128, Total response size: 128
Buffer size: 128, Total response size: 256
Buffer size: 22, Total response size: 278
Buffer size: 5, Total response size: 283
Buffer size: 0, Total response size: 283


As you can see, it doesn't have to do with file size.

I'm a wee bit stumped by this...

Meh, if you (like me) have no idea what's going on here, I'll just go to a different forum and ask my question there. Not a problem.

-Mike
Line 45 (the memset inside the recv loop) should be memset(buffer, NULL, 128);

I meant the length you pass to recv should be 1 less than the size of your buffer. This ensures the last byte is always a null. You need to null terminate buffers that you hand to string functions, otherwise they read past the end into invalid memory.

Can you please print the full response headers from your PNG request?
Does the server acknowledge you are getting a PNG back through the content-type header?

I would also verify that the data is actually missing. The size a file takes up on disk and its real length can differ. So if you're saving the information into a PNG file, check that you don't actually have everything there.

I've just spotted you're pulling from localhost so it shouldn't be timing although it has the symptoms of timing. Try the timeout thing I mentioned just to make sure.

EDIT: Actually, thinking about it, it could well be timing. With it being localhost, if your reading program is hogging the CPU (and loops can hog CPU), your sending program may not get a chance to send until your receiving program exits. Put in a call to sleep() if the number of bytes read is zero; this will also force a context switch and give the CPU to the sending program.
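One thing worth checking against the recv() documentation before adding a sleep: on a blocking stream socket, a return value of 0 means the peer has closed its side of the connection, not that data is late, and recv() itself waits when nothing has arrived yet. A rough sketch of a read loop built on that, assuming a connected blocking socket; recv_all() is a hypothetical helper name and the 128-byte chunk size is simply carried over from the script:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read from a connected, blocking stream socket
   until the peer closes. Returns the number of bytes stored in out,
   or -1 on a socket error or if out is too small for the data. */
static ssize_t recv_all(int fd, char *out, size_t cap)
{
    char   chunk[128];
    size_t total = 0;

    assert(out != NULL);
    for (;;) {
        ssize_t r = recv(fd, chunk, sizeof chunk, 0);
        if (r == 0)
            return (ssize_t)total;      /* peer closed: end of data */
        if (r < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal, retry */
            return -1;                  /* real error */
        }
        if ((size_t)r > cap - total)
            return -1;                  /* would overflow out */
        memcpy(out + total, chunk, (size_t)r);
        total += (size_t)r;
    }
}
```

With "Connection: close" in the reply, as in an HTTP/1.0 exchange, the server closing the socket is what marks the end of the file, so the loop needs neither an EOF character nor a timeout to terminate.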
I set the recv() function to grab a little less data than the buffer could hold, and it got a little more data than before. Then I decided to set it lower and lower, and eventually got it down to one. To my surprise, the application didn't segfault :)

It actually sent back a lot of data. (Full response was 533 bytes, actual bytes from the image was 263.)

But saving that data in a file still didn't make it viewable as an image...

I did get this though: http://img443.imageshack.us/img443/7405/hexnr5.png

On the right is the original image, and the left is the one output from this script.

And the headers from the PNG request:

HTTP/1.1 200 OK
Date: Thu, 12 Jun 2008 22:26:53 GMT
Server: Apache/2.2.4 (Ubuntu) PHP/5.2.3-1ubuntu6.3
Last-Modified: Thu, 30 Aug 2007 04:58:49 GMT
ETag: "5c00ec-150-8fd9f440"
Accept-Ranges: bytes
Content-Length: 336
Connection: close
Content-Type: image/png
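
Since the server reports Content-Length: 336, the expected body size can be read from the headers instead of being hard-coded. A simplified sketch, assuming the header block has already been received and is null terminated; content_length() is a hypothetical helper, and it only matches the exact spelling "Content-Length:" even though real header names are case-insensitive:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the Content-Length value found in a
   null-terminated header block, or -1 if it is absent or malformed.
   Simplification: only the exact spelling "Content-Length:" matches. */
static long content_length(const char *headers)
{
    static const char name[] = "\r\nContent-Length:";
    const char *p;
    char *end;
    long value;

    assert(headers != NULL);
    p = strstr(headers, name);          /* safe here: headers are text */
    if (p == NULL)
        return -1;
    p += sizeof name - 1;               /* skip past the header name */
    value = strtol(p, &end, 10);        /* strtol skips leading spaces */
    if (end == p || value < 0)
        return -1;
    return value;
}
```

For the headers above this gives 336, which is the number of bytes that should follow the blank line.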


Edit: Alright. I'll check the timing situation real quick :)

Edit: You mean like this?

    while(r = recv(sockfd, buffer, 128, 0))
    {
        if(r == 0)
        {
            printf("Sleeping\r\n");
            sleep(1);
            continue;
        }
        sprintf(response, "%s%s", response, buffer);
        printf("Buffer size: %d, Total response size: %d\r\n", strlen(buffer), strlen(response));
        memset(buffer, NULL, 128);
    }
Ahh ok. Your problem is because the image contains null bytes. You will need to use memcpy() as bnbertha said.

sprintf, strcat, printf etc. all rely on null-terminated strings. But your image contains runs of null bytes, so you can't use them. Use only the memory (mem*) functions.
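A small illustration of why this bites PNG files in particular: every PNG begins with an 8-byte signature followed by the length field of the IHDR chunk, and that length field starts with null bytes. The sketch below measures how far a str* function would get before stopping; png_head and visible_to_str_functions() are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First 16 bytes of a PNG file: the 8-byte signature, then the IHDR
   chunk's 4-byte length (always 13) and its 4-byte type. */
static const unsigned char png_head[16] = {
    0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A,
    0x00, 0x00, 0x00, 0x0D, 'I', 'H', 'D', 'R'
};

/* Hypothetical helper: how many bytes a str* function would treat as
   the string, i.e. the bytes before the first null, capped at n. */
static size_t visible_to_str_functions(const unsigned char *p, size_t n)
{
    const unsigned char *nul;

    assert(p != NULL);
    nul = (const unsigned char *)memchr(p, 0, n);
    return nul ? (size_t)(nul - p) : n;
}
```

Calling visible_to_str_functions(png_head, sizeof png_head) shows that only the signature survives, so each strcat or sprintf call silently drops whatever follows the first null byte in the chunk it was given.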
Oops!

I got them mixed around!

http://img443.imageshack.us/img443/7405/hexnr5.png

On the left is the original image, on the right is the output from the script. My apologies ;)

But I think I see it now. They have equal numbers of bytes, as is shown by GHex. But some of them aren't registered as actual bytes...?

Edit: How do I fix this? :p
yes, but you don't need the continue.

null bytes eh? Should still register as a byte in the received bytes count.......I think.

EDIT: we're all out of sync :-P

EDIT: yes, as Zaita said, fix it using memcpy()
Heh :p

Kk, let's see...

Edit: Not as easy as I thought it would be.

There's memchr, but that only searches the block of memory for one character. I need it to search the entire string. Is there an equivalent of strstr that works with memory, not just strings?
You can still use strstr to search for a string, as long as the string comes before your PNG data.
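If you would rather not depend on where the null bytes happen to fall, a length-based search is easy to write by hand. glibc and the BSDs also offer memmem() as a non-standard extension. Below is a portable sketch; find_bytes() is a hypothetical helper name, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: like strstr, but both arguments are byte ranges
   with explicit lengths, so null bytes are ordinary data. Returns a
   pointer to the first match inside hay, or NULL if there is none. */
static const void *find_bytes(const void *hay, size_t hay_len,
                              const void *needle, size_t needle_len)
{
    const unsigned char *h = (const unsigned char *)hay;
    size_t i;

    assert(hay != NULL && needle != NULL);
    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;
    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(h + i, needle, needle_len) == 0)
            return h + i;
    }
    return NULL;
}
```

Searching the response for "\r\n\r\n" with a needle length of 4 gives the end of the headers; the body starts 4 bytes after the match, and its length is the total number of bytes received minus that offset.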
Oh yeah.. Haha, thanks :D

Edit: Erm... Am I doing this right?

#include <netdb.h>
#include <string.h>
#include <fstream.h>
#include <arpa/inet.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main()
{
    int sockfd, c;
    struct sockaddr_in addr;
    struct hostent *host = gethostbyname("localhost");

    sockfd = socket(PF_INET, SOCK_STREAM, 0);

    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    addr.sin_addr = *((struct in_addr *)host->h_addr);
    memset(addr.sin_zero, '\0', sizeof addr.sin_zero);

    c = connect(sockfd, (struct sockaddr *)&addr, sizeof addr);

    while(c == -1)
    {
        sleep(5);
        c = connect(sockfd, (struct sockaddr *)&addr, sizeof addr);
    }

    char *packet = "GET /Picture.png HTTP/1.0\r\n\r\n";

    send(sockfd, packet, strlen(packet), 0);

    char *buffer = (char*)malloc(128);
    memset(buffer, NULL, 128);

    char *response = (char*)malloc(500000);
    memset(response, NULL, 500000);

    size_t r;

    char *data = response;

    while(r = recv(sockfd, buffer, 128, 0))
    {
        memcpy(data, buffer, r);
        data += r;
        printf("Buffer size: %d, Total response size: %d\r\n", sizeof(buffer), sizeof(response));
        memset(buffer, NULL, 128);
    }

    printf("%d\r\n", sizeof(response));

    char *pch = strstr(response, "\r\n\r\n");

    char mesg[500];

    memcpy(mesg, pch + 4, sizeof(response));

    fstream fp("/home/mike/Desktop/Haruhi.png", std::ios::out|std::ios::binary);
    fp.write(mesg, 336);

    return 0;
}


Because it doesn't seem to be working...

Also, what's the difference between sizeof() and size_t()? and why don't they return the true string lengths?
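
On the sizeof question, a short sketch may help. sizeof is an operator evaluated by the compiler that gives the size of a type or variable, and size_t is just the unsigned integer type used to hold sizes. Applied to a pointer such as response, sizeof gives the size of the pointer itself, not of what malloc returned, which is also why memcpy(mesg, pch + 4, sizeof(response)) copies only a handful of bytes. size_demo() is a hypothetical function written for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: report three different notions of "size" for the
   same storage through the three output parameters. */
static void size_demo(size_t *ptr_size, size_t *array_size, size_t *str_len)
{
    char  array[500];
    char *pointer = array;

    assert(ptr_size != NULL && array_size != NULL && str_len != NULL);
    memcpy(array, "ab\0cd", 6);         /* 6 bytes with a null in the middle */

    *ptr_size   = sizeof(pointer);      /* size of the pointer variable */
    *array_size = sizeof(array);        /* 500: size of the whole array */
    *str_len    = strlen(array);        /* 2: stops at the first null byte */
}
```

None of the three is the number of bytes actually received. For that, keep a running total of the recv() return values, and print it with %zu rather than %d.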
What's happening now?
Jason2gs, what are you trying to achieve? I don't quite understand the code you've written. Are you trying to get the file faster by using more than one thread? I'm working on something along those lines in http://www.cplusplus.com/forum/unices/2148/ using HTTP request headers, and I think a file can be fetched in pieces over HTTP by using the Range header specified in the HTTP RFC:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35
Any ideas about this would be appreciated.