I am trying to merge binary files

I am attempting to merge binary files, but so far to no avail: the program keeps segfaulting. I want to merge the buffers the files are stored in and then write the new one to disk. Here is my code. Any help is greatly appreciated.

Main.cpp:
#include "getsize.h"
long lSize;
char * buffer;
size_t result;
FILE * pFile;
FILE * pFile2;
FILE * pFile3;

void read1()
{
    pFile = fopen ( "uTorrent.exe", "rb");
    fseek (pFile , 0 , SEEK_END);
    lSize = ftell (pFile);
    rewind (pFile);
    buffer = (char*) malloc (sizeof(char)*lSize);
    result = fread (buffer,1,lSize,pFile);
}
void read2()
{
    pFile2 = fopen ( "CCleaner.exe", "rb");
    fseek (pFile2 , 0 , SEEK_END);
    lSize = ftell (pFile2);
    rewind (pFile2);
    buffer = (char*) malloc (sizeof(char)*lSize);
    result = fread (buffer,1,lSize,pFile);
}
void write()
{
    pFile3 = fopen ( "test.exe", "a+");
    FILE * buffer[] = {pFile2, pFile}; // It would not let me compile with "char * buffer[] = {pFile2, pFile};"
    fwrite (buffer , 1 , z , pFile3 );
}
int main()
{
    calcsize();
    read1();
    fclose (pFile);
    read2();
    fclose (pFile2);
    write();
    fclose (pFile3);
    free (buffer);
    return 0;
}


getsize.h:
#include <stdio.h>
#include <stdlib.h>

FILE * file1;
FILE * file2;
long long x, y, z,a ,b;
long fSize, fSize2;
int calcsize()
{
 file1 = fopen ( "uTorrent.exe", "rb");
 file2 = fopen ( "CCleaner.exe", "rb");
 fseek (file1, 0, SEEK_END);
 fSize = ftell (file1);
 fSize2 = ftell (file2);
x = sizeof(file1);
y = sizeof(file2);
b = x * fSize;
a = y * fSize2;
return z = a + b;
}
You have written functions that can leak memory. Beyond that, there are problems with your code's logic. read1() loads the first file into buffer. read2() then allocates new heap space, points buffer at it, and loads the second file into it. So how will write() know where the buffer created by read1() is? (That first allocation is also never freed, which is the leak.)

On top of that, do not put your clean-up code outside the scope where you created your resources. This is a very bad practice! Your read1() function obtains a file handle, yet that handle is only closed in main(), after read1() returns. For a small program like yours it may not be a big deal, but it is important to correct these habits early on.

In any case, your write() function needs to be rewritten. In fact, you can improve this program by getting rid of the write() function entirely. The logic goes like this:

1. Create a new file for writing
2. Open first file, read contents, write to the new file, close first file
3. Open second file, read contents, append to the new file, close second file
4. Close new file.
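The four steps above could look roughly like this in C++ using the C stdio functions. This is a minimal sketch, not the replier's code: the helper names `append_file` and `merge_files` and the 64 KiB chunk size are assumptions. Because each input is copied through a fixed-size buffer, memory use stays the same no matter how large the files are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: appends the whole contents of the file at `path`
// to the already-open stream `out`. Returns false if the input cannot be
// opened or if a read/write error occurs.
bool append_file(std::FILE* out, const char* path)
{
    assert(out != nullptr && "append_file needs an open output stream");

    std::FILE* in = std::fopen(path, "rb");    // steps 2/3: open an input file
    if (in == nullptr)
        return false;

    char chunk[64 * 1024];                      // fixed-size buffer (assumed size)
    bool ok = true;
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (std::fwrite(chunk, 1, n, out) != n) {   // short write means an error
            ok = false;
            break;
        }
    }
    if (std::ferror(in))                        // fread stopped because of an error, not EOF
        ok = false;

    std::fclose(in);                            // close the input in the same scope
    return ok;
}

// Hypothetical helper: creates `output` and writes `first`, then `second`, into it.
bool merge_files(const char* first, const char* second, const char* output)
{
    std::FILE* out = std::fopen(output, "wb");  // step 1: create the new file
    if (out == nullptr)
        return false;

    // If the first file fails, the second one is not appended (short-circuit).
    bool ok = append_file(out, first) && append_file(out, second);

    if (std::fclose(out) != 0)                  // step 4: close (flushes buffered data)
        ok = false;
    return ok;
}
```

For the files in this thread, the call would be `merge_files("uTorrent.exe", "CCleaner.exe", "test.exe")`.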


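Also, your calcsize() function (if its purpose is to calculate the total number of octets in the two files) has incorrect logic: sizeof(file1) gives the size of the FILE pointer itself (typically 4 or 8 bytes), not the size of the file, and fSize2 is read with ftell() before file2 has been moved to its end. Just do something like: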
/* code is in C */
long calcsize()
{
    FILE* fp;
    long size = 0;
    fp = fopen("file1", "rb");
    if (fp != NULL) {
        fseek(fp, 0, SEEK_END);
        size += ftell(fp);
        fclose(fp); /* do not forget this! */
    }
    fp = fopen("file2", "rb");
    if (fp != NULL) {
        fseek(fp, 0, SEEK_END);
        size += ftell(fp);
        fclose(fp); /* do not forget this! */
    }
    return (size);
}


Some ideas:
1. Do not use heap memory unless necessary. Furthermore, do not dynamically allocate memory based on the size of the file. Consider the case where the file is 16 GiB: can you guarantee you have enough memory to load it all?

2. If you can work with C++, then do so, and when doing so, follow the RAII idiom.
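A minimal RAII sketch for idea 2, assuming you stay with the C `FILE*` API from C++: a `std::unique_ptr` with a custom deleter closes the file when the handle goes out of scope, so the clean-up lives in the same scope that acquired the resource. The names `FileCloser`, `UniqueFile`, `open_file` and `count_bytes` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Deleter that closes a FILE*. std::unique_ptr only calls it for a
// non-null pointer, so no null check is needed here.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};

// RAII handle: the file is closed automatically, even on an early return.
using UniqueFile = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical helper: opens a file and wraps the handle.
// The result is empty (compares equal to nullptr) if fopen fails.
UniqueFile open_file(const char* path, const char* mode)
{
    assert(path != nullptr && mode != nullptr);
    return UniqueFile(std::fopen(path, mode));
}

// Example use: count the bytes in a file, or return -1 if it cannot be opened.
// There is no explicit fclose; the handle is released when `in` goes out of scope.
long count_bytes(const char* path)
{
    UniqueFile in = open_file(path, "rb");
    if (!in)
        return -1;
    long n = 0;
    while (std::fgetc(in.get()) != EOF)
        ++n;
    return n;   // `in` is closed here
}
```

A deleter struct is used rather than passing `&std::fclose` directly, because recent C++ standards do not guarantee that taking the address of a standard library function is allowed.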
Hello again, here is my revised code. I took into account the advice given to me in the previous post. Also, thank you for the help.

Main.cpp
#include "getsize.h"
#include "read.h"

int main()
{
    read();
    free (buffer);
    free (buffer2);
    return 0;
}


read.h:
long lSize;
long lSize2;
char * buffer;
char * buffer2;
FILE * pFile;
FILE * pFile2;
FILE * pFile3;
void read()
{
    pFile3 = fopen ( "test.exe", "ab+");
    pFile = fopen ( "uTorrent.exe", "rb");
    pFile2 = fopen ( "CCleaner.exe", "rb");
    fseek (pFile , 0 , SEEK_END);
    fseek (pFile2 , 0 , SEEK_END);
    lSize = ftell (pFile);
    lSize2 = ftell (pFile);
    rewind (pFile);
    rewind (pFile2);
    buffer = (char*) malloc (sizeof(char)*lSize);
    buffer2 = (char*) malloc (sizeof(char)*lSize2);
    fwrite (buffer , 1 , calcsize() , pFile3 );
    fwrite (buffer2 , 1 , calcsize() , pFile3 );
    fclose (pFile3);
    fclose (pFile2);
    fclose (pFile);
}


getsize.h:
#include <stdio.h>
#include <stdlib.h>

FILE * file1;
FILE * file2;
long long size;
long long x, y;
long long calcsize()
{
    file1 = fopen ( "uTorrent.exe", "rb");
    file2 = fopen ( "CCleaner.exe", "rb");
    fseek (file1, 0, SEEK_END);
    fseek (file2, 0, SEEK_END);
    x = sizeof(file1);
    y = sizeof(file2);
    size += x + y;
    return(size);
}


It no longer segfaults. However, it does not write out the whole file: it writes 80 bytes out of 4.041320801 MiB.
1. Your read() function never reads the contents of pFile and pFile2 into buffer and buffer2, respectively, yet you write those buffers to pFile3. (Note also that lSize2 is taken from ftell(pFile) rather than ftell(pFile2).) Take a look at the fread() function here (http://www.cplusplus.com/reference/cstdio/fread/) for more information.

2. Your calcsize() function is still incorrect. After seeking to the end of the file, use the ftell() function to get the current position, which equals the size of the file in bytes. See here: http://www.cplusplus.com/reference/cstdio/fseek/
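Putting both points together, reading a whole file into memory might look like the sketch below. `read_whole_file` is a hypothetical name; it uses a `std::vector` instead of `malloc` so the memory is released automatically, and it checks that `fread` actually returned as many bytes as `ftell` reported. Keep in mind that `ftell` returns a `long`, which is 32 bits on Windows, so this approach tops out below 2 GiB there, and the C standard does not strictly require `SEEK_END` to be meaningful on binary streams (in practice it works on the common desktop platforms).

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads an entire file into `data`.
// Returns false if the file cannot be opened, its size cannot be
// determined, or fewer bytes than expected are read.
bool read_whole_file(const char* path, std::vector<unsigned char>& data)
{
    assert(path != nullptr);

    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr)
        return false;

    bool ok = false;
    if (std::fseek(fp, 0, SEEK_END) == 0) {
        long size = std::ftell(fp);                 // position at the end == size in bytes
        if (size >= 0 && std::fseek(fp, 0, SEEK_SET) == 0) {
            data.resize(static_cast<std::size_t>(size));
            // fread returns the number of elements actually read
            std::size_t got = size > 0 ? std::fread(data.data(), 1, data.size(), fp) : 0;
            ok = (got == data.size());
        }
    }
    std::fclose(fp);                                // always close in the same scope
    return ok;
}
```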
If you had my C++ library, it would be something like:
StreamA Data;
LFile FirstFile(L"uTorrent.exe");
LFile SecondFile(L"CCleaner.exe");
LFile ThirdFile(L"test.exe",FILEMODE_WRITE);
FirstFile.DumpToStream(Data);
ThirdFile.DumpFromStream(Data);
SecondFile.DumpToStream(Data);
ThirdFile.DumpFromStream(Data);

And that's done.
Using the standard C file functions (the code uses new/delete, so compile it as C++):
#include <stdio.h>

int main()
{
    FILE * pFirst = fopen("uTorrent.exe","rb");
    FILE * pSecond = fopen("CCleaner.exe","rb");
    FILE * pThird = fopen("test.exe","wb");
    unsigned int Length = 0;

    // Read First File
    fseek(pFirst,0,SEEK_END);
    Length = ftell(pFirst);
    unsigned char * Data = new unsigned char[Length];
    fseek(pFirst,0,SEEK_SET);
    fread(Data,1,Length,pFirst);
    fclose(pFirst);

    // And write it
    fwrite(Data,1,Length,pThird);
    delete[] Data;

    // Read second file
    fseek(pSecond,0,SEEK_END);
    Length = ftell(pSecond);
    Data = new unsigned char[Length]; // reuse the pointer; declaring Data a second time would not compile
    fseek(pSecond,0,SEEK_SET);
    fread(Data,1,Length,pSecond);
    fclose(pSecond);

    // And write it
    fwrite(Data,1,Length,pThird);
    delete[] Data;

    fclose(pThird); // flush and close the output file
    return 0;
}


No error checking, beware.
Just out of curiosity, what do you expect to happen when you concatenate two binaries? I doubt test.exe is going to be runnable.
It's not going to be usable (maybe the first one? but I don't know if there's any bounds checking or CRC check), but my first 'valid' thought is that he's planning a file packer?
> If you had my C++ library, it was something like....

With the standard C++ library, it is:
#include <fstream>

int main()
{
    std::ifstream first_file( "uTorrent.exe", std::ios::binary ) ;
    std::ifstream second_file( "CCleaner.exe", std::ios::binary ) ;

    std::ofstream output_file( "test.exe", std::ios::binary ) ;
    output_file << first_file.rdbuf() << second_file.rdbuf() ;
}
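One caveat with the `rdbuf()` version: when the inserted stream buffer yields no characters (for example, an empty input file, or an input file that failed to open), `operator<<` sets `failbit` on the output stream, and every insertion after that does nothing. A sketch with explicit checks, assuming C++11; `concat_files` is a hypothetical name:

```cpp
#include <cassert>
#include <fstream>
#include <initializer_list>
#include <string>

// Hypothetical helper: concatenates the given files, in order, into `output`.
// Returns false if any input cannot be opened or the output fails.
// (On failure, the output file may be left partially written.)
bool concat_files(const std::string& output,
                  std::initializer_list<std::string> inputs)
{
    assert(!output.empty());

    std::ofstream out(output, std::ios::binary);
    if (!out)
        return false;

    for (const std::string& name : inputs) {
        std::ifstream in(name, std::ios::binary);
        if (!in)
            return false;                                    // missing input file
        // Skip empty files: inserting an empty rdbuf() would set failbit on `out`.
        // peek() does not consume the character, so nothing is lost.
        if (in.peek() == std::ifstream::traits_type::eof())
            continue;
        if (!(out << in.rdbuf()))
            return false;
    }
    out.close();                                             // flush; sets failbit on error
    return !out.fail();
}
```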



If I had to do this in C:
#include <stdio.h>

int main(void)
{
    FILE* output_file = fopen( "Test.exe", "wb" ) ;
    const char* const input_files[] = { "uTorrent.exe", "CCleaner.exe" } ;

    if( output_file )
    {
        for( size_t i = 0 ; i < sizeof(input_files) / sizeof( input_files[0] ) ; ++i )
        {
            FILE* input_file = fopen( input_files[i], "rb" ) ;
            if( input_file )
            {
                int c ;
                while( ( c = fgetc(input_file) ) != EOF ) fputc( c, output_file ) ;
                fclose(input_file) ;
            }
        }

        fclose(output_file) ;
    }

    return 0;
}
> while( ( c = fgetc(input_file) ) != EOF ) fputc( c, output_file ) ;


This works, but I would recommend using fread and fwrite (with a buffer size that closely matches the one used by the libc implementation). Unless you really need to process each byte individually, fgetc/fputc is less efficient than fread/fwrite.
C file streams are fully buffered by default. The size of the memory buffer used in fread or fwrite has no effect on the internal buffering done by the stream. To control the buffering of the stream, use setvbuf.

That fgetc/fputc could be less efficient than fread/fwrite is because these functions may not be inlined, and there would be extra function-call overhead. It has got nothing to do with the buffer sizes.
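To illustrate the setvbuf point: the buffer of each stream can be enlarged right after `fopen`, while the copy loop itself stays byte by byte. A minimal sketch; `copy_with_buffer` is a hypothetical name and the buffer size is left to the caller. Per the C standard, `setvbuf` may only be called after the stream is opened and before any other operation on it.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: byte-by-byte copy (like the fgetc/fputc loop above),
// with a larger stream buffer requested via setvbuf.
// Returns the number of bytes copied, or -1 on failure.
long copy_with_buffer(const char* src, const char* dst, std::size_t buf_size)
{
    assert(buf_size > 0);

    std::FILE* in = std::fopen(src, "rb");
    if (in == nullptr)
        return -1;
    std::FILE* out = std::fopen(dst, "wb");
    if (out == nullptr) {
        std::fclose(in);
        return -1;
    }

    // Must come right after fopen, before any read or write on the stream.
    // A null buffer pointer lets the library allocate the buffer itself.
    // The return values are ignored here for brevity; a nonzero result
    // means the request was not honored.
    std::setvbuf(in, nullptr, _IOFBF, buf_size);
    std::setvbuf(out, nullptr, _IOFBF, buf_size);

    long count = 0;
    int c;
    while ((c = std::fgetc(in)) != EOF) {
        std::fputc(c, out);
        ++count;
    }

    std::fclose(in);
    if (std::fclose(out) != 0)   // flushing the buffered output can fail
        return -1;
    return count;
}
```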
Thanks, I will try these suggestions when I get home from school.
JLBorges wrote:
That fgetc/fputc could be less efficient than fread/fwrite is because these functions may not be inlined, and there would be extra function-call overhead. It has got nothing to do with the buffer sizes.

Let's not forget that this is implementation-dependent: MSVS, for example, has a thread-safe option that locks and unlocks the stream on every access, slowing down every operation in single-threaded programs.
This is an issue with VS 2012; the earlier versions of Visual Studio had both single-threaded and multi-threaded versions of the library.

Regardless of the version, if you have a single-threaded program, its makefile should have
-D_CRT_DISABLE_PERFCRIT_LOCKS.
With that, the functions map to the _xxx_nolock versions (fgetc to _fgetc_nolock, etc.). Otherwise, there is going to be a performance hit *everywhere*: for instance, in malloc(), which is typically more critical for performance than disk I/O.
JLBorges wrote:
This is an issue with VS 2012
EssGeEich wrote:
the MSVS has a threadsafe option

I was talking about VS08/VS10 anyway.

JLBorges wrote:
For instance in malloc(); which is typically more critical for performance than disk i/o.

+1, or you could simply use the Windows-specific functions (HeapAlloc/HeapFree).

EDIT: Um, we're going off-topic.

> This is an issue with VS 2012; the earlier versions of Visual Studio had both single-threaded and multi-threaded versions of the library.
>
> Regardless of the version, if you have a single-threaded program, its makefile should have
> -D_CRT_DISABLE_PERFCRIT_LOCKS.
> With that, the functions map to the _xxx_nolock versions (fgetc to _fgetc_nolock, etc.). Otherwise, there is going to be a performance hit *everywhere*: for instance, in malloc(), which is typically more critical for performance than disk I/O.


I am using Code::Blocks 12.11, Windows 8 Pro x64 and MinGW-w64. So, I am not concerned with -D_CRT_DISABLE_PERFCRIT_LOCKS, as it does not apply to my setup.

> Just out of curiosity, what do you expect to happen when you concatenate two binaries? I doubt test.exe is going to be runnable.

It is practice with file handling. Also, "test.exe" is runnable.

> It's not going to be usable (maybe the first one? but I don't know if there's any bounds checking or CRC check), but my first 'valid' thought is that he's planning a file packer?


Actually, I am trying to figure out how to handle files in C++, so that learning to use zlib will be easier.
> Also, "test.exe" is runnable.
Really?
That surprises me; my test using JLBorges' C++ example didn't run. I don't have uTorrent, so I couldn't test with that.
> Also, "test.exe" is runnable.
>> my test using JLBorges c++ example didn't run.

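The PE header has a checksum field (CheckSum, in the optional header) that can contain a checksum of the bytes in the image file. If it is set to zero, there is no problem. If it is nonzero, the loader may verify it before loading the image; according to Microsoft's PE format documentation, this validation is applied to drivers, DLLs loaded at boot time, and DLLs loaded into critical system processes.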
So, does the checksum check vary by executable?
It varies depending on a file's content.
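To check what a particular executable stores in that field, the CheckSum value can be read directly from the header. A sketch, assuming the PE/COFF layout from Microsoft's PE format documentation (`e_lfanew` at offset 0x3C, a 4-byte "PE\0\0" signature, a 20-byte COFF header, and CheckSum 64 bytes into the optional header for both PE32 and PE32+); `read_pe_checksum` is a hypothetical name and only minimal validation is done. A value of 0 means no checksum is set.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Reads a little-endian 32-bit value at file offset `offset`.
static bool read_u32_le(std::FILE* fp, long offset, std::uint32_t& value)
{
    unsigned char b[4];
    if (std::fseek(fp, offset, SEEK_SET) != 0 || std::fread(b, 1, 4, fp) != 4)
        return false;
    value = static_cast<std::uint32_t>(b[0])
          | (static_cast<std::uint32_t>(b[1]) << 8)
          | (static_cast<std::uint32_t>(b[2]) << 16)
          | (static_cast<std::uint32_t>(b[3]) << 24);
    return true;
}

// Hypothetical helper: reads the CheckSum field of a PE image.
// Returns false if the file is missing or does not look like a PE image.
bool read_pe_checksum(const char* path, std::uint32_t& checksum)
{
    assert(path != nullptr);

    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr)
        return false;

    bool ok = false;
    unsigned char mz[2];
    unsigned char sig[4];
    std::uint32_t pe_offset = 0;   // e_lfanew: file offset of the "PE\0\0" signature
    if (std::fread(mz, 1, 2, fp) == 2 && mz[0] == 'M' && mz[1] == 'Z'
        && read_u32_le(fp, 0x3C, pe_offset)
        && std::fseek(fp, static_cast<long>(pe_offset), SEEK_SET) == 0
        && std::fread(sig, 1, 4, fp) == 4
        && sig[0] == 'P' && sig[1] == 'E' && sig[2] == 0 && sig[3] == 0) {
        // signature (4) + COFF header (20) + offset of CheckSum in optional header (64)
        long checksum_offset = static_cast<long>(pe_offset) + 4 + 20 + 64;
        ok = read_u32_le(fp, checksum_offset, checksum);
    }
    std::fclose(fp);
    return ok;
}
```

To compute what the value should be for a given file, the Windows API provides CheckSumMappedFile (in imagehlp).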