Help needed with hash trees

Hello,

I'm a C++ newbie, but might have set myself a bit too much of a challenge here.

I'm trying to re-implement the tree hash calculation required for Amazon in C++ (they provide a Java and C# example here.... http://docs.amazonwebservices.com/amazonglacier/latest/dev/checksum-calculations.html ).

This is what I've got so far.......

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
int file_sha256_tree(FILE *f) {
    long fSz;
    const long cZero = 0L;
    ldiv_t fChunks;
    fseek(f, 0, SEEK_END);
    fSz = ftell(f);
    fseek(f, 0, SEEK_SET);
    //
    fChunks = div(fSz, oneMB);
    if (fChunks.rem > 0) {
        fChunks.quot++;
    }
    if (fChunks.quot == cZero) {
        file_sha256(f);
    }
    byteVoV chunkHashes;
    //
    byte hash[SHA256_DIGEST_LENGTH];
    SHA256_CTX sha256;
    byte *buffer = (byte*) malloc(oneMB);
    int bytesRead = 0;
    int idx = 0;
    while ((bytesRead = fread(buffer, 1, oneMB, f))) {
        SHA256_Init(&sha256);
        SHA256_Update(&sha256, buffer, bytesRead);
        SHA256_Final(hash, &sha256);
        //chunkHashes.push_back(hash);
        chunkHashes[idx].assign(hash, hash + SHA256_DIGEST_LENGTH);
        idx++;
    }
    fclose(f);
    free(buffer);
    byteVoV L1Hashes = chunkHashes;
    while (L1Hashes.size() > 1) {
        div_t subLen = div(L1Hashes.size(), cTWO);
        if (subLen.rem > 0) {
            subLen.quot++;
        }
        byteVoV L2Hashes;
        int j = 0;
        for (int i = 0; i < L1Hashes.size(); i = i + 2, j++) {
            if (L1Hashes.size() > 1) {
                SHA256_Init(&sha256);
                byte hshData[SHA256_DIGEST_LENGTH];
                copy(L1Hashes[i].begin(), L1Hashes[i].begin() + SHA256_DIGEST_LENGTH, hshData);
                SHA256_Update(&sha256, hshData, SHA256_DIGEST_LENGTH);
                copy(L1Hashes[i + 1].begin(), L1Hashes[i + 1].begin() + SHA256_DIGEST_LENGTH, hshData);
                SHA256_Update(&sha256, hshData, SHA256_DIGEST_LENGTH);
                SHA256_Final(hshData, &sha256);
                L2Hashes[j].assign(hshData, hshData + SHA256_DIGEST_LENGTH);
            } else {
                L2Hashes[j] = L1Hashes[i];
            }
        }

        L1Hashes = L2Hashes;
    }
    
    byte rootHash[SHA256_DIGEST_LENGTH];
    copy(L1Hashes[0].begin(), L1Hashes[0].begin() + SHA256_DIGEST_LENGTH, rootHash);
    hex_convert(rootHash);
    //
    return 0;
}



But I get a segfault 11, and gdb shows :
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000010
0x00000001000028ea in std::vector<unsigned char, std::allocator<unsigned char> >::capacity (this=0x0) at stl_vector.h:434
434
1
2
3
4
5
6
7
(gdb) backtrace
#0  0x00000001000028ea in std::vector<unsigned char, std::allocator<unsigned char> >::capacity (this=0x0) at stl_vector.h:434
#1  0x0000000100004766 in std::vector<unsigned char, std::allocator<unsigned char> >::_M_assign_aux<unsigned char*> (this=0x0, __first=0x7fff5fbff940 "Ҥ??aS?????*?\021??yܨ?y??k\"?X\004??P", __last=0x7fff5fbff960 "") at vector.tcc:216
#2  0x0000000100004a29 in std::vector<unsigned char, std::allocator<unsigned char> >::_M_assign_dispatch<unsigned char*> (this=0x0, __first=0x7fff5fbff940 "Ҥ??aS?????*?\021??yܨ?y??k\"?X\004??P", __last=0x7fff5fbff960 "") at stl_vector.h:852
#3  0x0000000100004a69 in std::vector<unsigned char, std::allocator<unsigned char> >::assign<unsigned char*> (this=0x0, __first=0x7fff5fbff940 "Ҥ??aS?????*?\021??yܨ?y??k\"?X\004??P", __last=0x7fff5fbff960 "") at stl_vector.h:317
#4  0x00000001000016da in file_sha256_tree (f=0x7fff70c6cee0) at main.cpp:87
#5  0x0000000100002127 in main (argc=3, argv=0x7fff5fbffb60) at main.cpp:144 


(the above function starts at line 60, so you'll have to subtract 60 from the line reference in the backtrace)


Problem is I can't quite pinpoint where I've gone wrong, so ideas would be most welcome !
Last edited on
Why are you using malloc and free in a C++ program?
Aside from that, here you're trying to access a vector element, even though the vector is empty:
chunkHashes[idx].assign(hash, hash + SHA256_DIGEST_LENGTH);
Hi Athar,

Thanks for taking the time to reply.

Re: malloc/free.... my mistake, a misunderstanding, will fix that.

Re: accessing empty vector element, this might have changed for the better now, as I've re-written a little bit using push_back, the code runs through without segfault but doesn't calculate the hash correctly....

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
int file_sha256_tree(FILE *f) {
    long fSz;
    const long cZero = 0L;
    ldiv_t fChunks;
    fseek(f, 0, SEEK_END);
    fSz = ftell(f);
    fseek(f, 0, SEEK_SET);
    //
    fChunks = div(fSz, oneMB);
    if (fChunks.rem > 0) {
        fChunks.quot++;
    }
    if (fChunks.quot == cZero) {
        file_sha256(f);
    }
    byteVoV chunkHashes;
    //
    byte hash[SHA256_DIGEST_LENGTH];
    SHA256_CTX sha256;
    byte *buffer = (byte*) malloc(oneMB);
    int bytesRead = 0;
    int idx = 0;
    while ((bytesRead = fread(buffer, 1, oneMB, f))) {
        SHA256_Init(&sha256);
        SHA256_Update(&sha256, buffer, bytesRead);
        SHA256_Final(hash, &sha256);
        //chunkHashes[idx].assign(hash, hash + SHA256_DIGEST_LENGTH);
        toAppend.insert(toAppend.begin(),hash[0],SHA256_DIGEST_LENGTH);
        chunkHashes.push_back(toAppend);
        //chunkHashes[idx].insert(chunkHashes[idx].begin(),hash[0], SHA256_DIGEST_LENGTH);
        //
        // // // idx++;
    }
    fclose(f);
    free(buffer);
    byteVoV L1Hashes = chunkHashes;
    while (L1Hashes.size() > 1) {
        div_t subLen = div(L1Hashes.size(), cTWO);
        if (subLen.rem > 0) {
            subLen.quot++;
        }
        byteVoV L2Hashes;
        int j = 0;
        for (int i = 0; i < L1Hashes.size(); i = i + 2, j++) {
            if (L1Hashes.size() > 1) {
                SHA256_Init(&sha256);
                byte hshData[SHA256_DIGEST_LENGTH];
                copy(L1Hashes[i].begin(), L1Hashes[i].begin() + SHA256_DIGEST_LENGTH, hshData);
                SHA256_Update(&sha256, hshData, SHA256_DIGEST_LENGTH);
                copy(L1Hashes[i + 1].begin(), L1Hashes[i + 1].begin() + SHA256_DIGEST_LENGTH, hshData);
                SHA256_Update(&sha256, hshData, SHA256_DIGEST_LENGTH);
                SHA256_Final(hshData, &sha256);
                // L2Hashes[j].assign(hshData, hshData + SHA256_DIGEST_LENGTH);
                toAppend.insert(toAppend.begin(),hshData[0],SHA256_DIGEST_LENGTH);
                L2Hashes.push_back(toAppend);
                //
            } else {
                // L2Hashes[j] = L1Hashes[i];
                copy(L2Hashes[j].begin(),L2Hashes[j].end(),L1Hashes[i].begin());
            }
        }

        L1Hashes = L2Hashes;
    }
    
    byte rootHash[SHA256_DIGEST_LENGTH];
    copy(L1Hashes[0].begin(), L1Hashes[0].begin() + SHA256_DIGEST_LENGTH, rootHash);
    hex_convert(rootHash);
    //
    // cout << fSz << "...." << oneMB << "......" << fChunks.quot << ".....";
    return 0;
}





So I'm thinking perhaps " L1Hashes = L2Hashes;" isn't doing what I think it should ?
I'm not exactly sure what you're doing there. I suggest moving calculating the hash into its own function, so that the code becomes easier to read. I'd also replay the loop on paper for 5 and 6 blocks to see whether the right blocks are being merged.
The structure should look somewhat like this if you only care about the final hash (I verified that it works):
1
2
3
4
5
6
7
8
9
10
11
12
vector<SHA256Hash> blockHashes=calculateBlockHashesForFile("...");
while (blockHashes.size()>1)
{
  //calculate next level
  for (size_t i=0;i<blockHashes.size()/2;i++)
      blockHashes[i]=mergeHash(blockHashes[i*2],blockHashes[i*2+1]);
  //promote lone leaf, if any
  if (blockHashes.size()%2!=0)blockHashes[blockHashes.size()/2]=blockHashes.back();
  //get rid of remaining leaf nodes
  blockHashes.resize((blockHashes.size()+1)/2);
}
cout << "SHA-256 Tree Hash: " << blockHashes[0].toString() << endl;


Edit: this line in your code looks fishy:
if (L1Hashes.size() > 1) {
You need to execute the else statement when there's only one node left, but since you're not removing any elements from L1Hashes, this is never going to happen.
Last edited on
Thank you Athar. I appreciate your feedback.

I will go experiment over the next day or so and let you know how I get on (hopefully with good news !).

Have a great day.
Athar,

Thank you very much ! In the end, I scrapped my spaghetti code and used your tidier version as a base template to build the rest.

Looks like I'm going to have to read up on vectors to understand how you managed to reduce the logic to such a neat paragraph. But I guess you've been programming C++ for a lot longer (I come to it from higher-level languages, e.g. PHP, Perl etc..... so its more challenging for me !).

Thanks again !
Topic archived. No new replies allowed.