Failing to properly compare MD5 checksums

I'm writing a mod manager for Linux. The unmodded game requires a few changes in the configuration files in order to be mod-ready. I need a function that compares two MD5 checksums to see whether a file has been changed. Unfortunately, it's not working properly. I thought the issue was that the function returned null-terminated strings, but that doesn't seem to be the issue now.

Here are the file paths and checksums:
  std::vector<std::string> path = {
    homedir + "/.local/share/feral-interactive/Empire/AppData/scripts/preferences.empire_script.txt",
    homedir + "/.local/share/feral-interactive/Empire/preferences",
    empire + "data/campaigns/main/scripting.lua",
    empire + "data/campaigns/main/startpos.esf",
    empire + "data/campaign_maps/global_map/america_lookup.tga"
  };
  const char* path_md5[] = {
    "fbf65ef80563e0bd76836c150e49ecd0",
    "a0f283d89d97a00011f292cc8f9c0d24",
    "89b18f2c2c3f98b6fed7ed7ae4e3e4f9",
    "084b14b4174b93c6489422fe33dc7b2b",
    "086f52fb74d2d0c03ef4e0a969fac4d9"
  };


Here is my MD5 function:

// Get the size of the file by its file descriptor
unsigned long get_size_by_fd(int fd)
{
    struct stat statbuf;
    if(fstat(fd, &statbuf) < 0) exit(-1);
    return statbuf.st_size;
}

// Print the MD5 sum as hex-digits.
void print_md5_sum(unsigned char* md)
{
    for (unsigned int i = 0; i < MD5_DIGEST_LENGTH; i++)
    {
      printf("%02x", md[i]);
    }
}

int md5sum_check(std::string filepath, const char* HASH)
{
  unsigned char result[MD5_DIGEST_LENGTH];
  int file_descript;
  unsigned long file_size;
  char* file_buffer;

  file_descript = open(filepath.c_str(), O_RDONLY);
  if (file_descript < 0)
  {
    std::cerr << "Unable to open " << filepath << std::endl;
    return 1;
  }

  file_size = get_size_by_fd(file_descript);
  file_buffer = static_cast<char*>(mmap((caddr_t)0, file_size, PROT_READ, MAP_SHARED, file_descript, 0));
  MD5((unsigned char*) file_buffer, file_size, result);
  munmap(file_buffer, file_size);
  void* c = memchr(result, '\0', MD5_DIGEST_LENGTH);
  printf("%c\n", c);
  /*
  * 1. Checks if there's any difference between the two hashes
  * 2. Checks if hash is null-terminated
  */
  if ((memcmp(result, HASH, MD5_DIGEST_LENGTH) != 0) && (c != NULL))
  {
    print_md5_sum(result);
    std::cerr << " != " << HASH << "\n" << filepath << " is not calibrated!" << std::endl;
    return 0;
  }

  print_md5_sum(result);
  std::cout << " = " << HASH << "\n"<< filepath << " is calibrated." << std::endl;
  return 0;
}


Here is my output!

fbf65ef80563e0bd76836c150e49ecd0 = fbf65ef80563e0bd76836c150e49ecd0
/home/pradana/.local/share/feral-interactive/Empire/AppData/scripts/preferences.empire_script.txt is calibrated.
�
a0f283d89d97a00011f292cc8f9c0d24 != a0f283d89d97a00011f292cc8f9c0d24
/home/pradana/.local/share/feral-interactive/Empire/preferences is not calibrated!

5af6ff56f3e27624f6a24193de8ec0ec = 89b18f2c2c3f98b6fed7ed7ae4e3e4f9
/home/pradana/.local/share/Steam/steamapps/common/Empire Total War/data/campaigns/main/scripting.lua is calibrated.
�
1b72009f1bbf1c1749bacdbcc4230b43 != 084b14b4174b93c6489422fe33dc7b2b
/home/pradana/.local/share/Steam/steamapps/common/Empire Total War/data/campaigns/main/startpos.esf is not calibrated!

086f52fb74d2d0c03ef4e0a969fac4d9 = 086f52fb74d2d0c03ef4e0a969fac4d9
/home/pradana/.local/share/Steam/steamapps/common/Empire Total War/data/campaign_maps/global_map/america_lookup.tga is calibrated.


You can deduce very easily which files have been modified and which have not by comparing the checksums with your own eyes, but the computer doesn't seem to be following.
First, calling memchr() on an MD5 digest to look for a terminator makes no sense. MD5 produces simple arrays of 16 arbitrary byte values. Both "00000000000000000000000000000000" and "11111111111111111111111111111111" are valid MD5 digests. In other words, MD5 digests are not null-terminated, they're fixed-length.
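To make that concrete, here is a minimal sketch using the raw bytes behind the second expected checksum from this thread, which happens to contain a zero byte in the middle. The names kDigest, first_zero_byte and zero_byte_example are made up for illustration, and the length 16 is hard-coded rather than taken from OpenSSL's MD5_DIGEST_LENGTH.

```cpp
#include <cassert>
#include <cstring>

// The 16 raw bytes behind the hex string "a0f283d89d97a00011f292cc8f9c0d24".
static const unsigned char kDigest[16] = {
    0xa0, 0xf2, 0x83, 0xd8, 0x9d, 0x97, 0xa0, 0x00,
    0x11, 0xf2, 0x92, 0xcc, 0x8f, 0x9c, 0x0d, 0x24
};

// Returns the index of the first zero byte in a 16-byte digest, or -1 if none.
int first_zero_byte(const unsigned char* digest)
{
    const void* p = std::memchr(digest, 0, 16);
    if (p == nullptr) return -1;
    return static_cast<int>(static_cast<const unsigned char*>(p) - digest);
}

void zero_byte_example()
{
    // The zero at index 7 is ordinary data, not a terminator...
    assert(first_zero_byte(kDigest) == 7);
    // ...but string functions stop there and never see the other bytes.
    assert(std::strlen(reinterpret_cast<const char*>(kDigest)) == 7);
}
```

So whether memchr() finds a zero says nothing about where the digest ends; it only depends on what the file's hash happens to be.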

Second, you're comparing result, which is an array of 16 bytes, to HASH, which is a null-terminated string of ASCII characters. I don't really know why you got any matches, honestly.
If you want to compare them then you either need to convert result into a string (and compare with strcmp()) or HASH into a digest (and compare with memcmp(a, b, 16)).
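The second option (turning the hex string into a digest and using memcmp()) could look roughly like the sketch below. The names hex_value, hex_to_digest and digest_matches are hypothetical, and kDigestLength is assumed to be 16 so the snippet compiles without the OpenSSL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t kDigestLength = 16;  // assumed equal to MD5_DIGEST_LENGTH

// Value of one hex character, or -1 if it is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses exactly 32 hex characters into 16 bytes. Returns false on bad input.
bool hex_to_digest(const char* hex, unsigned char* out)
{
    if (std::strlen(hex) != 2 * kDigestLength) return false;
    for (std::size_t i = 0; i < kDigestLength; i++)
    {
        int hi = hex_value(hex[2 * i]);
        int lo = hex_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return false;
        out[i] = static_cast<unsigned char>(hi * 16 + lo);
    }
    return true;
}

// True if the raw digest equals the digest written out in hex.
bool digest_matches(const unsigned char* digest, const char* hex)
{
    unsigned char expected[kDigestLength];
    return hex_to_digest(hex, expected)
        && std::memcmp(digest, expected, kDigestLength) == 0;
}

void digest_matches_example()
{
    const unsigned char digest[kDigestLength] = {
        0xa0, 0xf2, 0x83, 0xd8, 0x9d, 0x97, 0xa0, 0x00,
        0x11, 0xf2, 0x92, 0xcc, 0x8f, 0x9c, 0x0d, 0x24
    };
    assert(digest_matches(digest, "a0f283d89d97a00011f292cc8f9c0d24"));
    assert(!digest_matches(digest, "084b14b4174b93c6489422fe33dc7b2b"));
}
```

With something like this, the check in md5sum_check() would reduce to a single call such as digest_matches(result, HASH), with no memchr() involved.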
Thank you for the feedback. I tried casting result to const char* and then comparing the two strings using strcmp, but it doesn't work:

const char* result_char = (const char*) result;
  if ((strcmp(result_char, hash) != 0))
{
...


hash is nul-terminated but I'm not sure if casting a char array to a char pointer makes it nul-terminated in C++.
I'm not sure if casting a char array to a char pointer makes it nul-terminated in C++.
No, casting from one pointer type to another does not change the data. If you for example do something like
 
std::cout << (const char *)result;
That's not going to print a string of hex numbers. It will most likely just print garbage and possibly crash the program.

I tried casting result to const char*
I didn't say "cast", I said "convert". An MD5 digest contains 16 bytes, and its ASCII hex representation contains 32 characters (not counting the null terminator), so obviously casting is not going to cut it.
Right. Going through Stack Overflow, I found a function that should fix the issue:

const char* md5_to_hex(unsigned char* md5)
{
  static const char hexchars[] = "0123456789abcdef";
  std::string result;

  for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
  {
      unsigned char b = md5[i];
      char hex[3];

      hex[0] = hexchars[b >> 4];
      hex[1] = hexchars[b & 0xF];
      hex[2] = 0;

      result.append(hex);
  }

  return result.c_str();
}


That didn't work. I found another one:

const char* to_hex(unsigned char* md5)
{
    static const char digits[] = "0123456789abcdef";
    std::string result;

    for (int i=0; i< MD5_DIGEST_LENGTH; i++)
    {
        result += digits[md5[i] / MD5_DIGEST_LENGTH];
        result += digits[md5[i] % MD5_DIGEST_LENGTH];
    }

    return result.c_str();
}


That didn't work either.
Don't do this:
const char *f(){
    std::string a_local_string;
    return a_local_string.c_str();
}
std::string::c_str() returns a pointer to the std::string's internal buffer, and that buffer is freed when the local string is destroyed as the function returns. So you just end up returning a dangling pointer to the caller.

Just return the std::string from the functions.
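As a rough sketch of what that could look like for the to_hex() function above: the version below returns the std::string by value, so the caller owns the characters. kDigestLength is an assumed stand-in for MD5_DIGEST_LENGTH (16), and to_hex_example is a hypothetical usage function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t kDigestLength = 16;  // assumed equal to MD5_DIGEST_LENGTH

// Converts a 16-byte digest into its 32-character lowercase hex form.
std::string to_hex(const unsigned char* md5)
{
    static const char digits[] = "0123456789abcdef";
    std::string result;
    result.reserve(2 * kDigestLength);
    for (std::size_t i = 0; i < kDigestLength; i++)
    {
        result += digits[md5[i] >> 4];    // high nibble
        result += digits[md5[i] & 0x0F];  // low nibble
    }
    return result;  // returned by value, nothing dangles
}

void to_hex_example()
{
    const unsigned char digest[kDigestLength] = {
        0xa0, 0xf2, 0x83, 0xd8, 0x9d, 0x97, 0xa0, 0x00,
        0x11, 0xf2, 0x92, 0xcc, 0x8f, 0x9c, 0x0d, 0x24
    };
    // std::string compares directly against a C string with ==.
    assert(to_hex(digest) == "a0f283d89d97a00011f292cc8f9c0d24");
}
```

The comparison in md5sum_check() could then be written as to_hex(result) != HASH, which avoids both strcmp() and the memchr() test.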
Thank you for the feedback! The problem is fixed.