> Should I try using some hash thing then? Like CRC32 or MD5 to check for duplicates
Computing the hash requires processing every byte of the file. If two files hash to the same value, they are probably identical (with probability very close to one if the hash is cryptographically strong), but to be certain you still need to compare the bytes. Computing a hash speeds things up only if it can be computed once, cached, and then reused many times.
1. Compare file sizes
2. If the sizes are equal, and a pre-computed hash is available for at least one file, compare the hashes.
3. If the hashes are equal (or step 2 could not be performed), compare the files byte by byte.
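The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `_hash_cache` dictionary, the choice of SHA-256, and the helper names are all assumptions made for the example.

```python
import hashlib
import os

_hash_cache = {}  # hypothetical cache: path -> digest, reused across many comparisons

def file_hash(path):
    """Return (and cache) the SHA-256 digest of a file's contents."""
    if path not in _hash_cache:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        _hash_cache[path] = h.digest()
    return _hash_cache[path]

def same_bytes(a, b, chunk_size=65536):
    """Step 3: the definitive byte-by-byte comparison."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca != cb:
                return False
            if not ca:  # both streams exhausted simultaneously
                return True

def files_equal(a, b):
    # Step 1: differing sizes means differing files; no content I/O needed.
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    # Step 2: if a hash is already cached for at least one file, compare
    # hashes; a mismatch rules out equality without a byte-level comparison.
    if a in _hash_cache or b in _hash_cache:
        if file_hash(a) != file_hash(b):
            return False
    # Step 3: hashes matched (or were unavailable), so confirm byte by byte.
    return same_bytes(a, b)
```

Note that the hash comparison only ever produces a fast *negative*: a hash mismatch proves the files differ, while a match still falls through to the byte-by-byte check, which is why caching is what makes the hashing step pay off.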