Some old CRC32 code

I'm bored. Here's some code from an old lib I worked on.

Most CRC32 libs I've used sucked and were harder to use than I'd like, and/or assumed you could hash all the data at once (which you can't do if you're streaming data over time) so I made this one.

I think it's also the only time I can remember where using the casting operator actually made sense.

Also, yes I used const void* instead of const char*. I'm still torn as to which is the better approach. Feel free to chime in with an opinion on that, or on anything else. I'm always open to feedback.

The HashBase parent is an abstract class. My original goal was to have different Hash forumuals (CRC, MD5, etc) abstracted via the same Reset/Hash interface. But I never got around to doing anything but CRC32

Usage example:
1
2
3
4
5
6
7
8
9
10
//=== if full buffer available
u32 mycrc = CRC32( somedata, sizeofdata );

//=== if streaming
CRC32 foo;
foo.Hash( somedata, sizeofdata );
foo.Hash( moredata, sizeofmore );
//...

u32 crc = foo;


crc32.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class CRC32 : public HashBase
{
public:
    //=========================================
    //  ctors
    inline CRC32()                                  { Reset();                  }
    inline CRC32(const void* buf,uint_t siz)        { Reset(); Hash(buf,siz);   }

    //=========================================
    // implicit cast, so that you can do something like foo = CRC(dat,siz);
    inline operator u32 () const                    { return Get();             }

    //=========================================
    // getting the crc
    inline u32          Get() const                 { return ~mCrc;             }

    //=========================================
    // HashBase stuff
    virtual void        Reset()                     { mCrc = ~0;                }
    virtual void        Hash(const void* buf,uint_t siz);

private:
    u32         mCrc;
    static bool mTableBuilt;
    static u32  mTable[0x100];

    static const u32        POLYNOMIAL = 0x04C11DB7;

private:
    //=========================================
    // internal support
    static void         BuildTable();
    static u32          Reflect(u32 v,int bits);
};


crc32.cpp
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51

//=====================================================
bool    CRC32::mTableBuilt = false;
u32     CRC32::mTable[0x100];

//=====================================================

void CRC32::BuildTable()
{
    int i, j;
    for(i = 0; i < 0x100; ++i)
    {
        mTable[i] = Reflect(static_cast<u32>(i),8) << 24;

        for(j = 0; j < 8; ++j)
            mTable[i] = (mTable[i] << 1) ^ ( (mTable[i] & (1<<31))  ? POLYNOMIAL : 0);

        mTable[i] = Reflect(mTable[i],32);
    }
    mTableBuilt = true;
}

u32 CRC32::Reflect(u32 v,int bits)
{
    u32 ret = 0;
    int i;

    --bits;
    for(i = 0; i <= bits; ++i)
    {
        if(v & (1<<i))
            ret |= 1 << (bits-i);
    }
    return ret;
}

//=====================================================

void CRC32::Hash(const void* buf,uint_t siz)
{
    //=============================
    const u8* p = reinterpret_cast<const u8*>(buf);

    //=============================
    if(!mTableBuilt)
        BuildTable();

    uint_t i;
    for(i = 0; i < siz; ++i)
        mCrc = (mCrc >> 8) ^ mTable[ (mCrc & 0xFF) ^ p[i] ];
}
Also, yes I used const void* instead of const char*. I'm still torn as to which is the better approach.
Whenever you're expecting a byte buffer, just accept a void * (or const void *). If you take an un/signed char * and the user has a buffer of the other type (or some other type) you're forcing them to make a cast. It's a mild annoyance that doesn't hurt, but doesn't need to be there, either.

I would (in fact, that's how I do it) simply hard-code the lookup table, rather than generate it at run time. Mostly for thread-safety. Here, I even have it nicely formatted into the closest thing to a square that was possible:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
0x00000000,0x77073096,0xEE0E612C,0x990951BA,0x076DC419,0x706AF48F,0xE963A535,0x9E6495A3,
0x0EDB8832,0x79DCB8A4,0xE0D5E91E,0x97D2D988,0x09B64C2B,0x7EB17CBD,0xE7B82D07,0x90BF1D91,
0x1DB71064,0x6AB020F2,0xF3B97148,0x84BE41DE,0x1ADAD47D,0x6DDDE4EB,0xF4D4B551,0x83D385C7,
0x136C9856,0x646BA8C0,0xFD62F97A,0x8A65C9EC,0x14015C4F,0x63066CD9,0xFA0F3D63,0x8D080DF5,
0x3B6E20C8,0x4C69105E,0xD56041E4,0xA2677172,0x3C03E4D1,0x4B04D447,0xD20D85FD,0xA50AB56B,
0x35B5A8FA,0x42B2986C,0xDBBBC9D6,0xACBCF940,0x32D86CE3,0x45DF5C75,0xDCD60DCF,0xABD13D59,
0x26D930AC,0x51DE003A,0xC8D75180,0xBFD06116,0x21B4F4B5,0x56B3C423,0xCFBA9599,0xB8BDA50F,
0x2802B89E,0x5F058808,0xC60CD9B2,0xB10BE924,0x2F6F7C87,0x58684C11,0xC1611DAB,0xB6662D3D,
0x76DC4190,0x01DB7106,0x98D220BC,0xEFD5102A,0x71B18589,0x06B6B51F,0x9FBFE4A5,0xE8B8D433,
0x7807C9A2,0x0F00F934,0x9609A88E,0xE10E9818,0x7F6A0DBB,0x086D3D2D,0x91646C97,0xE6635C01,
0x6B6B51F4,0x1C6C6162,0x856530D8,0xF262004E,0x6C0695ED,0x1B01A57B,0x8208F4C1,0xF50FC457,
0x65B0D9C6,0x12B7E950,0x8BBEB8EA,0xFCB9887C,0x62DD1DDF,0x15DA2D49,0x8CD37CF3,0xFBD44C65,
0x4DB26158,0x3AB551CE,0xA3BC0074,0xD4BB30E2,0x4ADFA541,0x3DD895D7,0xA4D1C46D,0xD3D6F4FB,
0x4369E96A,0x346ED9FC,0xAD678846,0xDA60B8D0,0x44042D73,0x33031DE5,0xAA0A4C5F,0xDD0D7CC9,
0x5005713C,0x270241AA,0xBE0B1010,0xC90C2086,0x5768B525,0x206F85B3,0xB966D409,0xCE61E49F,
0x5EDEF90E,0x29D9C998,0xB0D09822,0xC7D7A8B4,0x59B33D17,0x2EB40D81,0xB7BD5C3B,0xC0BA6CAD,
0xEDB88320,0x9ABFB3B6,0x03B6E20C,0x74B1D29A,0xEAD54739,0x9DD277AF,0x04DB2615,0x73DC1683,
0xE3630B12,0x94643B84,0x0D6D6A3E,0x7A6A5AA8,0xE40ECF0B,0x9309FF9D,0x0A00AE27,0x7D079EB1,
0xF00F9344,0x8708A3D2,0x1E01F268,0x6906C2FE,0xF762575D,0x806567CB,0x196C3671,0x6E6B06E7,
0xFED41B76,0x89D32BE0,0x10DA7A5A,0x67DD4ACC,0xF9B9DF6F,0x8EBEEFF9,0x17B7BE43,0x60B08ED5,
0xD6D6A3E8,0xA1D1937E,0x38D8C2C4,0x4FDFF252,0xD1BB67F1,0xA6BC5767,0x3FB506DD,0x48B2364B,
0xD80D2BDA,0xAF0A1B4C,0x36034AF6,0x41047A60,0xDF60EFC3,0xA867DF55,0x316E8EEF,0x4669BE79,
0xCB61B38C,0xBC66831A,0x256FD2A0,0x5268E236,0xCC0C7795,0xBB0B4703,0x220216B9,0x5505262F,
0xC5BA3BBE,0xB2BD0B28,0x2BB45A92,0x5CB36A04,0xC2D7FFA7,0xB5D0CF31,0x2CD99E8B,0x5BDEAE1D,
0x9B64C2B0,0xEC63F226,0x756AA39C,0x026D930A,0x9C0906A9,0xEB0E363F,0x72076785,0x05005713,
0x95BF4A82,0xE2B87A14,0x7BB12BAE,0x0CB61B38,0x92D28E9B,0xE5D5BE0D,0x7CDCEFB7,0x0BDBDF21,
0x86D3D2D4,0xF1D4E242,0x68DDB3F8,0x1FDA836E,0x81BE16CD,0xF6B9265B,0x6FB077E1,0x18B74777,
0x88085AE6,0xFF0F6A70,0x66063BCA,0x11010B5C,0x8F659EFF,0xF862AE69,0x616BFFD3,0x166CCF45,
0xA00AE278,0xD70DD2EE,0x4E048354,0x3903B3C2,0xA7672661,0xD06016F7,0x4969474D,0x3E6E77DB,
0xAED16A4A,0xD9D65ADC,0x40DF0B66,0x37D83BF0,0xA9BCAE53,0xDEBB9EC5,0x47B2CF7F,0x30B5FFE9,
0xBDBDF21C,0xCABAC28A,0x53B39330,0x24B4A3A6,0xBAD03605,0xCDD70693,0x54DE5729,0x23D967BF,
0xB3667A2E,0xC4614AB8,0x5D681B02,0x2A6F2B94,0xB40BBE37,0xC30C8EA1,0x5A05DF1B,0x2D02EF8D
That's actually a really good idea. Thanks. It really doesn't make much sense to compute that at runtime now that I think about it.

I don't think the above is necessarily thread unsafe, just that it could potentially be wasteful (the table could be rebuilt several times if multiple threads build it at once).
Ok, here is something to blow your mind :P

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
#include <iostream>
using namespace std;

template <bool Condition, class True, class False>
struct BranchType
{
    typedef True Type;
};

template<class True, class False>
struct BranchType<false,True,False>
{
    typedef False Type;
};

struct Zero
{
    enum{RESULT=0};
};

template <int Bits, int I, int RET>
struct NewRet
{
    static const int RESULT;
};

template <int Bits, int I, int RET>
const int NewRet<Bits,I,RET>::RESULT=RET|(1<<(Bits-1-I));

template <unsigned long Value, int Bits, int I=0, int RET=0>
struct CT_Reflect
{
    static const int RESULT;
    static const int NEW_RET;
};

template <unsigned long Value, int Bits, int I, int RET>
const int CT_Reflect<Value,Bits,I,RET>::NEW_RET=

    BranchType<(I<Bits),
    NewRet<Bits,I,RET>,
    Zero>::Type::RESULT;

template <unsigned long Value, int Bits, int I, int RET>
const int CT_Reflect<Value,Bits,I,RET>::RESULT=

    RET| BranchType<(Value & (1<<I)),

            typename BranchType<(I<Bits),
            CT_Reflect<Value,Bits,I+1,NEW_RET>,
            Zero>::Type,

            typename BranchType<(I<Bits),
            CT_Reflect<Value,Bits,I+1,RET>,
            Zero>::Type

        >::Type::RESULT;

unsigned long RT_Reflect(unsigned long v,int bits)
{
    unsigned long ret = 0;
    int i;

    --bits;
    for(i = 0; i <= bits; ++i)
    {
        if(v & (1<<i))
            ret |= 1 << (bits-i);
    }
    return ret;
}

int main()
{
    cout << CT_Reflect<3,4>::RESULT << endl;
    cout << RT_Reflect(3,4) << endl;

    cout << CT_Reflect<2,4>::RESULT << endl;
    cout << RT_Reflect(2,4) << endl;

    cout << CT_Reflect<16,5>::RESULT << endl;
    cout << RT_Reflect(16,5) << endl;

    cout << CT_Reflect<11,4>::RESULT << endl;
    cout << RT_Reflect(11,4) << endl;

    cout << CT_Reflect<24,5>::RESULT << endl;
    cout << RT_Reflect(24,5) << endl;

    cout << "\nhit enter to quit...";
    cin.get();
    return 0;
}

Template meta-programming allows you to calculate anything (that can be calculated) in compile-time. This way, you get to keep your algorithm and save rewriting your hardcoded table every time you want to change, let's say, your POLYNOMIAL.

I only did it for the Reflect function because I was bored haha, but I believe the idea isn't hard to grasp.
I've seen and heard of template metaprogramming before.

And while it does provide benefits, the blasphemous atrocity of mangled code that's required to accomplish it is simply not worth it IMO.

Though I value the insight!
Then I guess you'd be loath to taking two iterators (in a template function) (the C++ solution) as opposed to a const void* and a length (the C solution).

The nice thing about the template solution above is that it is basically zero maintenance code, as opposed to hardcoding a lookup table which is more likely to require maintenance (change the polynomial) and have errors in it.
Well, the CRC32 algorithm and its associated constants aren't likely to change, and using iterators leaves the user open to accidentally using pointer arithmetic when they really meant to pass as a raw buffer.
Disch wrote:
...blasphemous atrocity of mangled code...

Hahahahaha, straightforward and descriptive, as always :D

Disch wrote:
Though I value the insight!

(^_^)
Honestly I never even thought about taking an iterator approach. That's an interesting idea. You'd have to do some reinterpret casting though...

1
2
3
4
5
6
for(; i != last; ++i)
{
  const u8* p = reinterpret_cast<const u8*>(&*i);
  for(int j = 0; j < sizeof(*i); ++j)
    crc_code( p[j] );
}
You would allow the user to pass a pointer to a buffer of unsigned or signed chars only.
Anything else you wouldn't want to compile anyway.

helios, I don't understand your comment. Passing an iterator pair to define a range is a standard technique in the STL.


Or something like

1
2
3
4
5
6
template< typename T >
CRC32& operator()( const T& t )
{
    Hash( boost::addressof( t ), sizeof( T ) );
    return *this;
}


I'm not all that familar with boost... what does addressof() do that makes it different from the address of operator?
helios, I don't understand your comment. Passing an iterator pair to define a range is a standard technique in the STL.
Yes, but the difference is that all iterators on the STL operate on elements, not raw bytes. It's obviously not the same to do ++(unsigned char *)p as ++(int *)p. An iterator is... somewhat out of place. I mean, I don't think someone would want to pass an std::list<char>::iterator.
Meh... seems like a rather silly thing to add an additional lib dependency for.

You would allow the user to pass a pointer to a buffer of unsigned or signed chars only.
Anything else you wouldn't want to compile anyway.


This is actually probably true, since anything else would introduce endian issues.
Topic archived. No new replies allowed.