crc32 - Python CRC-32 woes
I'm writing a Python program to extract data from the middle of a 6 GB bz2 file. A bzip2 file is made of independently decompressible blocks of data, so I only need to find a block (they are delimited by magic bits), create a temporary one-block bzip2 file in memory, and pass it to the bz2.decompress function. Easy, no?
The bzip2 format has a CRC-32 checksum for the file at the end. No problem, binascii.crc32 to the rescue. But wait: the data to be checksummed does not necessarily end on a byte boundary, and the crc32 function operates on a whole number of bytes.
My plan: use the binascii.crc32 function on everything but the last byte, and then a function of my own to update the computed CRC with the last 1–7 bits. But hours of coding and testing have left me bewildered, and my puzzlement can be boiled down to this question: how come crc32("\x00") is not 0x00000000? Shouldn't it be, according to the Wikipedia article?
You start with 0b00000000 and pad it with 32 zeros, then do polynomial division by 0x04C11DB7 until there are no ones left in the first 8 bits, which is immediately. The last 32 bits are the checksum, so how can they not be all zeros?
I've searched Google for answers and looked at the code of several CRC-32 implementations without finding any clue as to why this is so.
How come crc32("\x00") is not 0x00000000?
The basic CRC algorithm is to treat the input message as a polynomial over GF(2), divide it by the fixed CRC polynomial, and use the polynomial remainder as the resulting hash.
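As a minimal sketch of that basic algorithm (the function name `gf2_remainder` and the integer-as-polynomial encoding are choices made for this illustration, not taken from any library), plain long division over GF(2) might look like the following. It gives the all-zero result the question expects for a zero message, because none of CRC-32's extra steps are applied yet.

```python
CRC32_POLY = 0x104C11DB7  # CRC-32 polynomial including the x^32 term


def gf2_remainder(dividend: int, divisor: int) -> int:
    """Remainder of polynomial long division over GF(2).

    Bit i of each integer is the coefficient of x^i, so subtraction
    is XOR and there are no carries.
    """
    if divisor == 0:
        raise ValueError("divisor must be non-zero")
    divisor_degree = divisor.bit_length() - 1
    # Cancel the leading term until the degree drops below the divisor's.
    while dividend.bit_length() > divisor_degree:
        shift = dividend.bit_length() - 1 - divisor_degree
        dividend ^= divisor << shift
    return dividend


# A zero byte padded with 32 zero bits is still zero, so is the remainder.
print(hex(gf2_remainder(0x00 << 32, CRC32_POLY)))  # 0x0
```

The 0x04C11DB7 quoted in the question is the same polynomial with the implicit x^32 term left out.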
CRC-32 makes a number of modifications to the basic algorithm:
- The bits in each byte of the message are reversed. For example, the byte 0x01 is treated as the polynomial x^7, not as the polynomial x^0.
- The message is padded with 32 zeros on the right side.
- The first 4 bytes of this reversed and padded message are XOR'd with 0xFFFFFFFF.
- The remainder polynomial is reversed.
- The remainder polynomial is XOR'd with 0xFFFFFFFF.
And recall that the CRC-32 polynomial, in non-reversed form, is 0x104C11DB7.
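The five modifications can be sketched directly on top of plain GF(2) division. This is an illustrative, deliberately slow version: the names `gf2_remainder`, `reverse_bits` and `crc32_textbook` are made up for this example, and the only library call is `binascii.crc32`, used for comparison.

```python
import binascii

CRC32_POLY = 0x104C11DB7  # non-reversed form, including the x^32 term


def gf2_remainder(dividend: int, divisor: int) -> int:
    """Remainder of polynomial long division over GF(2)."""
    divisor_degree = divisor.bit_length() - 1
    while dividend.bit_length() > divisor_degree:
        dividend ^= divisor << (dividend.bit_length() - 1 - divisor_degree)
    return dividend


def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc32_textbook(message: bytes) -> int:
    # 1. Reverse the bits in each byte of the message.
    reversed_bytes = bytes(reverse_bits(b, 8) for b in message)
    # 2. Pad with 32 zero bits on the right.
    padded = int.from_bytes(reversed_bytes, "big") << 32
    # 3. XOR the first 4 bytes of the padded message with 0xFFFFFFFF.
    padded ^= 0xFFFFFFFF << (8 * len(message))
    remainder = gf2_remainder(padded, CRC32_POLY)
    # 4. Reverse the 32-bit remainder, then 5. XOR it with 0xFFFFFFFF.
    return reverse_bits(remainder, 32) ^ 0xFFFFFFFF


print(hex(crc32_textbook(b"\x00")))                      # 0xd202ef8d
print(crc32_textbook(b"\x00") == binascii.crc32(b"\x00"))  # True
```

In Python 3, `binascii.crc32` takes a bytes object, hence `b"\x00"` rather than `"\x00"`.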
Let's work out the CRC-32 of the one-byte string 0x00:
- Message: 0x00
- Reversed: 0x00
- Padded: 0x00 00 00 00 00
- XOR'd: 0xFF FF FF FF 00
- Remainder when divided by 0x104C11DB7: 0x4E 08 BF B4
- XOR'd: 0xB1 F7 40 4B
- Reversed: 0xD2 02 EF 8D
And there you have it: the CRC-32 of 0x00 is 0xD202EF8D. (The final XOR and the reversal can be applied in either order, since XOR with all ones commutes with bit reversal.)
(You should verify this.)
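One way to verify it is to compare `binascii.crc32` with the usual shift-register formulation. In this sketch (the name `crc32_bitwise` is made up here), the constant 0xEDB88320 is 0x04C11DB7 with its 32 bits reversed, which accounts for the bit reversals, and the initial register value 0xFFFFFFFF plays the role of XOR-ing the first four bytes.

```python
import binascii


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 using the reflected polynomial."""
    crc = 0xFFFFFFFF                      # initial XOR of the first 32 bits
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out the low bit; subtract the polynomial if it was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF               # final XOR


print(hex(binascii.crc32(b"\x00")))   # 0xd202ef8d
print(hex(crc32_bitwise(b"\x00")))    # 0xd202ef8d
print(hex(binascii.crc32(b"")))       # 0x0
```

Only the empty message hashes to zero here; a single zero byte does not, because of the initial and final XOR steps.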