Saturday, September 17, 2011

A simple script to check those crc32 checksums

Some files you get online come with a crc32 checksum as part of the filename, but without an accompanying checksum file that you could use to verify them. Since it's annoying to check such files by hand (especially when there are a lot of files in a set), why not just get Python to do the checking for us?

from glob import glob
from zlib import crc32

First, start with the imports. We'll need glob (to get the file list) and crc32 (for the checksums, of course).

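def checksum(filename):
    mask = 0xffffffff  # keeps the result an unsigned 32-bit value

    # Open in binary mode so the bytes are read exactly as stored
    with open(filename, 'rb') as f:
        # Zero-pad to 8 hex digits, the form used in filenames
        return '%08x' % (crc32(f.read()) & mask)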

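Above is a simple function to generate the crc32 checksum. Just open a given file, read the contents, generate the crc32 and return the result as a hex string. The mask is there because crc32 can return a negative number on some Python versions, and masking turns it into the unsigned value we expect. Nothing fancy.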
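Reading the whole file in one go is fine for small files, but videos can be large. As a rough sketch (not part of the script in this post), the checksum can also be built up piece by piece, since zlib's crc32 accepts a running value as its second argument. The name checksum_chunked and the 1 MiB chunk size are just my picks for illustration.

```python
from zlib import crc32


def checksum_chunked(filename, chunk_size=1024 * 1024):
    """Hypothetical variant of checksum() that reads the file in pieces."""
    value = 0  # running crc32, starts at 0
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            # Feed the previous value back in to continue the checksum
            value = crc32(chunk, value)
    # Mask to an unsigned 32-bit value and zero-pad to 8 hex digits
    return '%08x' % (value & 0xffffffff)
```

It should return the same string as reading the file all at once, only with a smaller memory footprint.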

files = glob('*.mkv')

Next is getting a list of files we want to check. In my case, it's usually downloaded videos that carry a crc32 in the filename without a checksum file to verify them against. Here, I get a list of all Matroska videos in the current directory/folder. You'll need to change the pattern according to your needs.
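If your downloads come in more than one format, one way to adapt this is to run glob once per pattern and merge the results. This is only a sketch: collect_files is a made-up helper name, and the extensions listed are examples, not something the script requires.

```python
from glob import glob


def collect_files(patterns):
    """Hypothetical helper: gather files matching any of the given patterns."""
    found = set()  # a set avoids listing the same file twice
    for pattern in patterns:
        found.update(glob(pattern))
    return sorted(found)


# Example patterns only; adjust to whatever you actually download
files = collect_files(['*.mkv', '*.mp4', '*.avi'])
```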

for f in sorted(files):
    cksum = checksum(f)
    
    result = 'OK' if cksum in f or cksum.upper() in f else 'FAILED'
    print('%s: %s' % (f, result))  # parenthesized form runs on Python 2 and 3

Finally, we just loop through the list of files, check that the generated crc32 matches the crc32 in the filename, and print the results. Of course, we're assuming whoever named the files didn't mess up the crc32 here, so a failure means there's some error in the downloaded file.
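The substring test above can't tell a corrupted file apart from a file that has no checksum in its name at all; both come out as FAILED. A stricter take, sketched below, pulls the checksum out of the filename first. I'm assuming here that it appears as 8 hex digits wrapped in square brackets or parentheses (something like [1A2B3C4D]); the helper names crc_from_name and verdict are invented for this example.

```python
import re

# Assumption: the checksum is 8 hex digits inside [] or ()
CRC_PATTERN = re.compile(r'[\[(]([0-9A-Fa-f]{8})[\])]')


def crc_from_name(filename):
    """Hypothetical helper: return the crc32 in the filename (lowercased) or None."""
    match = CRC_PATTERN.search(filename)
    return match.group(1).lower() if match else None


def verdict(filename, cksum):
    """Compare a computed checksum against the one embedded in the filename."""
    expected = crc_from_name(filename)
    if expected is None:
        return 'NO CHECKSUM'  # nothing in the name to compare against
    return 'OK' if expected == cksum.lower() else 'FAILED'
```

With this, the loop body could call verdict(f, checksum(f)) instead of doing the substring test, at the cost of being pickier about how the files are named.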

Here's the completed script:

from glob import glob
from zlib import crc32

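def checksum(filename):
    mask = 0xffffffff  # keeps the result an unsigned 32-bit value

    # Open in binary mode so the bytes are read exactly as stored
    with open(filename, 'rb') as f:
        # Zero-pad to 8 hex digits, the form used in filenames
        return '%08x' % (crc32(f.read()) & mask)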
        
files = glob('*.mkv')

for f in sorted(files):
    cksum = checksum(f)
    
    result = 'OK' if cksum in f or cksum.upper() in f else 'FAILED'
    print('%s: %s' % (f, result))  # parenthesized form runs on Python 2 and 3

Save the entire script as verify.py. To use the script, just copy it to the directory that contains the files in need of checking and run it from a terminal:

~$ python verify.py

The rest is just a matter of waiting for the results.