How to get the MD5 hash of big files in Python?

Sometimes, we want to get the MD5 hash of big files in Python.

In this article, we’ll look at how to get the MD5 hash of big files in Python.

How to get the MD5 hash of big files in Python?

To get the MD5 hash of big files in Python, we can create a hash object with hashlib.md5, feed it the file in blocks with its update method, and read the result with its hexdigest method.

For instance, we write:

import hashlib
import os


def generate_file_md5(rootdir, filename, blocksize=2**20):
    m = hashlib.md5()
    with open(os.path.join(rootdir, filename), "rb") as f:
        while True:
            buf = f.read(blocksize)
            if not buf:
                break
            m.update(buf)
    return m.hexdigest()


print(generate_file_md5('', 'img.png'))

We create the generate_file_md5 function that takes the rootdir, filename, and blocksize parameters.

In the function, we open the file with open in 'rb' mode (read, binary) so that f.read returns raw bytes, which is what the hash object expects.

We use os.path.join(rootdir, filename) to get the full path of the file.

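Next, we have a while loop that reads the file one block at a time with f.read(blocksize), so the whole file never has to be held in memory. The loop ends when f.read returns an empty bytes object at the end of the file.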
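The read loop can also be written more compactly. The following is a minimal sketch, assuming Python 3.8 or later for the := operator; the md5_of_file name and the temporary test file are made up for illustration and are not part of the function above.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, blocksize=2**20):
    # Hypothetical helper: same idea as generate_file_md5, with a shorter loop.
    m = hashlib.md5()
    with open(path, "rb") as f:
        # f.read returns b"" at the end of the file, which ends the loop.
        while chunk := f.read(blocksize):
            m.update(chunk)
    return m.hexdigest()


# Write a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    tmp_path = tmp.name

# A tiny block size forces several reads even on this small file.
file_hash = md5_of_file(tmp_path, blocksize=4)
os.remove(tmp_path)
print(file_hash)
```

The block size only affects how much data is read per call, not the resulting hash.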

We call m.update on each block to feed it into the running MD5 hash.
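Calling update repeatedly gives the same result as hashing all the data at once, which is why reading in blocks is safe. Here is a small sketch of that property, using an in-memory bytes value as a stand-in for file contents (the data and the 256-byte chunk size are arbitrary choices for illustration):

```python
import hashlib

# Stand-in for the contents of a file.
data = b"a" * 1000 + b"b" * 1000

# Hash everything in one call.
one_shot = hashlib.md5(data).hexdigest()

# Hash the same bytes in 256-byte pieces.
m = hashlib.md5()
for start in range(0, len(data), 256):
    m.update(data[start:start + 256])
chunked = m.hexdigest()

print(one_shot == chunked)  # True
```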

Finally, we return the MD5 hash of the full file as a hexadecimal string with m.hexdigest().
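The hash object also has a digest method, which returns the same hash as raw bytes instead of a hex string. A short sketch of the difference, hashing a sample bytes value chosen only for illustration:

```python
import hashlib

m = hashlib.md5(b"hello world")

raw = m.digest()      # 16 raw bytes
text = m.hexdigest()  # the same value as 32 hexadecimal characters

print(len(raw), len(text))  # 16 32
print(raw.hex() == text)    # True
```

hexdigest is usually the more convenient choice when the hash is printed or compared against a published checksum.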

As a result, the print output should be a 32-character hexadecimal string like 'b60ab2708daec7685f3d412a5e05191a'. The exact value depends on the contents of the file.

Conclusion

To get the MD5 hash of big files in Python, we can create a hash object with hashlib.md5, feed it the file in blocks with its update method, and read the result with its hexdigest method.