Sometimes, we want to get the MD5 hash of big files in Python.
In this article, we’ll look at how to get the MD5 hash of big files in Python.
How to get the MD5 hash of big files in Python?
To get the MD5 hash of big files in Python, we can read the file in blocks, pass each block to the md5.update method, and then call the md5.hexdigest method to get the result.
For instance, we write:
import hashlib
import os


def generate_file_md5(rootdir, filename, blocksize=2**20):
    m = hashlib.md5()
    with open(os.path.join(rootdir, filename), "rb") as f:
        while True:
            # Read at most blocksize bytes; an empty result means end of file.
            buf = f.read(blocksize)
            if not buf:
                break
            # Feed this block into the running hash.
            m.update(buf)
    return m.hexdigest()


print(generate_file_md5('', 'img.png'))
We create the generate_file_md5 function that takes the rootdir, filename, and blocksize parameters. blocksize defaults to 2**20 bytes (1 MiB).
In the function, we open the file with open in 'rb' mode, which reads it as raw bytes, so we can read and hash it in blocks.
We use os.path.join(rootdir, filename) to get the full path of the file.
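As a small illustration of the os.path.join step, the sketch below shows why passing an empty string as rootdir works: the result is just the file name, so the file is looked up relative to the current working directory. The 'photos' directory name is a made-up example.

```python
import os

# An empty first component is skipped, so only the file name remains.
path = os.path.join('', 'img.png')
print(path)  # img.png

# With a directory (hypothetical name), the parts are joined with the
# platform's path separator.
nested = os.path.join('photos', 'img.png')
print(nested)
```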
Next, we have a while loop that calls f.read(blocksize) to read the file one block at a time, and stops when f.read returns an empty bytes object at the end of the file.
And we call m.update on each block to feed it into the MD5 hash, so only one block is held in memory at a time.
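To see why hashing block by block gives the same answer as hashing everything at once, we can try a minimal sketch on in-memory bytes. The data value and the 64-byte piece size are arbitrary choices for illustration.

```python
import hashlib

data = b"hello world" * 1000

# Hash all of the bytes in a single call.
whole = hashlib.md5(data).hexdigest()

# Hash the same bytes in 64-byte pieces with repeated update calls.
m = hashlib.md5()
for i in range(0, len(data), 64):
    m.update(data[i:i + 64])
chunked = m.hexdigest()

print(whole == chunked)  # True
```

Repeated update calls are equivalent to a single call with all the pieces concatenated, which is what makes the block-wise loop safe.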
Finally, we return the MD5 hash of the full file as a hexadecimal string with m.hexdigest().
As a result, the print output should be something like 'b60ab2708daec7685f3d412a5e05191a'.
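If we want to check the approach without a real image file, one option is to write random bytes to a temporary file and compare the block-wise hash with the hash of the same bytes in memory. The sketch below does that; file_md5 is a hypothetical variant of the function above that takes a single path and uses iter with a sentinel in place of the while loop.

```python
import hashlib
import os
import tempfile
from functools import partial


def file_md5(path, blocksize=2**20):
    # Hypothetical helper: same idea as generate_file_md5, but it takes
    # one path argument and loops with iter(callable, sentinel).
    m = hashlib.md5()
    with open(path, "rb") as f:
        # iter keeps calling f.read(blocksize) until it returns b"".
        for buf in iter(partial(f.read, blocksize), b""):
            m.update(buf)
    return m.hexdigest()


# A bit over 3 MiB, so the file spans several blocks plus a partial one.
payload = os.urandom(3 * 2**20 + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

try:
    from_file = file_md5(tmp.name)
    from_memory = hashlib.md5(payload).hexdigest()
    print(from_file == from_memory)  # True
finally:
    os.remove(tmp.name)
```

The while loop and the iter form read the file in the same way, so which one to use is mostly a matter of taste.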
Conclusion
To get the MD5 hash of big files in Python, we can read the file in blocks, pass each block to the md5.update method, and then call the md5.hexdigest method to get the result.