In this tutorial, we will show how to find duplicate files or images using Python. You can build your own duplicate file finder by following the steps below.
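How to determine whether two files are the same?
The simplest way is to compare their MD5 hash values. If two files have identical content, their MD5 hash values are also identical. Conversely, two different files are extremely unlikely to share an MD5 value by accident, so a matching hash is a strong sign of a duplicate.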
How to calculate the MD5 value of a file using Python?
Here is a tutorial on calculating the MD5 value of a file.
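As a minimal sketch of this step, the function below computes a file's MD5 value with the standard hashlib module. The name file_md5 and the chunk_size parameter are our own choices for illustration, not part of any library.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`.

    `file_md5` and `chunk_size` are illustrative names, not a library API.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Opening the file in binary mode ("rb") matters here: the hash should be computed over the raw bytes, so the same function works for images as well as text files.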
To find all duplicate files on your computer, we need to traverse every file and compute the MD5 value of each one.
How to traverse the files on a computer using Python?
Here are two tutorials that can help you.
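One possible way to traverse a directory tree is the standard os.walk function. The sketch below is only an illustration; the helper name iter_files is hypothetical, and it assumes you want to skip symbolic links so that the same file is not visited twice.

```python
import os


def iter_files(root):
    """Yield the path of every regular file under `root` (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: skip symbolic links and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                yield path
```

Because iter_files is a generator, paths are produced one at a time, so a very large disk does not require building the full list of files in memory first.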
How to find identical MD5 values in a Python list or dictionary?
We can save all file MD5 values in a Python list or a dictionary. Which one should we use?
The answer is a Python dictionary. This tutorial will tell you the reason.
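One commonly cited reason is that looking up a key in a dictionary takes roughly constant time on average, while searching a list means scanning it element by element. The following sketch uses the MD5 value as the dictionary key and collects every path with that value; the function names file_md5 and find_duplicates are hypothetical, and unreadable files are simply skipped as an assumption.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file (illustrative helper)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicates(root):
    """Map each MD5 value to the list of files under `root` that share it.

    Only MD5 values shared by two or more files are returned.
    """
    paths_by_md5 = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # The MD5 value is the key, so grouping is a single dictionary lookup.
                paths_by_md5[file_md5(path)].append(path)
            except OSError:
                # Assumption: files we cannot read are skipped silently.
                continue
    return {md5: paths for md5, paths in paths_by_md5.items() if len(paths) > 1}
```

Calling find_duplicates on a folder returns a dictionary in which each value is a list of paths with identical content, so an empty dictionary means no duplicates were found.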
After finding the duplicate files, you can easily use Python to delete the extra copies.
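A cautious sketch of the deletion step is shown below. It assumes the duplicates are given as a dictionary that maps each MD5 value to a list of file paths; the function name delete_duplicates and the dry_run flag are our own illustrative choices. By default nothing is deleted, so you can review the list first.

```python
import os


def delete_duplicates(groups, dry_run=True):
    """Keep one file per group and remove the others.

    `groups` is assumed to map an MD5 value to a list of paths with that value.
    Returns the list of paths that were (or, in a dry run, would be) removed.
    """
    removed = []
    for paths in groups.values():
        # Keep the first path in sorted order; every other path is an extra copy.
        for path in sorted(paths)[1:]:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Run it once with dry_run=True and check the returned paths, then call it again with dry_run=False when you are sure. Since deleted files cannot be recovered this way, you may also want to compare the files byte by byte before removing anything.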