In this tutorial, we will show how to find duplicate files or images using Python. By following it, you can build your own duplicate-file search tool.
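How to determine whether two files are the same?
The simplest way is to compare their MD5 hash values. If two files have the same content, their MD5 hash values are also the same. As a result, files with different hash values are certainly different, and files with the same hash value are almost certainly duplicates.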
How to calculate the MD5 value of a file using Python?
Here is a tutorial on calculating the MD5 value of a file.
Python Calculate the MD5 Value for Big File – Python Tutorial
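As a rough illustration of the idea, here is a minimal sketch that hashes a file in chunks with the standard hashlib module, so a big file never has to be loaded into memory at once. The function name file_md5 and the default chunk size are our own assumptions, not taken from the linked tutorial.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path (hypothetical helper)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Calling file_md5 on two files and comparing the returned strings tells you whether their contents match.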
To find all duplicate files on your computer, we should traverse every file and compute its MD5 value.
How to traverse files on a computer using Python?
Here are two tutorials that can help you.
Python Traverse Files in a Directory Using glob Library: A Beginner Guide
Python Traverse Files in a Directory for Beginners
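For a quick picture of what traversal looks like, here is a small sketch based on os.walk from the standard library. This may differ from the approach in the tutorials above (one of them uses glob), and the name iter_files is only an illustrative choice.

```python
import os


def iter_files(root):
    """Yield the path of every regular file under root (hypothetical helper)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so a linked file is not reported as its own duplicate.
            if os.path.isfile(path) and not os.path.islink(path):
                yield path
```

Because it is a generator, the paths are produced one at a time, which helps when a directory tree contains a very large number of files.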
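How to find identical MD5 values in a Python list or dictionary?
We can save all file MD5 values in a Python list or a dictionary. Which one should we use?
The answer is a Python dictionary. Looking up a key in a dictionary takes roughly constant time on average, while searching a list means scanning its elements one by one. This tutorial explains the reason in more detail.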
Python Find Element in List or Dictionary, Which is Faster? – Python Performance Optimization
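To make the dictionary idea concrete, here is a minimal sketch that uses each MD5 value as a key and collects the paths that share it. The function name find_duplicates and the 1 MB chunk size are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def find_duplicates(paths):
    """Group paths by MD5 digest and return only the groups with 2+ files."""
    groups = defaultdict(list)  # digest -> list of paths with that digest
    for path in paths:
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            # Hash in chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                md5.update(chunk)
        groups[md5.hexdigest()].append(path)
    # A digest seen only once means the file has no duplicate in this set.
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

Each value in the returned dictionary is a list of files with the same MD5 value, in the order they were passed in.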
After finding the duplicate files, you can easily delete the extra copies using Python.
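Deleting files cannot be undone, so the sketch below is deliberately cautious. It assumes the duplicates are given as a dictionary that maps each MD5 value to a list of paths, keeps the first path in each list, and only deletes when dry_run is set to False. The name remove_duplicates and the dry_run parameter are hypothetical.

```python
import os


def remove_duplicates(groups, dry_run=True):
    """Remove all but the first file in each group; return the affected paths."""
    removed = []
    for files in groups.values():
        # Keep files[0] and treat the remaining paths as extra copies.
        for path in files[1:]:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Running it once with dry_run=True and checking the returned list before deleting anything is a sensible habit. If you want extra certainty that two files really match, filecmp.cmp(path_a, path_b, shallow=False) from the standard library compares their contents byte by byte.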