Tuesday, October 07, 2008

Deleting Duplicate Files using md5sum

I finally found the ultimate script to delete duplicate files. It uses md5sum, so it matches files that have the same content, not just the same file name. For each set of identical files it keeps the first one in sorted order and lists every other copy, so the 2nd and 3rd duplicates are caught as well. It only prints the commands to delete the files. It's up to you to run them! Have fun.

find . -type f -print0 | \
xargs -0 md5sum | \
sort | \
awk 'dup[$1]++{print $0}' | \
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/rm \1/'
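To see what the script prints before pointing it at real data, here is a small sketch that runs the same pipeline in a scratch directory. The file names (a.txt, 'b copy.txt', c.txt, d.txt) and the variables demo and plan are made up for illustration, and it assumes GNU md5sum and sed are available.

```shell
# Scratch-directory demo; every file name below is hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Three files with identical content, one with different content.
printf 'same\n' > a.txt
printf 'same\n' > 'b copy.txt'
printf 'same\n' > c.txt
printf 'different\n' > d.txt

# Same pipeline as above, captured in a variable instead of printed directly.
plan=$(find . -type f -print0 | \
xargs -0 md5sum | \
sort | \
awk 'dup[$1]++{print $0}' | \
sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/rm \1/')

# Nothing has been deleted yet; this only shows the proposed commands.
printf '%s\n' "$plan"
```

Here a.txt sorts first among the identical files, so it is the copy that is kept, and the space in 'b copy.txt' comes out backslash-escaped in the printed rm command. d.txt has no duplicate and is not listed.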

You can redirect the output to a script file:
Add ' > rmfilename' to the end of the 5th line of the script (the sed command)

change its permissions after reviewing it:
chmod u+x rmfilename

and run it:
./rmfilename
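One possible way to do the review step, sketched on a scratch directory. The rmfilename created here is a hand-written stand-in with two hypothetical paths so that the example runs on its own; in practice it would be the file produced by the redirect.

```shell
# Stand-in setup: a scratch directory with two files and a fake rmfilename.
# The paths are hypothetical, not output from a real run.
cd "$(mktemp -d)" || exit 1
touch 'b copy.txt' c.txt
printf '%s\n' 'rm ./b\ copy.txt' 'rm ./c.txt' > rmfilename

# Review: how many files would be deleted, and what the commands look like.
count=$(wc -l < rmfilename)
echo "$count files listed for deletion"
head -n 20 rmfilename

# Parse the script without executing any of it, to catch quoting mistakes.
sh -n rmfilename && echo "syntax ok"

# Only after reviewing: make it executable and run it.
chmod u+x rmfilename
sh rmfilename   # same effect as ./rmfilename, without relying on the exec bit
```

The sh -n check only confirms that the file parses as shell; it does not tell you whether the listed files are the ones you want to lose, so reading the list yourself is still the important part.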
