rem-dups is a great command-line utility for finding and removing duplicate (cloned) files.
rem-dups turned up during an Internet search for a useful utility to remove cruft, the clutter that builds up for anyone who runs computers long enough or works with large amounts of data, files, pictures, etc.
The nice thing about this script is that it uses standard Linux tools found in most distributions, and you keep control over what to keep or remove. Keep in mind that the script only finds exact duplicates, so if you are working with files that differ even slightly, you'll need another type of tool, such as diff.
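To make "exact duplicate" concrete, here is a small sketch using cmp, a standard tool that is not part of rem-dups itself (rem-dups compares MD5 checksums instead). The file names and the scratch directory are made up for the demonstration.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files.
tmp=$(mktemp -d) || exit 1
printf 'hello world\n'  > "$tmp/original.txt"
cp "$tmp/original.txt" "$tmp/copy.txt"        # byte-for-byte copy
printf 'hello world!\n' > "$tmp/edited.txt"   # one character added

# cmp -s prints nothing and exits 0 only when both files are identical.
if cmp -s "$tmp/original.txt" "$tmp/copy.txt"; then same_copy=yes; else same_copy=no; fi
if cmp -s "$tmp/original.txt" "$tmp/edited.txt"; then same_edit=yes; else same_edit=no; fi

echo "copy identical: $same_copy, edited identical: $same_edit"
rm -rf "$tmp"
```

Only the untouched copy counts as a duplicate; the edited file, although nearly the same, would not be listed by rem-dups.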
You can download the script "rem-dups", or you can copy and paste the script shown below to a location where you keep handy utility programs:
    #!/bin/sh
    # rem-dups - Finds duplicate files, puts them in rem-duplicates.sh for removal
    OUTF=rem-duplicates.sh;
    echo "#! /bin/sh" > $OUTF;
    echo "# File created by $0 $1 $2" >> $OUTF;
    echo "cd $(pwd)" >> $OUTF;
    find "$@" -type f -exec md5sum {} \; |
      sort --key=1,32 |
      uniq -w 32 -d --all-repeated=separate |
      sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/#rm \1/' >> $OUTF;
    chmod a+x $OUTF;
    ls -l $OUTF
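To see what the find, sort and uniq stages of the script do before sed turns the result into "#rm" lines, the following sketch runs just that part of the pipeline on three throwaway files. The scratch directory and file names are assumptions made for the demonstration, and the -w option requires GNU uniq.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

printf 'same bytes\n' > a.txt
printf 'same bytes\n' > b.txt   # exact copy of a.txt under another name
printf 'different\n'  > c.txt   # unique content

# Hash every file, sort so equal hashes sit next to each other, then keep
# only lines whose first 32 characters (the MD5 digest) occur more than once.
dups=$(find . -type f -exec md5sum {} \; |
       sort --key=1,32 |
       uniq -w 32 -d --all-repeated=separate)

printf '%s\n' "$dups"

cd / && rm -rf "$demo"
```

The output lists a.txt and b.txt with the same digest and leaves out c.txt. When there is more than one group of duplicates, --all-repeated=separate puts a blank line between groups, which is why the generated file shows each set of duplicates as its own block.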
After you download and save the script, run chmod a+x rem-dups to make it executable; otherwise you will need to run it as sh rem-dups.
Then change to the directory you want to check and run the script. It checks that directory and all of its sub-directories for duplicate files and creates a new "rem-duplicates.sh" script file, which you can edit and run afterwards.
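On the command line, this comes down to two commands: cd into the directory you want to check, then run rem-dups by its path, for example /home/you/rem-dups if that is where you saved it. Run without arguments, the script searches the current directory, because GNU find starts at "." when no starting point is given; any directories you pass as arguments are searched instead.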
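The steps described above can be wrapped in a small helper. This is only a sketch: check_dir is a hypothetical name, and the paths in the commented example call are assumptions you would replace with your own.

```shell
#!/bin/sh
# check_dir SCRIPT DIR: run rem-dups inside DIR (hypothetical helper).
# SCRIPT should be an absolute path, because the helper changes directory
# before running it.
check_dir() {
    script=$1
    dir=$2
    [ -f "$script" ] || { echo "no script at $script" >&2; return 1; }
    [ -d "$dir" ]    || { echo "no directory $dir" >&2; return 1; }
    chmod a+x "$script"            # make the script executable
    ( cd "$dir" && "$script" )     # subshell: your own directory is unchanged
}

# Example call with assumed locations; adjust both paths.
# check_dir "$HOME/rem-dups" "$HOME/Pictures"
```

Running the script in a subshell means rem-duplicates.sh is still written into the checked directory, which is where its "cd" line expects to find the files.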
CHOOSE THE FILES YOU WANT TO REMOVE
After the script finishes, you will have an editable "rem-duplicates.sh" output file.
Use your favorite text editor to choose which files you want to remove. For each file you choose to remove, delete the preceding comment mark "#" so that the rm command on that line will run.
In this example, we have two pictures with different names, but they are identical.
    #! /bin/sh
    # File created by /home/you/rem-dups
    cd /home/you/Pictures
    #rm CatsAndDogs.jpg
    #rm DogsAndCats.jpg
Using a text editor, we chose to remove "DogsAndCats.jpg" by deleting the preceding "#", and to keep "CatsAndDogs.jpg" by leaving the "#" in front of that line.
    #! /bin/sh
    # File created by /home/you/rem-dups
    cd /home/you/Pictures
    #rm CatsAndDogs.jpg
    rm DogsAndCats.jpg
Save the edited file when you are done.
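If you would rather not open an editor, the same single-line change can be made with sed. This is an alternative sketch, not part of the original instructions; it works on a scratch copy of the example file and assumes GNU sed for the -i (edit in place) option.

```shell
#!/bin/sh
# Scratch copy of the example file, so nothing real is touched.
work=$(mktemp -d) || exit 1
cat > "$work/rem-duplicates.sh" <<'EOF'
#! /bin/sh
# File created by /home/you/rem-dups
cd /home/you/Pictures
#rm CatsAndDogs.jpg
#rm DogsAndCats.jpg
EOF

# Strip the leading "#" from exactly one line. The dot is escaped so it
# matches a literal "." rather than any character.
sed -i 's/^#rm DogsAndCats\.jpg$/rm DogsAndCats.jpg/' "$work/rem-duplicates.sh"

cat "$work/rem-duplicates.sh"
```

Note that rem-dups backslash-escapes spaces and other special characters in file names, so for such files the sed pattern has to match the escaped form as it appears in rem-duplicates.sh.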
Run the script with ./rem-duplicates.sh to remove the duplicate files you selected, then clean up by removing the "rem-duplicates.sh" script itself. When you are done, you can exit the command-line terminal.
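The sketch below walks through the run and cleanup steps on a scratch directory that stands in for /home/you/Pictures, so you can try them safely first. The two sample files and the scratch location are made up for the demonstration.

```shell
#!/bin/sh
# Scratch directory standing in for /home/you/Pictures (hypothetical data).
pics=$(mktemp -d) || exit 1
printf 'same picture\n' > "$pics/CatsAndDogs.jpg"
cp "$pics/CatsAndDogs.jpg" "$pics/DogsAndCats.jpg"

# The edited removal script from the example, pointed at the scratch directory.
cat > "$pics/rem-duplicates.sh" <<EOF
#! /bin/sh
cd "$pics"
#rm CatsAndDogs.jpg
rm DogsAndCats.jpg
EOF
chmod a+x "$pics/rem-duplicates.sh"

cd "$pics" || exit 1
./rem-duplicates.sh     # 1. remove the files you uncommented
rm rem-duplicates.sh    # 2. clean up the removal script itself
ls                      # only CatsAndDogs.jpg is left
# 3. in a real session, type "exit" to close the terminal
```

In your own directory only the two numbered commands are needed, since rem-dups has already created rem-duplicates.sh and made it executable.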