Date: Thu, 13 Sep 2012 15:18:43 -0400 (EDT)
From: vogelke+freebsd@pobox.com (Karl Vogel)
To: freebsd-questions@freebsd.org
Subject: Re: cksum entire dir??
Message-ID: <20120913191844.1C07FBEA7@kev.msw.wpafb.af.mil>
In-Reply-To: <CAFuo_fyaToC_0NcvD6jobOK3qWm2D8CXzUn6Drxzr_tEkEL6dQ@mail.gmail.com> (message from Waitman Gobble on Wed, 12 Sep 2012 22:52:22 -0700)
Here's a simple, system-independent way to find duplicate files. All you
need is a digest program you trust (MD5, SHA1, whatever) plus standard
Unix tools: find, xargs, awk, expand, grep, join, sort, and uniq.
Generate the signatures:
me% cd ~/bin
me% find . -type f -print0 | xargs -0 md5 -r | sort > /tmp/sig1
me% cat /tmp/sig1
0287839688bd660676582266685b05bd ./mkrcs
0b97494883c76da546e3603d1b65e7b2 ./pwgen
ddbed53e795724e4a6683e7b0987284c ./authlog
ddbed53e795724e4a6683e7b0987284c ./cmdlog
fdff1fd84d47f76dbd4954c607d66714 ./dbrun
ff5e24efec5cf1e17cf32c58e9c4b317 ./tr0
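The md5 -r call above is the FreeBSD form (macOS has it too). GNU systems
ship md5sum instead, which also prints the digest first, separated from
the name by two spaces. A minimal sketch, assuming a POSIX sh and a
hypothetical function name, sigs, that uses whichever program is present:

```shell
# sigs: hypothetical helper. Prints "digest filename" for every regular
# file under the current directory, sorted by digest.
sigs() {
    if command -v md5 >/dev/null 2>&1; then
        # BSD/macOS md5(1): -r reverses the output to "digest filename"
        find . -type f -print0 | xargs -0 md5 -r
    else
        # GNU coreutils md5sum(1) already prints "digest  filename"
        find . -type f -print0 | xargs -0 md5sum
    fi | sort
}
```

Run it from the directory to scan, e.g. "cd ~/bin; sigs > /tmp/sig1". The
later steps cope with md5sum's two spaces, because awk and join both
treat a run of blanks as a single field separator.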
Find duplicate signatures:
me% awk '{print $1}' /tmp/sig1 | uniq -c | expand | grep -v "^ *1 "
2 ddbed53e795724e4a6683e7b0987284c
me% awk '{print $1}' /tmp/sig1 | uniq -c | expand | grep -v "^ *1 " |
awk '{print $2}' > /tmp/sig2
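The uniq -c | expand | grep -v | awk chain keeps only digests that occur
more than once. uniq -d (it is in POSIX) prints one copy of each repeated
line, so it does the same job in one step. A sketch, with dup_sigs as a
made-up name:

```shell
# dup_sigs: hypothetical helper. Given a "digest filename" file sorted by
# digest, print each digest that appears on more than one line.
dup_sigs() {
    # uniq -d only sees adjacent lines, so the input must already be sorted
    awk '{print $1}' "$1" | uniq -d
}
```

Usage: "dup_sigs /tmp/sig1 > /tmp/sig2" produces the same /tmp/sig2 as
the longer pipeline.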
Associate the duplicates with files:
me% join /tmp/sig[12]
ddbed53e795724e4a6683e7b0987284c ./authlog
ddbed53e795724e4a6683e7b0987284c ./cmdlog
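The three steps can be wrapped in one function so they don't leave
fixed-name files in /tmp. A rough sketch, assuming a POSIX sh and
mktemp(1); find_dups and the HASH_CMD variable are invented names, and
HASH_CMD defaults to the md5 -r used above:

```shell
# find_dups: hypothetical wrapper around the three steps in this message.
# HASH_CMD is an assumed knob: default "md5 -r" (BSD); "md5sum" on GNU.
find_dups() {
    dir=${1:-.}
    hashcmd=${HASH_CMD:-md5 -r}
    sig1=$(mktemp) || return 1
    sig2=$(mktemp) || { rm -f "$sig1"; return 1; }

    # Step 1: signatures, sorted so join(1) can use them.
    # $hashcmd is left unquoted on purpose so "md5 -r" splits into two words.
    find "$dir" -type f -print0 | xargs -0 $hashcmd | sort > "$sig1"

    # Step 2: digests that occur more than once.
    awk '{print $1}' "$sig1" | uniq -c | expand | grep -v "^ *1 " |
        awk '{print $2}' > "$sig2"

    # Step 3: pair each duplicate digest with its filenames.
    join "$sig1" "$sig2"

    rm -f "$sig1" "$sig2"
}
```

Usage: "cd ~/bin; find_dups", or on a GNU box "HASH_CMD=md5sum; find_dups
/some/dir".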
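If your filenames contain whitespace, the join step can mangle them: join
splits lines on blanks and rejoins the fields with single spaces. You can
URL-encode the names, play some games with awk, or use perl.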
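One of those awk games, sketched under the assumption that the input is
/tmp/sig1 (sorted, so equal digests sit on adjacent lines): compare only
the first field and print the whole line untouched, which skips join
altogether. dup_lines is a hypothetical name:

```shell
# dup_lines: hypothetical helper. Reads "digest filename" lines sorted by
# digest and prints every line whose digest is shared with another line.
# Only $1 is compared and $0 is printed as-is, so spaces and tabs in
# filenames survive; names containing newlines still break it.
dup_lines() {
    awk '
        NF == 0 { next }                  # ignore blank lines
        $1 == prev {                      # same digest as the previous line
            if (n == 1) print first       # first member of a duplicate group
            print                         # current line, spacing untouched
            n++
            next
        }
        { prev = $1; first = $0; n = 1 }  # new digest: remember its line
    ' "$@"
}
```

Usage: "dup_lines /tmp/sig1" replaces the sig2 and join steps.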
--
Karl Vogel I don't speak for the USAF or my company
This is really a lovely horse, I once rode her mother.
--Ted Walsh, Horse Racing Commentator
