Date: Thu, 13 Sep 2012 15:18:43 -0400 (EDT)
From: vogelke+freebsd@pobox.com (Karl Vogel)
To: freebsd-questions@freebsd.org
Subject: Re: cksum entire dir??
Message-ID: <20120913191844.1C07FBEA7@kev.msw.wpafb.af.mil>
In-Reply-To: <CAFuo_fyaToC_0NcvD6jobOK3qWm2D8CXzUn6Drxzr_tEkEL6dQ@mail.gmail.com> (message from Waitman Gobble on Wed, 12 Sep 2012 22:52:22 -0700)

Here's a simple, system-independent way to find duplicate files.  All you
need is something to generate a digest you trust (MD5, SHA1, whatever)
plus normal Unix stuff: awk, expand, grep, join, sort, and uniq.

Generate the signatures:

  me% cd ~/bin
  me% find . -type f -print0 | xargs -0 md5 -r | sort > /tmp/sig1
  me% cat /tmp/sig1
  0287839688bd660676582266685b05bd ./mkrcs
  0b97494883c76da546e3603d1b65e7b2 ./pwgen
  ddbed53e795724e4a6683e7b0987284c ./authlog
  ddbed53e795724e4a6683e7b0987284c ./cmdlog
  fdff1fd84d47f76dbd4954c607d66714 ./dbrun
  ff5e24efec5cf1e17cf32c58e9c4b317 ./tr0

Find duplicate signatures:

  me% awk '{print $1}' /tmp/sig1 | uniq -c | expand | grep -v "^ *1 "
     2 ddbed53e795724e4a6683e7b0987284c
  me% awk '{print $1}' /tmp/sig1 | uniq -c | expand | grep -v "^ *1 " |
        awk '{print $2}' > /tmp/sig2

Associate the duplicates with files:

  me% join /tmp/sig[12]
  ddbed53e795724e4a6683e7b0987284c ./authlog
  ddbed53e795724e4a6683e7b0987284c ./cmdlog

If your filenames contain whitespace, you can URL-encode them, play some
games with awk, or use perl.

-- 
Karl Vogel                      I don't speak for the USAF or my company

This is really a lovely horse, I once rode her mother.
                            --Ted Walsh, Horse Racing Commentator
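
For the whitespace case mentioned in the message, one way to "play some
games with awk" is sketched below. It is not the method from the message:
it swaps the uniq/grep/join steps for a single awk pass that compares only
the digest field and prints whole lines, and it uses find's "-exec ... {} +"
instead of xargs. The names finddups and DIGEST are hypothetical, and the
fallback to md5sum on systems without md5 is an assumption. Filenames
containing newlines are still not handled.

```shell
#!/bin/sh
# Sketch only: list files whose digests match, keeping names that
# contain spaces or tabs intact.  Names with newlines are NOT handled.

# Assumption: use BSD "md5 -r" when present, otherwise "md5sum".
# Both print the digest first, followed by the filename.
if command -v md5 >/dev/null 2>&1; then
    DIGEST="md5 -r"
else
    DIGEST="md5sum"
fi

# finddups DIR (hypothetical helper): print "digest filename" lines
# for every file under DIR whose digest occurs more than once.
finddups() {
    # $DIGEST is deliberately unquoted so "md5 -r" splits into two words.
    find "${1:-.}" -type f -exec $DIGEST {} + |
    LC_ALL=C sort |
    awk '
    {
        if ($1 == prev) {
            # Same digest as the previous line: print the first member
            # of the group once, then every later member.
            if (!shown) print prevline
            print
            shown = 1
        } else {
            shown = 0
        }
        prev = $1
        prevline = $0
    }'
}
```

Calling "finddups ~/bin" should print the same kind of digest/filename
pairs as the join step, without needing the /tmp/sig1 and /tmp/sig2 files.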