From nobody Fri May 5 01:32:28 2023
Message-ID: <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com>
Date: Fri, 5 May 2023 02:32:28 +0100
List-Archive: https://lists.freebsd.org/archives/freebsd-questions
Subject: Re: Tool to compare directories and delete duplicate files from one directory
To: Paul Procacci
Cc: freebsd-questions@freebsd.org
From: Kaya Saman

On 5/5/23 01:13, Paul Procacci wrote:
> #!/bin/sh
>
> #
> # dir_1, dir_2, and dir_3 are the directories I want to search through.
> for i in dir_1 dir_2 dir_3;
> do
>   # Retrieve the filenames within each of those directories
>   ls $i/ | while read file;
>   do
>     # If the file doesn't exist in the base dir, copy it and continue
>     # with the top of the loop.
>     [ ! -f dir_base/$file ] && cp $i/$file dir_base/ && continue
>
>     #
>     # Getting to this point means the file exists in both locations.
>     #
>
>     # Get the file size as it is in the dir_base
>     ref=`stat -f '%z' dir_base/$file`
>
>     # Get the file size as it is in $i
>     src=`stat -f '%z' $i/$file`
>
>     # If the sizes are the same, remove the file from the source directory
>     [ $ref -eq $src ] && rm -f $i/$file
>
>   done
> done

Thanks so much!

Just a quick question... you have dir_base written in the script. Do I
need to define this, or is it part of the shell language itself?

Right now I have modified the script to make it non-destructive, so that
it doesn't do any copying or removing yet... call it a test instance if
you like. I personally prefer doing things like this so I don't have any
accidents and lose things in the meantime...

So my initial modification is this:

> #!/bin/sh
>
> #
> # dir_1, dir_2, and dir_3 are the directories I want to search through.
> for i in /dir_1 /dir_2 /dir_3;
> do
>   # Retrieve the filenames within each of those directories
>   ls $i/ | while read file;
>   do
>     # If the file doesn't exist in the base dir, list it (instead of
>     # copying it) and continue with the top of the loop.
>     [ ! -f dir_base/$file ] && ls $i/$file && continue
>
>     #
>     # Getting to this point means the file exists in both locations.
>     #
>
>     # Get the file size as it is in the dir_base
>     ref=`stat -f '%z' dir_base/$file`
>
>     # Get the file size as it is in $i
>     src=`stat -f '%z' $i/$file`
>
>     # If the sizes differ, record the file in /tmp/file (instead of
>     # removing anything)
>     [ $ref -ne $src ] && ls $i/$file >> /tmp/file
>
>   done
> done

If this works, it should just output the differing files into a file
called "file" under /tmp.

OK, this didn't work at all... it just listed a whole bunch of top-level
folders and didn't recurse through them :-(

I ran it on the assumption that I needed to run the script under /dir
and that dir_base was a shell function which would essentially be /dir/.

[EDIT]

Currently, I have managed to get it partly running by modifying ls to
use ls -R, *but* I think that the 'stat' statements don't allow for
recursion?

The script is running as I type this, but it's most likely just
outputting a whole bunch of ls information... as I see many 'stat'
errors in the shell output.
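
For reference, here is a minimal sketch of how the dry run could be made recursive. It is not the script from the thread, and it rests on a few assumptions. `dir_base` is not a shell built-in, so the sketch takes the base directory as an ordinary path argument. The function name `report_dups` and the MISSING/SAME/DIFF labels are made up for illustration. It walks each tree with `find` rather than `ls -R`, because `ls -R` prints bare file names under directory headings and those cannot be passed to `stat` as paths, which would explain the 'stat' errors. It also swaps `stat -f '%z'` for `wc -c`, which reads the whole file but gives the same byte count on any POSIX system.

```shell
#!/bin/sh
# Dry run only: prints what would happen, never copies or removes anything.

# report_dups BASE SRC...   (hypothetical helper, not from the thread)
# For every regular file below each SRC, compare it with the file at the
# same relative path below BASE and print one line:
#   MISSING <path>  not present in BASE (the original script would copy it)
#   SAME <path>     same size as in BASE (the original script would remove it)
#   DIFF <path>     present in BASE but with a different size
report_dups() {
  base=$1
  shift
  for src in "$@"; do
    # find descends into subdirectories and prints paths relative to $src.
    # Assumption: file names contain no newline characters.
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
      rel=${rel#./}

      if [ ! -f "$base/$rel" ]; then
        echo "MISSING $src/$rel"
        continue
      fi

      # Byte counts; tr strips the padding some wc implementations add.
      ref=$(wc -c < "$base/$rel" | tr -d ' ')
      cur=$(wc -c < "$src/$rel" | tr -d ' ')

      if [ "$ref" -eq "$cur" ]; then
        echo "SAME $src/$rel"
      else
        echo "DIFF $src/$rel"
      fi
    done
  done
}

# Run only when a base directory and at least one source directory are given.
if [ "$#" -ge 2 ]; then
  report_dups "$@"
fi
```

Saved under a name such as `report.sh` (a placeholder), it could be run as `sh report.sh /dir/base /dir_1 /dir_2 /dir_3 > /tmp/file`, where `/dir/base` stands for whatever directory plays the role of dir_base. Equal size is the check used in the quoted script and does not prove the contents match; `cmp -s` on the two paths would compare the actual bytes before anything is removed.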