From: Paul Procacci
Date: Thu, 4 May 2023 22:08:22 -0400
Subject: Re: Tool to compare directories and delete duplicate files from one directory
To: Kaya Saman
Cc: freebsd-questions@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

There are multiple reasons why it may not work. My guess is that it's the characters that could be showing up within the filenames and whatnot.
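
To make that guess concrete, here is a small sketch of how a filename containing a space trips up an unquoted variable in sh, and how quoting plus a glob avoids it. The helper names (exists_unquoted, exists_quoted, list_names) are made up for this illustration and are not part of the earlier script.

```sh
#!/bin/sh
# Illustration only; the helper names here are hypothetical.

# Unquoted: the shell splits "my file.txt" into two words before test(1) sees it.
exists_unquoted() { [ -f $1 ]; }

# Quoted: the whole path reaches test(1) as a single argument.
exists_quoted() { [ -f "$1" ]; }

# Walk a directory with a glob instead of parsing the output of ls.
list_names() {
  for path in "$1"/*; do
    [ -e "$path" ] || continue      # the glob matched nothing
    printf '%s\n' "${path##*/}"     # strip the directory part
  done
}
```

With a file called "my file.txt", exists_quoted succeeds while exists_unquoted fails with a complaint from test(1), which is the kind of error an unquoted $file in the loop would produce.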

This can be solved with an interpreted language that's a bit more forgiving.
Take the following Perl script. It does the same thing as the shell script (almost). It renames the source file instead of making a copy of it.

Run as: ./test.pl /absolute/path/to/master_dir /absolute_path_to_dir_x

###################################################################################
#!/usr/bin/env perl

use strict;
use warnings;

# Print a message (the usage line by default) and exit with the given status.
sub msgDie
{
  my ($ret) = shift;
  my ($msg) = shift // "$0 dir_base dir\n";
  print $msg;
  exit($ret);
}

msgDie(1) unless(scalar @ARGV == 2);

my $base = $ARGV[0];
my $dir  = $ARGV[1];

msgDie(1, "base directory doesn't exist\n") unless -d $base;
msgDie(1, "source directory doesn't exist\n") unless -d $dir;

opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
while(readdir $dh)
{
  next if($_ eq '.' || $_ eq '..');

  # Not in the base directory yet: move it there.
  if( ! -f "$base/$_" ){
    rename("$dir/$_", "$base/$_") or warn "rename $dir/$_ failed: $!\n";
    next;
  }

  # Present in both places: compare sizes (element 7 of stat is the size in bytes).
  my ($ref) = (stat("$base/$_"))[7];
  my ($src) = (stat("$dir/$_"))[7];
  next unless defined $ref && defined $src;
  unlink("$dir/$_") if($ref == $src);
}
closedir($dh);
###################################################################################
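
If you would rather see what would happen before anything is moved or deleted, the same decision can be sketched as a dry run in plain sh. This is only a sketch under a few assumptions: dedupe_plan is a made-up name, it skips anything that is not a regular file, and it uses cmp -s to compare contents rather than sizes, since two files of equal size are not necessarily identical.

```sh
#!/bin/sh
# Sketch only: dedupe_plan is a hypothetical helper. It prints a plan and
# changes nothing on disk.
dedupe_plan() {
  base=$1
  dir=$2
  for path in "$dir"/*; do
    [ -f "$path" ] || continue          # skip subdirectories and empty globs
    name=${path##*/}
    if [ ! -f "$base/$name" ]; then
      printf 'MOVE %s\n' "$path"        # missing from base: would be moved
    elif cmp -s "$path" "$base/$name"; then
      printf 'DELETE %s\n' "$path"      # byte-identical: would be deleted
    else
      printf 'KEEP %s\n' "$path"        # same name, different content
    fi
  done
}
```

It takes the same two arguments as the Perl script, for example: dedupe_plan /absolute/path/to/master_dir /absolute_path_to_dir_x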

~Paul

On Thu, May 4, 2023 at 9:32 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:

On 5/5/23 01:13, Paul Procacci wrote:
> #!/bin/sh
>
> #
> # dir_1, dir_2, and dir_3 are the directories I want to search through.
> for i in dir_1 dir_2 dir_3;
> do
>   # Retrieve the filenames within each of those directories
>   ls $i/ | while read file;
>   do
>     # If the file doesn't exist in the base dir, copy it and continue
>     # with the top of the loop.
>     [ ! -f dir_base/$file ] && cp $i/$file dir_base/ && continue
>
>     #
>     # Getting to this point means the file exists in both locations.
>     #
>
>     # Get the file size as it is in the dir_base
>     ref=`stat -f '%z' dir_base/$file`
>
>     # Get the file size as it is in $i
>     src=`stat -f '%z' $i/$file`
>
>     # If the sizes are the same, remove the file from the source directory
>     [ $ref -eq $src ] && rm -f $i/$file
>
>   done
> done


Thanks so much!


just a quick question... you have dir_base written in the script. Do I
need to define this or is this part of the shell language itself?


Right now I have modified the script to make it non-destructive so that
it doesn't do any copying or removing yet... call it a test instance if
you like. I personally prefer doing things like this so I don't have any
accidents and lose things in the meantime...


So my initial modification is this:


> #!/bin/sh
>
> #
> # dir_1, dir_2, and dir_3 are the directories I want to search through.
> for i in /dir_1 /dir_2 /dir_3;
> do
>   # Retrieve the filenames within each of those directories
>   ls $i/ | while read file;
>   do
>     # If the file doesn't exist in the base dir, list it and continue
>     # with the top of the loop.
>     [ ! -f dir_base/$file ] && ls $i/$file && continue
>
>     #
>     # Getting to this point means the file exists in both locations.
>     #
>
>     # Get the file size as it is in the dir_base
>     ref=`stat -f '%z' dir_base/$file`
>
>     # Get the file size as it is in $i
>     src=`stat -f '%z' $i/$file`
>
>     # If the sizes differ, append the file's name to /tmp/file
>     [ $ref -ne $src ] && ls $i/$file >> /tmp/file
>
>   done
> done


If this works it should just output the different files into a file
called "file" under /tmp


Ok, this didn't work at all.... it just listed a whole bunch of top level folders and didn't recurse through them :-(


I ran it on the assumption that I needed to run the script under /dir
and that dir_base was a shell function which would essentially be /dir/.

[EDIT]


Currently, I managed to get it partly running by modifying ls to use ls -R *but* I think that the 'stat' statements don't allow for recursion?


The script is running as I type this but it's most likely just
outputting a whole bunch of ls information... as I see many 'stat'
errors in the shell output.
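
On the recursion question just above: ls -R prints each subdirectory as a "subdir:" heading followed by bare names, so $i/$file most likely stops naming a real path once the listing descends, which would explain the stat errors. One way around that is to let find(1) walk the tree and hand back full paths. The following is a rough, report-only sketch under stated assumptions: compare_tree is a made-up name, both directories are passed without a trailing slash, and no filename contains a newline.

```sh
#!/bin/sh
# Sketch only: compare_tree is a hypothetical helper. It reports what it
# finds and changes nothing on disk.
compare_tree() {
  base=$1
  dir=$2
  find "$dir" -type f | while IFS= read -r path; do
    rel=${path#"$dir"/}                 # path relative to $dir
    if [ ! -f "$base/$rel" ]; then
      printf 'MISSING %s\n' "$rel"      # not present under base
    elif cmp -s "$path" "$base/$rel"; then
      printf 'SAME %s\n' "$rel"         # byte-identical in both trees
    else
      printf 'DIFFERENT %s\n' "$rel"    # same name, different content
    fi
  done
}
```

Redirecting its output to a file, for example compare_tree /absolute/path/to/master_dir /absolute_path_to_dir_x > /tmp/file, gives a list to review before running anything destructive.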




--
__________________

:(){ :|:& };: