From: Kaya Saman
Date: Fri, 5 May 2023 03:30:14 +0100
To: Paul Procacci
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
List-Archive: https://lists.freebsd.org/archives/freebsd-questions


On 5/5/23 03:08, Paul Procacci wrote:
There are multiple reasons why it may not work. My guess is that it's because of the characters that could be showing up within the filenames and whatnot.

This can be solved with an interpreted language that's a bit more forgiving.
Take the following perl script. It does the same thing as the shell script (almost). It renames the source file instead of making a copy of it.

run as: ./test.pl /absolute/path/to/master_dir /absolute_path_to_dir_x

###################################################################################
#!/usr/bin/env perl

use strict;
use warnings;

# Print a message (the usage line by default) and exit with the given status.
sub msgDie
{
  my ($ret) = shift;
  my ($msg) = shift // "$0 dir_base dir\n";
  print $msg;
  exit($ret);
}

# Expect exactly two arguments; == is the numeric comparison.
msgDie(1) unless(scalar @ARGV == 2);

my $base = $ARGV[0];
my $dir  = $ARGV[1];

msgDie(1, "base directory doesn't exist\n") unless -d $base;
msgDie(1, "source directory doesn't exist\n") unless -d $dir;

opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
# Only the top level of $dir is read; subdirectories are not descended into.
while(readdir $dh)
{
  next if($_ eq '.' || $_ eq '..');
  # No regular file of this name in the base directory: move it there.
  if( ! -f "$base/$_" ){
    rename("$dir/$_", "$base/$_");
    next;
  }

  # Element 7 of stat() is the size in bytes; delete the source copy when sizes match.
  my ($ref) = (stat("$base/$_"))[7];
  my ($src) = (stat("$dir/$_"))[7];
  unlink("$dir/$_") if($ref == $src);
}
###################################################################################

~Paul



This didn't seem to work :-(


What exactly happened is this:


I created a set of test directories in /tmp


So, I have /tmp/test1 and /tmp/test2


To mimic the structure of the directories I intend to run this on, I did this:


Created a subdir called dupdir in both /tmp/test1 and /tmp/test2.


/tmp/test2/dupdir contains these files: dup and dup1


/tmp/test1/dupdir contains a modified 'dup' file but an unmodified copy of the dup1 file.


However, this is where things get interesting: dup in test1 contains "1234567" and dup in test2 contains "111", which simulates the file size difference.
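
For anyone wanting to reproduce this, the layout above can be recreated with something like the following. It is only a sketch: the contents of dup1 are not stated in this message, so the placeholder text "same" is an assumption, and TEST_ROOT is a hypothetical variable added so the files can be created somewhere other than /tmp.

```shell
#!/bin/sh
# Recreate the test layout described above.
# TEST_ROOT is a hypothetical override; the message itself uses /tmp directly.
root="${TEST_ROOT:-/tmp}"

mkdir -p "$root/test1/dupdir" "$root/test2/dupdir"

# 'dup' differs in size between the two trees (7 bytes vs 3 bytes).
printf '1234567' > "$root/test1/dupdir/dup"
printf '111'     > "$root/test2/dupdir/dup"

# 'dup1' is an identical copy in both trees. Its real contents are not given,
# so "same" is placeholder text.
printf 'same' > "$root/test2/dupdir/dup1"
cp "$root/test2/dupdir/dup1" "$root/test1/dupdir/dup1"

# List every file with its size so the starting state is visible.
find "$root/test1" "$root/test2" -type f -exec ls -l {} +
```

Using printf rather than echo keeps a trailing newline out of the files, so the sizes match the strings quoted above exactly.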


I then ran: ./test.pl /tmp/test1 /tmp/test2


The expected behavior is that I should retain the file 'dup' in test1 while 'dup1' should be removed.


In my actual file system I have many of these subdirs, so a fair test would probably be something like creating:

/tmp/test1/dupdir1

/tmp/test2/dupdir1

/tmp/test1/dupdir2

/tmp/test2/dupdir2


then putting the file dup into dupdir1 and dup1 into dupdir2
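
One likely reason the script appears to do nothing with this layout: it reads only the top level of the second directory, so the only entry it sees is dupdir itself. That is not a regular file in the base directory, so the script attempts a rename onto the existing, non-empty /tmp/test1/dupdir, which fails, and the failure goes unreported because the return value is not checked. A recursive variant could look like the sketch below. It is not Paul's method: dedupe_tree is a hypothetical helper name, it uses find to walk subdirectories, and it swaps the size-only check for a byte-for-byte cmp. It only deletes duplicates and never moves files into the base directory.

```shell
#!/bin/sh
# Sketch: walk every file under DIR and delete it only when a file with the
# same relative path and identical contents exists under BASE.
# dedupe_tree is a hypothetical helper name, not something from the thread.
dedupe_tree() {
    [ -d "$1" ] && [ -d "$2" ] || {
        echo "usage: dedupe_tree base_dir dir" >&2
        return 1
    }
    # Resolve BASE to an absolute path, because the walk below runs from
    # inside DIR.
    base=$(cd "$1" && pwd) || return 1
    dir=$2
    (
        cd "$dir" || exit 1
        # Running find from inside DIR yields relative paths (./sub/file).
        # -exec hands each path over as an argument, so spaces and other odd
        # characters in names never go through word splitting.
        find . -type f -exec sh -c '
            base=$1
            shift
            for f in "$@"; do
                # cmp -s is silent; exit status 0 means byte-identical.
                if [ -f "$base/$f" ] && cmp -s "$base/$f" "$f"; then
                    rm -- "$f"
                fi
            done
        ' sh "$base" {} +
    )
}
```

Called as dedupe_tree /tmp/test1 /tmp/test2 on the test layout, this should remove dup1 from /tmp/test2/dupdir and leave both copies of dup alone, since their contents differ. Try it on a scratch copy first.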


I guess my issue is complex? If only I had used the --remove-source-files option during my initial rsync, I wouldn't have had to worry about any of this. I used the --ignore-existing option, so that would have done the trick initially, but I decided to play it safe instead and have now ended up with a slight headache on my hands.
