Date: Thu, 4 May 2023 23:01:54 -0400
From: Paul Procacci <pprocacci@gmail.com>
To: Kaya Saman <kayasaman@optiplex-networks.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com>
In-Reply-To: <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com>
References: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com> <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com> <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com> <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com> <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com> <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com> <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com>
On Thu, May 4, 2023 at 10:30 PM Kaya Saman <kayasaman@optiplex-networks.com>
wrote:
>
> On 5/5/23 03:08, Paul Procacci wrote:
>
> There are multiple reasons why it may not work. My guess is that
> characters showing up in the filenames are confusing the shell script.
>
> This can be solved with an interpreted language that's a bit more
> forgiving.
> Take the following Perl script. It does (almost) the same thing as the
> shell script, except that it renames the source file instead of making a
> copy of it.
>
> run as: ./test.pl /absolute/path/to/master_dir /absolute_path_to_dir_x
>
> ###################################################################################
>
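> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> # Print a message (the usage line by default) and exit with the given status.
> sub msgDie
> {
>     my ($ret) = shift;
>     my ($msg) = shift // "$0 dir_base dir\n";
>     print $msg;
>     exit($ret);
> }
>
> # Exactly two arguments are required (numeric comparison, hence ==).
> msgDie(1) unless(scalar @ARGV == 2);
>
> my $base = $ARGV[0];
> my $dir  = $ARGV[1];
>
> msgDie(1, "base directory doesn't exist\n") unless -d $base;
> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>
> opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
> # Only the top-level entries of $dir are examined; subdirectories are
> # not descended into.
> while(defined(my $name = readdir $dh))
> {
>     next if($name eq '.' || $name eq '..');
>     # No regular file of this name in the base directory: move it over.
>     if( ! -f "$base/$name" ){
>         rename("$dir/$name", "$base/$name")
>             or warn "rename $dir/$name failed: $!\n";
>         next;
>     }
>
>     # Same name on both sides: delete the source copy when the sizes
>     # match (size only; the contents are not compared).
>     my ($ref) = (stat("$base/$name"))[7];
>     my ($src) = (stat("$dir/$name"))[7];
>     unlink("$dir/$name") if($ref == $src);
> }
> closedir($dh);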
>
> ###################################################################################
>
> ~Paul
>
> This didn't seem to work :-(
>
> What exactly happened is this:
>
> I created a set of test directories in /tmp, so I have /tmp/test1 and
> /tmp/test2.
>
> To mimic the structure of the directories I intend to run this on, I
> created a subdir called dupdir in both /tmp/test1 and /tmp/test2.
>
> /tmp/test2/dupdir contains these files: dup and dup1
>
> /tmp/test1/dupdir contains a modified 'dup' file and a copy of the dup1
> file.
>
> However, now things get interesting, as dup from test1 contains "1234567"
> and dup from test2 contains "111" <- this is to simulate the file size
> difference.
>

Worked for me! Regardless, use rsync then:

rsync -a --ignore-existing --remove-source-files /src/ /dest/

This would at the very least move files that don't yet exist in the dest
over from the source AND remove those source files AFTER the transfer
happens. (-a makes rsync recurse into subdirectories; the trailing
slashes copy the contents of /src rather than the directory itself.)
You'll be halfway there doing that. What you'll be left with are files
that exist in BOTH src AND dest.
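
For the leftovers, something along these lines might do. It is only a rough sketch, not tested against your real data: prune_dups is a made-up helper name, and it assumes that "duplicate" means byte-for-byte identical contents at the same relative path (checked with cmp -s) rather than merely the same size. Files whose contents differ are left alone in the source.

```shell
#!/bin/sh
# Sketch: delete files under SRC that are byte-for-byte identical to the
# file at the same relative path under DEST. Files that differ are kept.
# prune_dups is a hypothetical helper name, not part of rsync.
prune_dups() {
    src=${1%/}      # strip one trailing slash so prefix removal below works
    dest=${2%/}
    # Hand the names to an inner sh via find -exec so that spaces and
    # other odd characters in filenames are passed through untouched.
    find "$src" -type f -exec sh -c '
        src=$1; dest=$2; shift 2
        for f in "$@"; do
            rel=${f#"$src"/}    # path relative to the source root
            # cmp -s is silent and exits 0 only when the contents match
            if [ -f "$dest/$rel" ] && cmp -s "$f" "$dest/$rel"; then
                rm -- "$f"
            fi
        done
    ' sh "$src" "$dest" {} +
}
```

After sourcing it, you would run it as: prune_dups /src /dest. With a layout like your test directories, the identical dup1 would be removed from the source side and the two differing dup files would both stay where they are for you to look at by hand.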
~Paul
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg>
