Date:      Thu, 4 May 2023 22:08:22 -0400
From:      Paul Procacci <pprocacci@gmail.com>
To:        Kaya Saman <kayasaman@optiplex-networks.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com>
In-Reply-To: <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com> <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com> <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com> <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com> <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com>

There are multiple reasons why it may not work.  My guess is that it's
tripping over characters that can show up in the filenames (spaces and
the like).
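
As an illustration of that guess (a sketch only, not taken from the failing
run; the file name and the count_args helper are made up), an unquoted
variable holding a name with a space is split into two words by sh:

```sh
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Made-up file name containing a space.
file="my file.txt"

count_args $file     # unquoted: the shell splits the name on the space
count_args "$file"   # quoted: the name stays a single argument
```

The unquoted call reports 2 arguments and the quoted call reports 1, which
is why commands such as stat or cp can end up looking for files that don't
exist.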

This can be solved with an interpreted language that's a bit more forgiving.
Take the following Perl script.  It does (almost) the same thing as the
shell script: it renames the source file instead of making a copy of it.

Run as:  ./test.pl /absolute/path/to/master_dir /absolute/path/to/dir_x

###################################################################################

#!/usr/bin/env perl

use strict;
use warnings;

# Print a message (the usage line by default) to stderr and exit with $ret.
sub msgDie
{
  my $ret = shift;
  my $msg = shift // "usage: $0 dir_base dir\n";
  print STDERR $msg;
  exit($ret);
}

msgDie(1) unless @ARGV == 2;

my ($base, $dir) = @ARGV;

msgDie(1, "base directory doesn't exist\n") unless -d $base;
msgDie(1, "source directory doesn't exist\n") unless -d $dir;

opendir(my $dh, $dir) or msgDie(1, "Unable to open directory $dir: $!\n");
while (defined(my $name = readdir $dh))
{
  next if $name eq '.' || $name eq '..';

  # Not in the base directory yet: move it there.
  if (! -f "$base/$name") {
    rename("$dir/$name", "$base/$name")
      or warn "Unable to rename $dir/$name: $!\n";
    next;
  }

  # Present in both places: delete the source copy if the sizes match.
  my $ref = (stat("$base/$name"))[7];
  my $src = (stat("$dir/$name"))[7];
  unlink("$dir/$name") if defined $ref && defined $src && $ref == $src;
}
closedir($dh);
###################################################################################
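
A possible variation, offered only as a sketch: if subdirectories need to be
walked as well, the same idea can be written in sh with find(1). The
function name merge_dir and its -n (dry run) flag are made up for this
example, and it compares file contents with cmp(1) instead of sizes, which
is a swap from the size check used in the scripts in this thread. It assumes
neither directory lives inside the other.

```sh
#!/bin/sh
# Hypothetical helper: merge_dir [-n] dir_base dir
#   -n  dry run: only report what would be moved or deleted
merge_dir() {
  dry=
  if [ "${1:-}" = "-n" ]; then dry=1; shift; fi
  if [ $# -ne 2 ] || [ ! -d "$1" ] || [ ! -d "$2" ]; then
    echo "usage: merge_dir [-n] dir_base dir" >&2
    return 1
  fi
  base=${1%/}   # drop one trailing slash so prefix stripping works
  src=${2%/}

  # find descends into subdirectories and hands each batch of regular
  # files to a child sh; every expansion is quoted, so spaces in names
  # are harmless.
  find "$src" -type f -exec sh -c '
    base=$1; src=$2; dry=$3; shift 3
    for path in "$@"; do
      rel=${path#"$src"/}            # path relative to the source dir
      target=$base/$rel
      if [ ! -e "$target" ]; then
        # Missing from the base directory: move it over.
        if [ -n "$dry" ]; then
          echo "would move: $rel"
        else
          mkdir -p "$(dirname "$target")" && mv "$path" "$target"
        fi
      elif cmp -s "$path" "$target"; then
        # Byte-for-byte identical: the source copy is a duplicate.
        if [ -n "$dry" ]; then
          echo "would delete: $rel"
        else
          rm -f "$path"
        fi
      else
        echo "differs, kept: $rel"
      fi
    done
  ' sh "$base" "$src" "$dry" {} +
}
```

Running merge_dir -n /absolute/path/to/master_dir /absolute/path/to/dir_x
first shows what would happen without touching anything; dropping -n
performs the moves and deletions. Files with the same name but different
contents are left alone in both places.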

~Paul

On Thu, May 4, 2023 at 9:32 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:

>
> On 5/5/23 01:13, Paul Procacci wrote:
> > #!/bin/sh
> >
> > #
> > # dir_1, dir_2, and dir_3 are the directories I want to search through.
> > for i in dir_1 dir_2 dir_3;
> > do
> >   # Retrieve the filenames within each of those directories
> >   ls $i/ | while read file;
> >   do
> >     # If the file doesn't exist in the base dir, copy it and continue
> >     # with the top of the loop.
> >     [ ! -f dir_base/$file ] && cp $i/$file dir_base/ && continue
> >
> >     #
> >     # Getting to this point means the file exists in both locations.
> >     #
> >
> >     # Get the file size as it is in the dir_base
> >     ref=`stat -f '%z' dir_base/$file`
> >
> >     # Get the file size as it is in $i
> >     src=`stat -f '%z' $i/$file`
> >
> >     # If the sizes are the same, remove the file from the source
> >     # directory
> >     [ $ref -eq $src ] && rm -f $i/$file
> >
> >   done
> > done
>
>
> Thanks so much!
>
>
> Just a quick question... you have dir_base written in the script. Do I
> need to define this or is this part of the shell language itself?
>
>
> Right now I have modified the script to make it non-destructive so that
> it doesn't do any copying or removing yet... call it a test instance if
> you like. I personally prefer doing things like this so I don't have any
> accidents and lose things in the meantime...
>
>
> So my initial modification is this:
>
>
> > #!/bin/sh
> >
> > #
> > # dir_1, dir_2, and dir_3 are the directories I want to search through.
> > for i in /dir_1 /dir_2 /dir_3;
> > do
> >   # Retrieve the filenames within each of those directories
> >   ls $i/ | while read file;
> >   do
> >     # If the file doesn't exist in the base dir, copy it and continue
> >     # with the top of the loop.
> >     [ ! -f dir_base/$file ] && ls $i/$file && continue
> >
> >     #
> >     # Getting to this point means the file exists in both locations.
> >     #
> >
> >     # Get the file size as it is in the dir_base
> >     ref=`stat -f '%z' dir_base/$file`
> >
> >     # Get the file size as it is in $i
> >     src=`stat -f '%z' $i/$file`
> >
> >     # If the sizes differ, append the file's name to /tmp/file
> >     [ $ref -ne $src ] && ls $i/$file >> /tmp/file
> >
> >   done
> > done
>
>
> If this works it should just output the different files into a file
> called "file" under /tmp
>
>
> Ok, this didn't work at all.... it just listed a whole bunch of top
> level folders and didn't recurse through them :-(
>
>
> I ran it on the assumption that I needed to run the script under /dir
> and that dir_base was a shell function which would essentially be /dir/.
>
>
> [EDIT]
>
>
> Currently, I managed to get it partly running by modifying ls to use ls
> -R *but* I think that the 'stat' statements don't allow for recursion?
>
>
> The script is running as I type this but it's most likely just
> outputting a whole bunch of ls information... as I see many 'stat'
> errors in the shell output.
>
>
>

-- 
__________________

:(){ :|:& };:
