Date:      Thu, 4 May 2023 23:01:54 -0400
From:      Paul Procacci <pprocacci@gmail.com>
To:        Kaya Saman <kayasaman@optiplex-networks.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com>
In-Reply-To: <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com> <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com> <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com> <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com> <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com> <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com> <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com>


On Thu, May 4, 2023 at 10:30 PM Kaya Saman <kayasaman@optiplex-networks.com>
wrote:

>
> On 5/5/23 03:08, Paul Procacci wrote:
>
> There are multiple reasons why it may not work.  My guess is that it's due
> to characters that could be showing up within the filenames and whatnot.
>
> This can be solved with an interpreted language that's a bit more
> forgiving.
> Take the following Perl script.  It does (almost) the same thing as the
> shell script: it renames the source file instead of making a copy of it.
>
> Run as:  ./test.pl /absolute/path/to/master_dir /absolute/path/to/dir_x
>
> #########################################################################=
##########
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> sub msgDie
> {
>   my ($ret) =3D shift;
>   my ($msg) =3D shift // "$0 dir_base dir\n";
>   print $msg;
>   exit($ret);
> }
>
> msgDie(1) unless(scalar @ARGV eq 2);
>
> my $base =3D $ARGV[0];
> my $dir  =3D $ARGV[1];
>
> msgDie(1, "base directory doesn't exist\n") unless -d $base;
> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>
> opendir(my $dh, $dir) or msgDie("Unable to open directory: $dir\n");
> while(readdir $dh)
> {
>   next if($_ eq '.' || $_ eq '..');
>   if( ! -f "$base/$_" ){
>     rename("$dir/$_", "$base/$_");
>     next;
>   }
>
>   my ($ref) =3D (stat("$base/$_"))[7];
>   my ($src) =3D (stat("$dir/$_"))[7];
>   unlink("$dir/$_") if($ref =3D=3D $src);
> }
>
> #########################################################################=
##########
>
> ~Paul
>
> This didn't seem to work :-(
>
> What exactly happened is this:
>
> I created a set of test directories in /tmp, so I have /tmp/test1 and
> /tmp/test2.
>
> To mimic the structure of the directories I intend to run this on, I did
> this:
>
> Created a subdirectory called dupdir in both /tmp/test1 and /tmp/test2.
>
> /tmp/test2/dupdir contains two files: dup and dup1.
>
> /tmp/test1/dupdir contains a modified 'dup' file but an unmodified copy of
> dup1.
>
> Now things get interesting: dup in test1 contains "1234567" and dup in
> test2 contains "111".  This is to simulate a file size difference.

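Kaya's test layout can be rebuilt with the shell sketch below. One plausible explanation for the failure (an inference from reading the script, not something stated in the thread): the Perl script only reads the top level of the source directory with readdir. In this layout that level holds nothing but dupdir, which is a directory, so `-f "$base/dupdir"` is false and the script calls rename() onto the existing, non-empty dupdir in the other tree. rename(2) refuses to replace a non-empty directory, so dupdir stays where it is and none of the files inside it are ever compared. The sketch uses a scratch directory from `mktemp -d` instead of /tmp directly, and the contents of dup1 are made up.

```shell
#!/bin/sh
# Rebuild the layout described above under a scratch directory (mktemp -d)
# rather than directly in /tmp. The contents of dup1 are made up.
set -eu
root=$(mktemp -d)
mkdir -p "$root/test1/dupdir" "$root/test2/dupdir"
printf '%s' 1234567 > "$root/test1/dupdir/dup"   # modified copy: 7 bytes
printf '%s' 111     > "$root/test2/dupdir/dup"   # 3 bytes
printf '%s' same    > "$root/test2/dupdir/dup1"
cp "$root/test2/dupdir/dup1" "$root/test1/dupdir/dup1"

# What a single-level scan (like readdir in the Perl script) finds in test2.
for e in "$root/test2"/*; do
    if [ -f "$e" ]; then
        echo "regular file: ${e##*/}"
    else
        echo "not a regular file: ${e##*/}"
    fi
done
# prints: not a regular file: dupdir
```

Testing against a flat directory of files would not hit this case, since every entry there is a regular file.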
Worked for me!  Regardless, use rsync then.

rsync -a --ignore-existing --remove-source-files /src/ /dest/

This will, at the very least, move files that don't yet exist in the
destination over from the source AND remove those source files AFTER the
transfer has happened.
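A minimal sketch of that rsync pass as a reusable shell function; the name move_new is hypothetical. It uses -a because without -r (which -a implies) rsync skips directories instead of recursing into them, and it puts trailing slashes on both paths so the contents of SRC are merged into DEST rather than SRC being copied into DEST as a subdirectory. Per the rsync manual, --remove-source-files removes only non-directories that were actually transferred, so files skipped by --ignore-existing stay in SRC, as do the directories themselves.

```shell
#!/bin/sh
# Hypothetical wrapper: move_new SRC DEST [extra rsync options]
# Moves files that do not yet exist under DEST from SRC to DEST.
#   -a                     recurse (rsync skips directories without -r/-a)
#   --ignore-existing      never touch a file that already exists in DEST
#   --remove-source-files  delete each SRC file once it has been transferred
# The trailing slashes merge the contents of SRC into DEST instead of
# creating DEST/SRC.
move_new() {
    src=${1%/} dest=${2%/}
    shift 2
    rsync -a --ignore-existing --remove-source-files "$@" "$src/" "$dest/"
}
```

Extra arguments are passed straight to rsync, so `move_new /src /dest --dry-run -v` previews what would be moved without changing anything.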

That gets you halfway there.  What you'll be left with are the files that
exist in BOTH src AND dest.
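For the files left in both trees, the sketch below (hypothetical function name remove_dups) deletes the source copy only when the file at the same relative path under DEST is byte-for-byte identical, as checked by `cmp -s`. This swaps the Perl script's size-only check for a content comparison, so two different files that happen to have the same size are not deleted. Paths are handed from find to `sh -c` as arguments via `-exec ... {} +`, so names containing spaces, leading dashes, or other odd characters are passed through intact.

```shell
#!/bin/sh
# Hypothetical helper: remove_dups SRC DEST
# For every regular file under SRC whose counterpart at the same relative
# path under DEST has identical contents (cmp -s), delete the SRC copy.
# Files that differ, or that exist only under SRC, are left alone.
remove_dups() {
    src=${1%/} dest=${2%/}
    find "$src" -type f -exec sh -c '
        src=$1 dest=$2
        shift 2
        for f in "$@"; do
            rel=${f#"$src"/}            # path relative to SRC
            if [ -f "$dest/$rel" ] && cmp -s "$f" "$dest/$rel"; then
                rm -f -- "$f"
            fi
        done
    ' sh "$src" "$dest" {} +
}
```

Afterwards, directories emptied by either pass can be cleaned up with `find /src -mindepth 1 -type d -empty -delete`; -mindepth, -empty and -delete are extensions to POSIX find that both FreeBSD and GNU find provide.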

~Paul



