Date:      Fri, 5 May 2023 04:20:23 +0100
From:      Kaya Saman <kayasaman@optiplex-networks.com>
To:        Paul Procacci <pprocacci@gmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <eda13374-48c1-1749-3a73-530370934eff@optiplex-networks.com>
In-Reply-To: <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com> <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com> <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com> <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com> <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com> <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com> <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com> <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com>


On 5/5/23 04:01, Paul Procacci wrote:
> On Thu, May 4, 2023 at 10:30 PM Kaya Saman
> <kayasaman@optiplex-networks.com> wrote:
>
>
>     On 5/5/23 03:08, Paul Procacci wrote:
>>     There are multiple reasons why it may not work.  My guess is that
>>     it's because of the characters that could be showing up within
>>     the filenames and whatnot.
>>
>>     This can be solved with an interpreted language that's a bit more
>>     forgiving.
>>     Take the following perl script.  It does the same thing as the
>>     shell script (almost).  It renames the source file instead of
>>     making a copy of it.
>>
>>     run as:  ./test.pl /absolute/path/to/master_dir
>>     /absolute_path_to_dir_x
>>
>>     ###################################################################################
>>
>>     #!/usr/bin/env perl
>>
>>     use strict;
>>     use warnings;
>>
>>     # Print a message (the usage line by default) and exit with the given status.
>>     sub msgDie
>>     {
>>       my ($ret) = shift;
>>       my ($msg) = shift // "$0 dir_base dir\n";
>>       print $msg;
>>       exit($ret);
>>     }
>>
>>     # Exactly two arguments are required (numeric comparison).
>>     msgDie(1) unless(scalar @ARGV == 2);
>>
>>     my $base = $ARGV[0];
>>     my $dir  = $ARGV[1];
>>
>>     msgDie(1, "base directory doesn't exist\n") unless -d $base;
>>     msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>>
>>     # Pass the exit status first, then the message.
>>     opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
>>     while(readdir $dh)
>>     {
>>       next if($_ eq '.' || $_ eq '..');
>>       # No regular file of that name in base: move the source entry over.
>>       if( ! -f "$base/$_" ){
>>         rename("$dir/$_", "$base/$_");
>>         next;
>>       }
>>
>>       # Same name on both sides: compare sizes (stat field 7) and
>>       # delete the source copy only when they match.
>>       my ($ref) = (stat("$base/$_"))[7];
>>       my ($src) = (stat("$dir/$_"))[7];
>>       unlink("$dir/$_") if($ref == $src);
>>     }
>>     ###################################################################################
>>
>>     ~Paul
>>
>>
>
>     This didn't seem to work :-(
>
>
>     What exactly happened is this:
>
>
>     I created a set of test directories in /tmp
>
>
>     So, I have /tmp/test1 and /tmp/test2
>
>
>     To mimic the structure of the directories I intend to run this
>     on, I did this:
>
>
>     create a subdir called: dupdir in /tmp/test1 and /tmp/test2
>
>
>     /tmp/test2/dupdir contains these files: dup and dup1
>
>
>     /tmp/test1/dupdir contains a modified 'dup' file but a copied dup1 file.
>
>
>     However, now things get interesting as dup from test1 contains
>     "1234567" and dup from test2 contains "111" <- this is to simulate
>     the file size difference.
>
>
>
>
>
>
> Worked for me!  Regardless.  Use rsync then.
>
> rsync --ignore-existing --remove-source-files /src /dest
>
> This would at the very least move non-existent files from the source
> over to the dest AND remove those source files AFTER the transfer
> happens.
>
> You'll be 1/2 way there doing that. What you'll be left with are files
> that exist in BOTH src AND dest.
>
> ~Paul


Paul, I think we've got wires crossed....


I *have* already performed the rsync. Apologies if I wasn't clear!


The problem I am faced with is that the destination directory is already
populated with the information from 3 source directories.


I need to remove the sync'ed files in the source directories and leave
files that match in name but are of different sizes.


The problem is I can't use rsync again for this as there aren't any
options to simply compare files based on size. I can't use the
--existing option as the files exist in both directories....


This is the dilemma I am facing:


ls -l /merged_dir/folder/

234904506 - file 'a'


ls -l /source_dir/folder/

1080918146 - file 'a'


So in this case file 'a' is in both directories with the same name but a
different size, and I need to keep both versions. However, *if* they were
the same size, then the file in source_dir should be removed.


That's all. I don't need to transfer or copy anything at all, just
compare and remove files of the same name and size.
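
For what it's worth, the comparison described above could be sketched in plain sh along these lines. This is only a rough sketch under some assumptions, not something taken from the thread: the function name prune_same_size and the --delete switch are made up for illustration, sizes are taken with wc -c for portability (on FreeBSD, stat -f %z would report the size directly), and equal size is treated as "same file" without looking at the contents. It walks the source tree recursively, so files in subfolders are covered, and by default it only prints what it would remove.

```shell
#!/bin/sh
# Hypothetical helper (name and --delete switch are illustrative assumptions).
# Usage: prune_same_size MERGED_DIR SOURCE_DIR [--delete]
prune_same_size() {
    # Resolve the merged directory to an absolute path, since we cd below.
    merged=$(cd "$1" && pwd) || return 1
    src=$2
    mode=${3:-}
    (
        cd "$src" || exit 1
        # Visit every regular file under the source tree, including subfolders.
        find . -type f -exec sh -c '
            merged=$1; mode=$2; shift 2
            for f do
                # Only files that also exist under the same relative path.
                [ -f "$merged/$f" ] || continue
                # Byte counts; $(( )) strips any padding wc may add.
                a=$(( $(wc -c < "$f") ))
                b=$(( $(wc -c < "$merged/$f") ))
                # Different size: keep both versions.
                [ "$a" -eq "$b" ] || continue
                if [ "$mode" = "--delete" ]; then
                    rm -- "$f"
                else
                    printf "%s\n" "$f"
                fi
            done
        ' sh "$merged" "$mode" {} +
    )
}
```

With the directories from the example, a dry run would be "prune_same_size /merged_dir /source_dir", which only lists the candidates; adding --delete as a third argument removes them from the source side. If matching on size alone feels too weak, the size test could be swapped for cmp -s, which compares the contents instead.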


Hopefully I am explaining this better and things are clearer now? Again, I
apologize for the confusion :-(



