Date:      Thu, 4 May 2023 18:32:04 -0400
From:      Paul Procacci <pprocacci@gmail.com>
To:        Kaya Saman <kayasaman@optiplex-networks.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <CAFbbPuiNqYLLg8wcg8S_3=y46osb06+duHqY9f0n=OuRgGVY=w@mail.gmail.com>
In-Reply-To: <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>


On Thu, May 4, 2023 at 5:47 PM Kaya Saman <kayasaman@optiplex-networks.com>
wrote:

>
> On 5/4/23 17:29, Paul Procacci wrote:
>
>
>
> On Thu, May 4, 2023 at 11:53 AM Kaya Saman <
> kayasaman@optiplex-networks.com> wrote:
>
>> Hi,
>>
>>
>> I'm wondering if anyone knows of a tool like diff or so that can also
>> delete files based on name and size from either left/right or
>> source/destination directory?
>>
>>
>> Basically, I have performed an rsync without the
>> --remove-source-files option onto a newly bought and created disk pool
>> (yes, a zpool) where I am trying to consolidate my data, as it's currently
>> spread out over multiple pools with the same folder name.
>>
>>
>> The main issue I am facing is that if I perform another rsync with the
>> --remove-source-files option, rsync will delete files based on name,
>> while there are some files that have the same name but not the same size,
>> and I would like to retain these files.
>>
>>
>> Right now I have looked at many different options in both rsync and
>> other tools but found nothing suitable. I even tested using a few test
>> dirs and files that I put into /tmp and whatever I tried, the files of
>> different size either got transferred or deleted.
>>
>>
>> What would be a good way to approach this problem?
>>
>>
>> Even if I create some kind of shell script and use diff, I think it will
>> only compare names and not file sizes.
>>
>>
>> I'm really lost here....
>>
>>
>> Regards,
>>
>>
>> Kaya
>>
>>
>>
>>
> It sounds like you want fdupes.  It's in the ports tree.
>
> ~Paul
>
> --
> __________________
>
> :(){ :|:& };:
>
>
>
> I tried fdupes and installed it a while back. For me it felt like it only
> works on a single directory.
>
>
> My directory structure is:
>
>
> /dir <- main directory where everything has now been rsync'ed to
>
> /dir_1 <- old directory with partial content
>
> /dir_2 <- more partial content
>
> /dir_3 <- more partial content
>
>
> The key thing here is that I need to compare:
>
>
> /dir_(x) with /dir
>
>
> If a file in /dir_(x) has the same name as one in /dir but a different
> size, leave it; delete it only if both name and file size are the same.
>

Then a tiny shell script does the job, assuming your file names contain no
spaces or other unusual characters:

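#!/bin/sh

# a = consolidated directory; b, c, d = the old partial copies.
for i in b c d;
do
  ls $i/ | while read file;
  do
    # Skip anything that is not a regular file (e.g. subdirectories).
    [ -f $i/$file ] || continue

    # Not present in a/ yet: copy it over and move on.
    [ ! -f a/$file ] && cp $i/$file a/$file && continue

    # Present in both: compare sizes in bytes (%z, FreeBSD stat(1)).
    ref=`stat -f '%z' a/$file`
    src=`stat -f '%z' $i/$file`
    # Same name and same size: delete the old copy.
    [ $ref -eq $src ] && rm -f $i/$file

  done
done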

Change the paths accordingly and back up your stuff. ;)
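Since rsync will also have copied subdirectories, a recursive variant may fit the /dir plus /dir_1 to /dir_3 layout better than the ls loop, which only looks at the top level. The following is a minimal sketch, untested against real data: prune_dupes and file_size are hypothetical helper names, the --delete switch is a convention of this sketch (it only lists candidates otherwise), it uses find(1) and wc -c in place of the FreeBSD-specific stat -f, and it assumes file names contain no newlines. Unlike the script above, it does not copy missing files, because the earlier rsync should already have done that.

```shell
#!/bin/sh
# Sketch only: prune_dupes and file_size are hypothetical helpers, not
# existing tools. Dry run by default; pass --delete to actually remove files.

# Print a file's size in bytes; BSD wc pads its output, so strip spaces.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# prune_dupes REF SRC [--delete]
# For every regular file under SRC (recursively), if a file with the same
# relative path exists under REF and has the same size, list it, or remove
# it when --delete is given. Files missing from REF or differing in size
# are left alone.
prune_dupes() {
    ref=${1%/} src=${2%/} mode=${3:-}
    # Plain string comparison only; it will not catch /dir vs /dir/. etc.
    if [ "$ref" = "$src" ]; then
        echo "prune_dupes: REF and SRC must be different directories" >&2
        return 1
    fi
    # IFS= read -r keeps spaces and backslashes in names; names containing
    # newlines are not handled.
    find "$src" -type f | while IFS= read -r f; do
        rel=${f#"$src"/}                 # path relative to SRC
        [ -f "$ref/$rel" ] || continue   # no counterpart in REF: keep
        [ "$(file_size "$f")" -eq "$(file_size "$ref/$rel")" ] || continue
        if [ "$mode" = --delete ]; then
            rm -f -- "$f"
        else
            printf 'would remove: %s\n' "$f"
        fi
    done
}
```

For example, run prune_dupes /dir /dir_1 and review the list, then run prune_dupes /dir /dir_1 --delete, and repeat for /dir_2 and /dir_3. Like the script above, it compares names and sizes only, so two different files of equal size still count as duplicates; adding a content check such as cmp -s would be stricter.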

~Paul

-- 
__________________

:(){ :|:& };:
