Date:      Fri, 5 May 2023 00:53:14 +0100
From:      Kaya Saman <kayasaman@optiplex-networks.com>
To:        Paul Procacci <pprocacci@gmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com>
In-Reply-To: <CAFbbPuiNqYLLg8wcg8S_3=y46osb06+duHqY9f0n=OuRgGVY=w@mail.gmail.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com> <CAFbbPuiNqYLLg8wcg8S_3=y46osb06+duHqY9f0n=OuRgGVY=w@mail.gmail.com>



On 5/4/23 23:32, Paul Procacci wrote:
>
>
> On Thu, May 4, 2023 at 5:47 PM Kaya Saman
> <kayasaman@optiplex-networks.com> wrote:
>
>
>     On 5/4/23 17:29, Paul Procacci wrote:
>>
>>
>>     On Thu, May 4, 2023 at 11:53 AM Kaya Saman
>>     <kayasaman@optiplex-networks.com> wrote:
>>
>>         Hi,
>>
>>
>>         I'm wondering if anyone knows of a tool like diff that can also
>>         delete files based on name and size from either the left/right
>>         (source/destination) directory?
>>
>>
>>         Basically, I performed an rsync without the
>>         --remove-source-files option onto a newly bought and created
>>         disk pool (yes, a zpool), as I am trying to consolidate my
>>         data - it's currently spread out over multiple pools with the
>>         same folder name.
>>
>>
>>         The main issue I am facing is that if I perform another rsync
>>         with the --remove-source-files option, rsync will delete files
>>         based on name, while there are some files that have the same
>>         name but not the same size, and I would like to retain these
>>         files.
>>
>>
>>         Right now I have looked at many different options in both
>>         rsync and other tools but found nothing suitable. I even
>>         tested with a few test dirs and files that I put into /tmp,
>>         and whatever I tried, the files of different size either got
>>         transferred or deleted.
>>
>>
>>         What would be a good way to approach this problem?
>>
>>
>>         Even if I create some kind of shell script and use diff, I
>>         think it will only compare names and not file sizes.
>>
>>
>>         I'm really lost here....
>>
>>
>>         Regards,
>>
>>
>>         Kaya
>>
>>
>>
>>
>>     It sounds like you want fdupes. It's in the ports tree.
>>
>>     ~Paul
>>
>>     -- 
>>     __________________
>>
>>     :(){ :|:& };:
>
>
>
>     I installed fdupes and tried it a while back. To me it felt like
>     it only works on a single directory.
>
>
>     My directory structure is:
>
>
>     /dir <- main directory where everything has now been rsync'ed to
>
>     /dir_1 <- old directory with partial content
>
>     /dir_2 <- more partial content
>
>     /dir_3 <- more partial content
>
>
>     The key thing here is that I need to compare:
>
>
>     /dir_(x) with /dir
>
>
>     If a file in /dir_(x) has a different size from the file of the
>     same name in /dir, leave it; otherwise, if both name and file size
>     are the same, delete it.
>
>
> Then a tiny shell script does the job, assuming your files don't have
> any spaces and no weird characters exist:
>
> #!/bin/sh
>
> # a = merged directory; b, c, d = old directories to prune.
> # Names missing from a are copied over; same-size duplicates are removed.
> for i in b c d;
> do
>   ls $i/ | while read file;
>   do
>     [ ! -f a/$file ] && cp $i/$file a/$file && continue
>
>     ref=`stat -f '%z' a/$file`    # size in bytes (FreeBSD stat)
>     src=`stat -f '%z' $i/$file`
>     [ $ref -eq $src ] && rm -f $i/$file
>
>   done
> done
>
> Change paths accordingly and back up your stuff. ;)
>
> ~Paul
>
> -- 
> __________________
>
> :(){ :|:& };:
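Before deleting anything, it may be worth a dry run of the rule described above: remove a file from /dir_(x) only when /dir has a file with the same name and the same size. The sketch below only prints candidates and deletes nothing. Assumptions: report_dupes is a hypothetical name, only the top level of each directory is compared (as in Paul's loop), and the portable wc -c stands in for FreeBSD's stat -f '%z' so that it also runs on non-BSD systems.

```shell
#!/bin/sh
# report_dupes BASE MERGE   (hypothetical helper, dry run: deletes nothing)
# Prints every regular file directly inside MERGE that has a file of the
# same name and the same size in bytes directly inside BASE.
report_dupes() {
  base=$1
  merge=$2
  for fm in "$merge"/*
  do
    [ -f "$fm" ] || continue          # skips subdirectories and an empty MERGE
    fb="$base/${fm##*/}"              # ${fm##*/} is the bare file name
    [ -f "$fb" ] || continue          # name not in BASE: nothing to compare
    # wc -c prints the byte count; tr strips the padding BSD wc adds
    ref=$(wc -c < "$fb" | tr -d ' ')
    src=$(wc -c < "$fm" | tr -d ' ')
    [ "$ref" -eq "$src" ] && printf '%s\n' "$fm"
  done
  return 0
}
```

For example, report_dupes /dir /dir_1 lists the candidates in /dir_1. Note that * does not match dotfiles, so hidden files are not checked.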


Thanks Paul,


I should be able to work with this. There are actually spaces and weird
characters in the file names, so I assume quoting the variable, as in
"$file", should allow for that?
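On the quoting question: the variable needs double quotes, i.e. "$file" rather than $file, so that a name containing spaces reaches the command as a single argument. Here is a small illustration using a hypothetical helper, show_quoting, that runs [ -f ] on each entry of a directory both ways:

```shell
#!/bin/sh
# show_quoting DIR   (hypothetical demo helper)
# For each entry in DIR, reports whether [ -f ] succeeds with the path
# quoted and with it unquoted.
show_quoting() {
  for f in "$1"/*
  do
    if [ -f "$f" ]; then q=yes; else q=no; fi
    # Unquoted, a name like "my file.txt" is split into two words,
    # so [ sees an extra operand and the test fails.
    if [ -f $f ] 2>/dev/null; then u=yes; else u=no; fi
    printf '%s quoted=%s unquoted=%s\n' "${f##*/}" "$q" "$u"
  done
}
```

With a directory containing "my file.txt", only the quoted test succeeds. For Paul's ls | while read loop, quoting does not solve every problem: read without -r treats backslashes as escapes and strips leading and trailing blanks, so while IFS= read -r file is the safer form. Even then, names containing a newline cannot be read correctly that way.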


I don't think I need the line after the 'do' statement, do I? From what I
understand, it copies the file from directory i to directory a? As I
explained initially, the files have already been rsync'ed, so I just need
to compare and delete accordingly.
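Since the data has already been copied, a compare-and-delete pass without the cp step could look like the sketch below. It assumes the old directories contain subdirectories (rsync copies whole trees), so it walks them with find and compares each file with the file at the same relative path under the base directory. prune_dupes is a hypothetical name, and wc -c again stands in for stat -f '%z'. This version really deletes files, so try it on copies in /tmp first.

```shell
#!/bin/sh
# prune_dupes BASE DIR...   (hypothetical helper; DELETES files)
# For every regular file under each DIR, looks at the same relative path
# under BASE and removes the DIR copy only when both exist with the same
# size in bytes. Different-size files and files missing from BASE stay.
prune_dupes() {
  base=${1%/}                      # drop a trailing slash, if any
  shift
  for dir in "$@"
  do
    dir=${dir%/}
    # find hands the paths to sh -c as arguments (no parsing of ls
    # output), so spaces, quotes and newlines in names are safe.
    find "$dir" -type f -exec sh -c '
      base=$1 dir=$2
      shift 2
      for fm in "$@"; do
        fb=$base${fm#"$dir"}             # same relative path under BASE
        [ -f "$fb" ] || continue
        ref=$(wc -c < "$fb" | tr -d " ")
        src=$(wc -c < "$fm" | tr -d " ")
        [ "$ref" -eq "$src" ] && rm -f -- "$fm"
      done
      exit 0
    ' sh "$base" "$dir" {} +
  done
}
```

For example: prune_dupes /dir /dir_1 /dir_2 /dir_3. Directories that end up empty are left in place.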

When I performed the rsync, it took around a week to complete per run.
zfs list currently shows around 12TB used by the merged directory, /dir,
and that's with compression enabled.


A quick Google shows that I can use something like this:

search_dir=/the/path/to/base/dir
for entry in "$search_dir"/*
do
  echo "$entry"
done


to list the files in the directory, though this might be Bash rather than Csh syntax.
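For what it's worth, that snippet uses only POSIX shell syntax, so it runs under FreeBSD's /bin/sh as well as under Bash. csh would not accept it, because csh loops are written with foreach. One catch in sh: if the directory is empty, the pattern is left unexpanded, and the loop runs once with the literal path ending in /*. Here is a sketch with a guard, wrapped in a hypothetical function, list_entries:

```shell
#!/bin/sh
# list_entries DIR   (hypothetical helper)
# Prints every entry in DIR, one per line, in glob (sorted) order.
list_entries() {
  search_dir=$1
  for entry in "$search_dir"/*
  do
    # In an empty DIR the pattern stays unexpanded and the loop runs
    # once with the literal ".../*"; skip anything that does not exist.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    printf '%s\n' "$entry"
  done
}
```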


Otherwise, clunkily (my scripting style is pretty rubbish and
inefficient), I could do something like this (it probably won't work!):


#!/bin/sh


# fb = file base: the file with the same name in /dir
# fm = file merge: a file in /dir_1 that has already been merged using
#      rsync, unless its size was different


dir_base=/dir
dir_merge=/dir_1

for fm in "$dir_merge"/*
do
  [ -f "$fm" ] || continue                # skip subdirectories
  fb="$dir_base/${fm##*/}"                # same file name, inside /dir
  [ -f "$fb" ] || continue                # not in /dir: keep it

  ref=`stat -f '%z' "$fb"`                # sizes in bytes (FreeBSD stat)
  src=`stat -f '%z' "$fm"`
  [ "$ref" -eq "$src" ] && rm -f "$fm"    # same name and size: delete
done



Regards,


Kaya



