Date: Fri, 5 May 2023 00:53:14 +0100
From: Kaya Saman <kayasaman@optiplex-networks.com>
To: Paul Procacci <pprocacci@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com>
In-Reply-To: <CAFbbPuiNqYLLg8wcg8S_3=y46osb06+duHqY9f0n=OuRgGVY=w@mail.gmail.com>
References: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com>
 <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com>
 <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>
 <CAFbbPuiNqYLLg8wcg8S_3=y46osb06+duHqY9f0n=OuRgGVY=w@mail.gmail.com>

On 5/4/23 23:32, Paul Procacci wrote:
> On Thu, May 4, 2023 at 5:47 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:
>> On 5/4/23 17:29, Paul Procacci wrote:
>>> On Thu, May 4, 2023 at 11:53 AM Kaya Saman <kayasaman@optiplex-networks.com> wrote:
>>>> Hi,
>>>>
>>>> I'm wondering if anyone knows of a tool like diff that can also
>>>> delete files based on name and size from either the left/right or
>>>> source/destination directory?
>>>>
>>>> Basically, I have performed an rsync without the
>>>> --remove-source-files option onto a newly bought and created disk
>>>> pool (yes, a zpool) onto which I am trying to consolidate my data,
>>>> as it's currently spread out over multiple pools with the same
>>>> folder name.
>>>>
>>>> The main issue is that if I perform another rsync with the
>>>> --remove-source-files option, rsync will delete files based on
>>>> name, while there are some files that have the same name but not
>>>> the same size, and I would like to retain these files.
>>>>
>>>> So far I have looked at many different options in both rsync and
>>>> other tools but found nothing suitable. I even tested using a few
>>>> test dirs and files that I put into /tmp, and whatever I tried, the
>>>> files of different size either got transferred or deleted.
>>>>
>>>> What would be a good way to approach this problem?
>>>>
>>>> Even if I create some kind of shell script and use diff, I think it
>>>> will only compare names and not file sizes.
>>>>
>>>> I'm really lost here....
>>>>
>>>> Regards,
>>>>
>>>> Kaya
>>>
>>> It sounds like you want fdupes. It's in the ports tree.
>>>
>>> ~Paul
>>>
>>> --
>>> __________________
>>>
>>> :(){ :|:& };:
>>
>> I tried fdupes and installed it a while back. For me it felt like it
>> only works on a single directory.
>>
>> My dir structure is as follows:
>>
>> /dir   <- main directory where everything has now been rsync'ed to
>> /dir_1 <- old directory with partial content
>> /dir_2 <- more partial content
>> /dir_3 <- more partial content
>>
>> The key thing here is that I need to compare:
>>
>> /dir_(x) with /dir
>>
>> If a file in /dir_(x) has a different size, then leave it; otherwise
>> delete it if both name and file size are the same.
>
> Then a tiny shell script does the job, assuming your file names don't
> contain any spaces or weird characters:
>
> #!/bin/sh
>
> for i in b c d;
> do
>   ls $i/ | while read file;
>   do
>     [ ! -f a/$file ] && cp $i/$file a/$file && continue
>
>     ref=`stat -f '%z' a/$file`
>     src=`stat -f '%z' $i/$file`
>     [ $ref -eq $src ] && rm -f $i/$file
>
>   done
> done
>
> Change paths accordingly and back up your stuff. ;)
>
> ~Paul
>
> --
> __________________
>
> :(){ :|:& };:

Thanks Paul,

I should be able to work with this. There are actually spaces and weird
characters in the file names, so I assume doing something like "$file"
should allow for that?

I don't think I need the line after the 'do' statement, do I? From what I
understand, it copies the file from directory i to directory a. As I
explained initially, the files have already been rsync'ed, so I just need
to compare and delete accordingly.

When I performed the rsync, it took around a week to complete per run.
Currently zfs list shows around 12TB usage for the merged directory,
/dir, and that's with compression enabled.

A quick Google shows that I can use something like this:

search_dir=/the/path/to/base/dir
for entry in "$search_dir"/*
do
  echo "$entry"
done

to list the files in the directory, though this might be Bash and not Csh.

Otherwise, clunkily (my scripting style is pretty rubbish and
inefficient), I could do something like this (it probably won't work!):

#!/bin/sh

# fb = file base: the copy in the merged directory
# fm = file merge: a file that has already been merged using rsync,
#      unless its size was different

dir_base=/dir
dir_merge=/dir_1

for fm in "$dir_merge"/*
do
  # Same file name, looked up in the merged directory
  fb="$dir_base/${fm##*/}"
  [ -f "$fm" ] && [ -f "$fb" ] || continue

  ref=`stat -f '%z' "$fb"`
  src=`stat -f '%z' "$fm"`
  [ "$ref" -eq "$src" ] && rm -f "$fm"
done

Regards,

Kaya
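
A minimal sketch of the compare-and-delete step discussed in this thread, for file names that contain spaces. It is an illustration under stated assumptions rather than a tested solution: the names prune_same_size, size_of and DRY_RUN are made up for this example, it only looks at the top level of each directory (no recursion, and the * glob skips dotfiles), and it uses wc -c instead of stat -f '%z' so that it is not tied to the BSD stat syntax. It never copies anything, since the files have already been rsync'ed.

```shell
#!/bin/sh
# Sketch only: delete files from an old directory when the merged directory
# holds a regular file with the same name and the same size in bytes.
# prune_same_size, size_of and DRY_RUN are names invented for this example.

# Print the size of a file in bytes. wc -c is POSIX; the arithmetic
# expansion strips the leading blanks that some wc implementations print.
size_of() {
  echo $(( $(wc -c < "$1") ))
}

# usage: prune_same_size /dir /dir_1
prune_same_size() {
  base=$1
  merge=$2
  for fm in "$merge"/*; do
    [ -f "$fm" ] || continue      # skip subdirectories and an unmatched glob
    fb=$base/${fm##*/}            # same file name under the merged directory
    [ -f "$fb" ] || continue      # no counterpart in the merged dir: keep it
    if [ "$(size_of "$fm")" -eq "$(size_of "$fb")" ]; then
      if [ "${DRY_RUN:-1}" -eq 1 ]; then
        echo "would remove: $fm"  # default: report only, delete nothing
      else
        rm -f -- "$fm"
      fi
    fi
  done
}

# Run only when exactly two directories are given on the command line.
if [ "$#" -eq 2 ]; then
  prune_same_size "$1" "$2"
fi
```

Saved under a hypothetical name such as prune.sh, "sh prune.sh /dir /dir_1" only prints what would be removed; "DRY_RUN=0 sh prune.sh /dir /dir_1" actually deletes. Files whose sizes differ, and files with no counterpart in /dir, are left alone in both modes.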
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ef0328b0-caab-b6a2-5b33-1ab069a07f80>