Date: Thu, 4 May 2023 18:32:04 -0400
From: Paul Procacci <pprocacci@gmail.com>
To: Kaya Saman <kayasaman@optiplex-networks.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com>
In-Reply-To: <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>
References: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>

On Thu, May 4, 2023 at 5:47 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:

> On 5/4/23 17:29, Paul Procacci wrote:
>
>> On Thu, May 4, 2023 at 11:53 AM Kaya Saman
>> <kayasaman@optiplex-networks.com> wrote:
>>
>>> Hi,
>>>
>>> I'm wondering if anyone knows of a tool like diff or so that can also
>>> delete files based on name and size from either left/right or
>>> source/destination directory?
>>>
>>> Basically what I have done is performed an rsync without using the
>>> --remove-source-files option onto a newly bought and created disk pool
>>> (yes, zpool) onto which I am trying to consolidate my data - as it's
>>> currently spread out over multiple pools with the same folder name.
>>>
>>> The issue I am facing mainly is that if I perform another rsync and use
>>> the --remove-source-files option, rsync will delete files based on name,
>>> while there are some files that have the same name but not the same size
>>> and I would like to retain these files.
>>>
>>> Right now I have looked at many different options in both rsync and
>>> other tools but found nothing suitable. I even tested using a few test
>>> dirs and files that I put into /tmp and whatever I tried, the files of
>>> different size either got transferred or deleted.
>>>
>>> What would be a good way to approach this problem?
>>>
>>> Even if I create some kind of shell script and use diff, I think it will
>>> only compare names and not file sizes.
>>>
>>> I'm really lost here....
>>>
>>> Regards,
>>>
>>> Kaya
>>
>> It sounds like you want fdupes. It's in the ports tree.
>>
>> ~Paul
>>
>> --
>> __________________
>>
>> :(){ :|:& };:
>
> I tried fdupes and installed it a while back. For me it felt like it only
> works on a single directory.
>
> My dir structure is that I have:
>
> /dir   <- main directory where everything has now been rsync'ed to
> /dir_1 <- old directory with partial content
> /dir_2 <- more partial content
> /dir_3 <- more partial content
>
> The key thing here is that I need to compare:
>
> /dir_(x) with /dir
>
> If the files are different sizes in /dir_(x) then leave them, otherwise
> delete if both name and file size are the same.

Then a tiny shell script does the job, assuming your file names don't
contain any spaces or weird characters:

#!/bin/sh

# a is the consolidated directory; b, c and d are the old ones.
for i in b c d;
do
  ls $i/ | while read file;
  do
    # Not in a yet: copy it over and move on to the next file.
    [ ! -f a/$file ] && cp $i/$file a/$file && continue

    # Same name in both places: compare sizes in bytes (FreeBSD stat).
    ref=`stat -f '%z' a/$file`
    src=`stat -f '%z' $i/$file`
    # Same size too: remove the copy in the old directory.
    [ $ref -eq $src ] && rm -f $i/$file

  done
done

Change paths accordingly and back up your stuff. ;)

~Paul

-- 
__________________

:(){ :|:& };:
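
For file names that do contain spaces, a variant along the following lines might work. This is only a sketch of the same copy/compare/delete logic, not a tested replacement: the function names `file_size` and `prune_dir` are made up for illustration, and it uses `wc -c` instead of `stat -f '%z'` purely so that it also runs outside FreeBSD. Like the script above, it looks only at files directly inside each directory, and it treats "same name and same size" as a duplicate without comparing contents.

```shell
#!/bin/sh
# Sketch only: quoting-safe variant of the loop above.
# file_size and prune_dir are hypothetical helper names.

# Print the size of a file in bytes.
# Assumption: wc -c is used for portability; on FreeBSD,
# stat -f '%z' (as in the original script) reports the same number.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# prune_dir REF SRC: for each regular file directly inside SRC,
#   - copy it into REF if REF has nothing of that name,
#   - delete it from SRC if REF has a same-named file of equal size,
#   - otherwise leave it where it is.
prune_dir() {
    ref_dir=$1
    src_dir=$2
    for path in "$src_dir"/*; do
        # Skip subdirectories and the literal pattern of an empty directory.
        [ -f "$path" ] || continue
        name=${path##*/}
        if [ ! -e "$ref_dir/$name" ]; then
            cp -p "$path" "$ref_dir/$name"
            continue
        fi
        # Same name exists but is not a regular file: leave both alone.
        [ -f "$ref_dir/$name" ] || continue
        if [ "$(file_size "$ref_dir/$name")" -eq "$(file_size "$path")" ]; then
            rm -f -- "$path"
        fi
    done
}

# Example call, using the directory names from this thread:
# for d in /dir_1 /dir_2 /dir_3; do prune_dir /dir "$d"; done
```

Note that the `*` glob skips dotfiles, so hidden files would need separate handling. Trying it on a few throwaway directories under /tmp first, as already done for rsync, seems prudent.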