Date: Fri, 5 May 2023 04:20:23 +0100
From: Kaya Saman <kayasaman@optiplex-networks.com>
To: Paul Procacci <pprocacci@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <eda13374-48c1-1749-3a73-530370934eff@optiplex-networks.com>
In-Reply-To: <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com>
References: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com>
 <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com>
 <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>
 <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com>
 <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com>
 <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com>
 <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com>
 <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com>
 <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com>
 <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com>
On 5/5/23 04:01, Paul Procacci wrote:
> On Thu, May 4, 2023 at 10:30 PM Kaya Saman
> <kayasaman@optiplex-networks.com> wrote:
>
> > On 5/5/23 03:08, Paul Procacci wrote:
> > > There are multiple reasons why it may not work.  My guess is
> > > because the potential for characters that could be showing up
> > > within the filenames and whatnot.
> > >
> > > This can be solved with an interpreted language that's a bit more
> > > forgiving.
> > > Take the following perl script.  It does the same thing as the
> > > shell script (almost).  It renames the source file instead of
> > > making a copy of it.
> > >
> > > run as:  ./test.pl /absolute/path/to/master_dir
> > > /absolute_path_to_dir_x
> > >
> > > ###################################################################################
> > >
> > > #!/usr/bin/env perl
> > >
> > > use strict;
> > > use warnings;
> > >
> > > sub msgDie
> > > {
> > >   my ($ret) = shift;
> > >   my ($msg) = shift // "$0 dir_base dir\n";
> > >   print $msg;
> > >   exit($ret);
> > > }
> > >
> > > msgDie(1) unless(scalar @ARGV eq 2);
> > >
> > > my $base = $ARGV[0];
> > > my $dir  = $ARGV[1];
> > >
> > > msgDie(1, "base directory doesn't exist\n") unless -d $base;
> > > msgDie(1, "source directory doesn't exist\n") unless -d $dir;
> > >
> > > opendir(my $dh, $dir) or msgDie("Unable to open directory: $dir\n");
> > > while(readdir $dh)
> > > {
> > >   next if($_ eq '.' || $_ eq '..');
> > >   if( ! -f "$base/$_" ){
> > >     rename("$dir/$_", "$base/$_");
> > >     next;
> > >   }
> > >
> > >   my ($ref) = (stat("$base/$_"))[7];
> > >   my ($src) = (stat("$dir/$_"))[7];
> > >   unlink("$dir/$_") if($ref == $src);
> > > }
> > > ###################################################################################
> > >
> > > ~Paul
> >
> > This didn't seem to work :-(
> >
> > What exactly happened is this:
> >
> > I created a set of test directories in /tmp
> >
> > So, I have /tmp/test1 and /tmp/test2
> >
> > to mimic the structure of the directories I intend to run this
> > thing I did this:
> >
> > create a subdir called: dupdir in /tmp/test1 and /tmp/test2
> >
> > /tmp/test2/dupdir contains these files: dup and dup1
> >
> > /tmp/test1/dupdir contains a modified 'dup' file but copied dup1 file.
> >
> > However*, now things get interesting as dup from test1 contains
> > "1234567" and dup from test2 contains "111" <- this is to simulate
> > the file size difference.
>
> Worked for me!  Regardless.  Use rsync then.
>
> rsync --ignore-existing --remove-source-files  /src /dest
>
> This would at the very least move non-existent files from the source
> over to the dest AND remove those source files AFTER the transfer
> happens.
>
> You'll be 1/2 way there doing that. What you'll be left with are file
> that exist in BOTH src AND DEST.
>
> ~Paul

Paul, I think we've got wires crossed....

I *have* already performed the rsync. Apologies if I wasn't clear!

The problem I am faced with is that the destination directory is already
populated with the information from 3 source directories.

I need to remove the sync'ed files in the source directories and leave
files that match in name but are of different sizes.

The problem is I can't use rsync again for this as there aren't any
options to simply compare files based on size. I can't use the
--existing option as the files exist in both directories....

This is the dilemma I am facing:

ls -l /merged_dir/folder/
234904506 - file 'a'

ls -l /source_dir/folder/
1080918146 - file 'a'

so in this case file 'a' is in both directories with the same name but
different size. I need to keep both versions. However, *if* they were
the same size then remove the file in the source_dir.....

That's all.. I don't need to transfer anything or copy anything at
all... just compare and remove files of same name and size.

Hopefully I am explaining better and things are more clear? Again I
apologize for the confusion  :-(
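
The rule described above (same relative path and same size: remove the copy under the source directory; anything else: leave both files alone) can be sketched in POSIX sh. This is only a minimal sketch under stated assumptions, not a tested tool: the function name prune_same_size and its third "dry run" argument are hypothetical names introduced here, both directories are assumed to be given as absolute paths, and sizes are read with wc -c for portability (on FreeBSD, stat -f %z should report the same number without reading the file). It walks subdirectories with find, which seems to matter for the /tmp/test1/dupdir layout, since a single readdir loop only sees the top level.

```sh
#!/bin/sh
# Sketch only: prune_same_size is a hypothetical helper, not part of the thread.
# prune_same_size MERGED_DIR SOURCE_DIR [yes|no]
#   MERGED_DIR, SOURCE_DIR: absolute paths (assumption)
#   third argument: "yes" (default) only reports, "no" really deletes
prune_same_size() {
    merged=$1
    source_dir=$2
    dry_run=${3:-yes}
    (
        cd "$source_dir" || exit 1
        # "-exec ... {} +" hands file names over as arguments, so spaces
        # and other odd characters in names are not re-parsed by the shell.
        find . -type f -exec sh -c '
            merged=$1; dry_run=$2; shift 2
            for rel in "$@"; do
                other="$merged/${rel#./}"
                # No file of that name in the merged tree: keep the source.
                [ -f "$other" ] || continue
                src_size=$(wc -c < "$rel" | tr -d "[:space:]")
                ref_size=$(wc -c < "$other" | tr -d "[:space:]")
                # Same name, different size: keep both versions.
                [ "$src_size" -eq "$ref_size" ] || continue
                if [ "$dry_run" = yes ]; then
                    printf "would remove: %s\n" "$rel"
                else
                    rm -- "$rel"
                fi
            done
        ' sh "$merged" "$dry_run" {} +
    )
}

# Example calls (paths taken from the listing above):
#   prune_same_size /merged_dir /source_dir       # report only
#   prune_same_size /merged_dir /source_dir no    # actually delete
```

Nothing is copied or moved, and only files under the source directory are ever deleted. Equal size does not prove equal content; if that matters, replacing the size test with cmp -s "$rel" "$other" would be a stricter check, at the cost of reading both files.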
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?eda13374-48c1-1749-3a73-530370934eff>