Date: Fri, 5 May 2023 03:30:14 +0100
From: Kaya Saman <kayasaman@optiplex-networks.com>
To: Paul Procacci <pprocacci@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com>
In-Reply-To: <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com>
References: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com>
 <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com>
 <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>
 <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com>
 <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com>
 <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com>
 <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com>
 <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com>

[-- Attachment #1 --]
On 5/5/23 03:08, Paul Procacci wrote:
> There are multiple reasons why it may not work. My guess is because
> the potential for characters that could be showing up within the
> filenames and whatnot.
>
> This can be solved with an interpreted language that's a bit more
> forgiving.
> Take the following perl script. It does the same thing as the shell
> script (almost). It renames the source file instead of making a copy
> of it.
>
> run as: ./test.pl /absolute/path/to/master_dir /absolute_path_to_dir_x
>
> ###################################################################################
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> sub msgDie
> {
>   my ($ret) = shift;
>   my ($msg) = shift // "$0 dir_base dir\n";
>   print $msg;
>   exit($ret);
> }
>
> msgDie(1) unless(scalar @ARGV eq 2);
>
> my $base = $ARGV[0];
> my $dir  = $ARGV[1];
>
> msgDie(1, "base directory doesn't exist\n") unless -d $base;
> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>
> opendir(my $dh, $dir) or msgDie("Unable to open directory: $dir\n");
> while(readdir $dh)
> {
>   next if($_ eq '.' || $_ eq '..');
>   if( ! -f "$base/$_" ){
>     rename("$dir/$_", "$base/$_");
>     next;
>   }
>
>   my ($ref) = (stat("$base/$_"))[7];
>   my ($src) = (stat("$dir/$_"))[7];
>   unlink("$dir/$_") if($ref == $src);
> }
> ###################################################################################
>
> ~Paul
>

This didn't seem to work :-(

What exactly happened is this:

I created a set of test directories in /tmp

So, I have /tmp/test1 and /tmp/test2

To mimic the structure of the directories I intend to run this on, I
did this:

created a subdir called dupdir in both /tmp/test1 and /tmp/test2

/tmp/test2/dupdir contains these files: dup and dup1

/tmp/test1/dupdir contains a modified 'dup' file but a copied dup1 file.

However, now things get interesting, as dup from test1 contains
"1234567" and dup from test2 contains "111" <- this is to simulate the
file size difference.

I then ran: ./test.pl /tmp/test1 /tmp/test2

The expected behavior is that I should retain the file 'dup' in test1
while 'dup1' should be removed.

In my actual file system I have many of these subdirs, so a fair test
would probably be something like creating:

/tmp/test1/dupdir1
/tmp/test2/dupdir1
/tmp/test1/dupdir2
/tmp/test2/dupdir2

then putting the file dup into dupdir1 and dup1 into dupdir2

I guess my issue is complex?? If only I had used the
--remove-source-files option during my initial rsync, I wouldn't have
had to worry about any of this: I already used the --ignore-existing
option, so that would have done the trick initially. But I decided to
play safe instead and have now ended up with a slight headache on my
hands.
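
For illustration, here is a rough shell sketch of the behaviour described above, under some loud assumptions. The quoted Perl script reads only the top level of the second directory, so files inside subdirectories such as dupdir are never compared; this sketch walks the whole tree with find instead. The function name dedupe_tree and the scratch directory are hypothetical, and cmp -s (byte-for-byte comparison) is swapped in for the size check used in the Perl script. File names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch only: dedupe_tree is a hypothetical helper, not code from the thread.
# Usage: dedupe_tree MASTER_DIR OTHER_DIR
dedupe_tree() {
    master=$1
    other=$2
    # List every regular file below OTHER_DIR, subdirectories included,
    # as paths relative to OTHER_DIR.
    (cd "$other" && find . -type f -print) | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -e "$master/$rel" ]; then
            # Not present in the master tree: move it across,
            # creating any missing parent directories first.
            mkdir -p "$master/$(dirname "$rel")"
            mv "$other/$rel" "$master/$rel"
        elif cmp -s "$master/$rel" "$other/$rel"; then
            # Identical contents: remove the duplicate from OTHER_DIR.
            rm "$other/$rel"
        fi
        # Otherwise the two files differ and both are left untouched.
    done
}

# Recreate the test layout from the message in a scratch directory
# (an assumption, used here instead of /tmp/test1 and /tmp/test2).
tmp=$(mktemp -d /tmp/dedupe.XXXXXX)
mkdir -p "$tmp/test1/dupdir" "$tmp/test2/dupdir"
printf '1234567\n' > "$tmp/test1/dupdir/dup"    # modified copy in the master
printf '111\n'     > "$tmp/test2/dupdir/dup"
printf 'same\n'    > "$tmp/test1/dupdir/dup1"   # identical in both trees
cp "$tmp/test1/dupdir/dup1" "$tmp/test2/dupdir/dup1"

dedupe_tree "$tmp/test1" "$tmp/test2"
```

With this layout, dup should remain in both trees because the contents differ, while dup1 should be removed from test2 only. Trying it on scratch copies first, as in the message, seems the safer route before pointing anything like this at real data.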
