Date: Fri, 5 May 2023 00:53:14 +0100
From: Kaya Saman <kayasaman@optiplex-networks.com>
To: Paul Procacci <pprocacci@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com>
In-Reply-To: <CAFbbPuiNqYLLg8wcg8S_3=y46osb06+duHqY9f0n=OuRgGVY=w@mail.gmail.com>
References: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com> <CAFbbPuiNqYLLg8wcg8S_3=y46osb06+duHqY9f0n=OuRgGVY=w@mail.gmail.com>
On 5/4/23 23:32, Paul Procacci wrote:
>
>
> On Thu, May 4, 2023 at 5:47 PM Kaya Saman
> <kayasaman@optiplex-networks.com> wrote:
>
>
> On 5/4/23 17:29, Paul Procacci wrote:
>>
>>
>> On Thu, May 4, 2023 at 11:53 AM Kaya Saman
>> <kayasaman@optiplex-networks.com> wrote:
>>
>> Hi,
>>
>>
>> I'm wondering if anyone knows of a tool like diff or so that
>> can also
>> delete files based on name and size from either left/right or
>> source/destination directory?
>>
>>
>> Basically what I have done is performed an rsync without using the
>> --remove-source-files option onto a newly bought and created disk pool
>> (yes, a zpool) where I am trying to consolidate my data, as it's
>> currently spread out over multiple pools with the same folder name.
>>
>>
>> The main issue I am facing is that if I perform another rsync and use
>> the --remove-source-files option, rsync will delete files based on name,
>> while there are some files that have the same name but not the same
>> size, and I would like to retain these files.
>>
>>
>> Right now I have looked at many different options in both
>> rsync and
>> other tools but found nothing suitable. I even tested using a
>> few test
>> dirs and files that I put into /tmp and whatever I tried, the
>> files of
>> different size either got transferred or deleted.
>>
>>
>> What would be a good way to approach this problem?
>>
>>
>> Even if I create some kind of shell script and use diff, I
>> think it will
>> only compare names and not file sizes.
>>
>>
>> I'm really lost here....
>>
>>
>> Regards,
>>
>>
>> Kaya
>>
>>
>>
>>
>> It sounds like you want fdupes. It's in the ports tree.
>>
>> ~Paul
>>
>> --
>> __________________
>>
>> :(){ :|:& };:
>
>
>
> I tried fdupes and installed it a while back. For me it felt like
> it only works on a single directory.
>
>
> My dir structure is that I have:
>
>
> /dir <- main directory where everything has now been rsync'ed to
>
> /dir_1 <- old directory with partial content
>
> /dir_2 <- more partial content
>
> /dir_3 <- more partial content
>
>
> The key thing here is that I need to compare:
>
>
> /dir_(x) with /dir
>
>
> if the files are different sizes in /dir_(x) then leave them,
> otherwise delete if both name and file size are the same.
>
>
> Then a tiny shell script does the job, assuming your file names contain
> no spaces or weird characters:
>
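> #!/bin/sh
>
> # a is the merged directory; b, c and d are the old ones
> for i in b c d;
> do
>     ls $i/ | while read file;
>     do
>         # not in a yet: copy it over and move on
>         [ ! -f a/$file ] && cp $i/$file a/$file && continue
>
>         # same name and same size: delete the old copy
>         ref=`stat -f '%z' a/$file`
>         src=`stat -f '%z' $i/$file`
>         [ $ref -eq $src ] && rm -f $i/$file
>
>     done
> done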
>
> Change paths accordingly and back up your stuff. ;)
>
> ~Paul
>
> --
> __________________
>
> :(){ :|:& };:
Thanks Paul,

I should be able to work with this. There are actually spaces and weird
characters in the file names, so I assume quoting the variable, as in
"$file", should allow for that?
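
On the quoting question, here is a minimal sketch, assuming the trouble with
spaces comes from word splitting on unquoted variables and from parsing the
output of ls. It walks a directory with a glob and keeps every expansion in
double quotes. The helper names file_size and list_sizes are made up for
illustration, and wc -c stands in for stat -f '%z' only so that the sketch
also runs outside FreeBSD.

```shell
#!/bin/sh
# Sketch only: file_size and list_sizes are hypothetical helper names.

# Print the size of file $1 in bytes. The arithmetic expansion strips the
# leading blanks that some wc implementations print.
file_size() {
    echo $(( $(wc -c < "$1") ))
}

# Print "size name" for every regular file directly inside directory $1.
# The glob and the double quotes keep a name with spaces in one piece.
# Note that * does not match names starting with a dot.
list_sizes() {
    for entry in "$1"/*
    do
        # skips subdirectories, and the literal pattern if the dir is empty
        [ -f "$entry" ] || continue
        name=${entry##*/}            # strip the leading directory part
        printf '%s %s\n' "$(file_size "$entry")" "$name"
    done
}
```

Calling list_sizes /dir_1 would then print one line per file, whatever
characters the names contain (apart from the dot-file caveat above).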

I don't think I need the first line after the inner 'do' (the cp line), do
I? From what I understand, it copies the file from directory i to directory
a. As I explained initially, the files have already been rsync'ed, so I just
need to compare and delete accordingly.
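
A compare-and-delete pass without the copy step could be sketched roughly as
follows. This is an illustration under assumptions, not a tested tool: it
only looks at regular files directly inside the two directories (no
recursion), prune_dupes and DRY_RUN are hypothetical names, and wc -c is
again used in place of stat -f '%z' for portability. By default it only
prints what it would remove.

```shell
#!/bin/sh
# Sketch only: prune_dupes and DRY_RUN are hypothetical names.

# Print the size of file $1 in bytes.
file_size() {
    echo $(( $(wc -c < "$1") ))
}

# prune_dupes BASE OLD
# For each regular file in OLD, remove it when BASE holds a file with the
# same name and the same size. Files missing from BASE, or with a
# different size, are left alone. Nothing is ever copied.
# With DRY_RUN unset or set to 1, candidates are printed, not removed.
prune_dupes() {
    base=$1
    old=$2
    for src in "$old"/*
    do
        [ -f "$src" ] || continue        # not a regular file: skip
        ref="$base/${src##*/}"           # same name in the merged dir
        [ -f "$ref" ] || continue        # no counterpart: keep it
        if [ "$(file_size "$ref")" -eq "$(file_size "$src")" ]
        then
            if [ "${DRY_RUN:-1}" -eq 1 ]
            then
                echo "would remove: $src"
            else
                rm -f "$src"
            fi
        fi
    done
}
```

One way to use it would be to run prune_dupes /dir /dir_1 first, read the
list, and only then set DRY_RUN=0 and repeat for /dir_1, /dir_2 and /dir_3.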

When I performed the rsync, each run took around a week to complete.
Currently zfs list shows around 12TB used for /dir, the merged directory,
and that's with compression enabled.

A quick Google shows that I can use something like this:

    search_dir=/the/path/to/base/dir
    for entry in "$search_dir"/*
    do
        echo "$entry"
    done

to list the files in the directory, though this might be Bash and not Csh.

Otherwise, clunkily (my scripting style is pretty rubbish and inefficient),
I could do something like this (untested, so it may well not work!):

#!/bin/sh

# fb = file base: the copy in the merged directory
# fm = file merge: the old copy that has already been merged using rsync,
#      unless the size was different

dir_base=/dir
dir_merge=/dir_1

for fm in "$dir_merge"/*
do
    fb="$dir_base/${fm##*/}"
    # skip anything that is not a regular file on both sides
    [ -f "$fm" ] && [ -f "$fb" ] || continue

    ref=`stat -f '%z' "$fb"`
    src=`stat -f '%z' "$fm"`
    # same name and same size: delete the old copy
    [ "$ref" -eq "$src" ] && rm -f "$fm"
done

Regards,
Kaya
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ef0328b0-caab-b6a2-5b33-1ab069a07f80>
