Date:      Thu, 18 May 2023 10:53:01 +0100
From:      Kaya Saman <kayasaman@optiplex-networks.com>
To:        questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <a1e4553d-2823-cade-712a-0c26322bd4b5@optiplex-networks.com>
In-Reply-To: <3e2b4ee6-c098-456a-bb3a-4b1f45e4d888@holgerdanske.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com> <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com> <eda13374-48c1-1749-3a73-530370934eff@optiplex-networks.com> <CAFbbPujbyPHm2GO%2BFnR0G8rnsmpA3AxY2NzYOAAXetApiF8HVg@mail.gmail.com> <b4ac4aea-a051-fbfe-f860-cd7836e5a1bb@optiplex-networks.com> <7c2429c5-55d0-1649-a442-ce543f2d46c2@holgerdanske.com> <6a0aba81-485a-8985-d20d-6da58e9b5580@optiplex-networks.com> <347612746.1721811.1683912265841@fidget.co-bxl> <08804029-03de-e856-568b-74494dfc81cf@holgerdanske.com> <126434505.494354.1684104532813@ichabod.co-bxl> <c1699605-fa7f-71da-db06-dfcfb43618d6@holgerdanske.com> <818813a2-8ab0-df54-3c59-0e1ba9ce743d@holgerdanske.com> <941908372.622746.1684189567246@ichabod.co-bxl> <1e30ac66-a339-ce08-75ac-8e566f4d2278@optiplex-networks.com> <3e2b4ee6-c098-456a-bb3a-4b1f45e4d888@holgerdanske.com>


On 5/18/23 01:35, David Christensen wrote:
> On 5/17/23 00:55, Kaya Saman wrote:
>>
>> On 5/15/23 23:26, Sysadmin Lists wrote:
>>>> ----------------------------------------
>>>> From: David Christensen <dpchrist@holgerdanske.com>
>>>> Date: May 15, 2023, 1:43:38 AM
>>>> To: <questions@freebsd.org>
>>>> Subject: Re: Tool to compare directories and delete duplicate files
>>>> from one directory
>>>>
>>>>
>>>> It looks like your script only finds duplicates when the subpath is
>>>> identical (?):
>>>>
>>> Yeah. Wasn't that the original problem description? I went off the
>>> example given by Paul earlier in this thread, and it looked like only
>>> files with matching subpaths were being considered (because the OP
>>> accidentally rsync'd files from a source to a bunch of destination
>>> dirs).
>>>
>>
>> Glad to see this thread has turned into an interesting discussion....
>>
>>
>> Just as the OP :-) I will clarify....
>>
>> There was no accidental rsync in place.
>>
>>
>> Due to lack of storage my files were basically all over the place on
>> different zpools. The problem is that most of those were on iSCSI
>> drives (all running FreeBSD), so I needed to get them in a single
>> place. Of course, as the files were all over, things became a mess.
>>
>> I bought a few new drives and created a new zpool just for this case.
>> So virtually I had to sync the multiple directories to a single
>> destination, *but* of course I didn't use the --remove-source-files
>> option as I didn't want things to be destructive.
>>
>>
>> But then I needed the extra space too and that's where this post came
>> from.
>>
>>
>> Regards,
>>
>>
>> Kaya
>
>
> I seem to recall that you decided to run a Perl script posted by a
> reader.  How has that worked out?


Very well.


>
>
> My first response presupposed that you wanted to delete /dir1, /dir2,
> and /dir3.  Further messages indicated that you wanted to keep those
> directories and any unique files they contain.  Please clarify your
> plans for those directories and their contents.


Nope. I wanted to delete the duplicate files within /dir1/path...,
/dir2/path... and /dir3/path... while keeping any files that differ.


>
>
> How do you plan to validate the consolidation process when it is
> complete?


The consolidation process is already finished. Rsync already took care
of that. I used: rsync -avvc --progress --ignore-existing src dst


The script I was given then simply deleted the duplicates from the
source directories. In fact this is really specific to me: I just
wanted to make my life easier when finding the files that have the
same names but different sizes.
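
For illustration only, the duplicate-removal step described above could
be sketched in Python roughly as follows. This is not the Perl script
from the thread, which is not reproduced in this message. The function
name remove_duplicates, its parameters, and the dry_run default are all
hypothetical. It assumes a "duplicate" means a file under the source
tree whose content is byte-for-byte identical to the file at the same
relative subpath under the destination tree.

```python
import filecmp
import os


def remove_duplicates(src_root, dst_root, dry_run=True):
    """Delete files under src_root that are byte-identical to the file at
    the same relative path under dst_root.

    Returns the sorted relative paths that were removed (or, when dry_run
    is true, that would have been removed).
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src_file = os.path.join(dirpath, name)
            rel = os.path.relpath(src_file, src_root)
            dst_file = os.path.join(dst_root, rel)
            # Leave symlinks alone, and skip files with no counterpart.
            if os.path.islink(src_file) or not os.path.isfile(dst_file):
                continue
            # shallow=False forces a comparison of the file contents
            # rather than trusting size and modification time.
            if filecmp.cmp(src_file, dst_file, shallow=False):
                removed.append(rel)
                if not dry_run:
                    os.remove(src_file)
    return sorted(removed)
```

With dry_run left at its default nothing is deleted, so the returned
list can be reviewed first. Files that share a name with a destination
file but differ in content stay in the source tree, which is the
leftover set described above.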


Now that I have only the differing files left, I can merge them by
changing the directory name, adding a .1 or so to the end, and then
simply rsyncing those directories over as well.
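
As a rough sketch of the renaming step, the following Python picks the
first unused numeric suffix for a directory and renames it. The helper
names next_free_name and rename_with_suffix are hypothetical, and the
".1", ".2", ... numbering scheme is only an assumption based on the
"adding a .1 or so" remark above.

```python
import os


def next_free_name(path, start=1):
    """Return path plus the first numeric suffix (.1, .2, ...) not in use."""
    n = start
    # lexists also treats dangling symlinks as "taken".
    while os.path.lexists(f"{path}.{n}"):
        n += 1
    return f"{path}.{n}"


def rename_with_suffix(path):
    """Rename path to its first free suffixed name and return the new name."""
    new_path = next_free_name(path)
    os.rename(path, new_path)
    return new_path
```

The renamed directory no longer collides with the name already present
at the destination, so it can be copied over alongside the existing one.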


Again, this is just a really specific use case, for this particular
merge I'm doing at the moment.


>
>
> David
>
>

Regards,


Kaya
