Date:      Sun, 7 May 2023 21:25:18 +0100
From:      Kaya Saman <kayasaman@optiplex-networks.com>
To:        questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <6a0aba81-485a-8985-d20d-6da58e9b5580@optiplex-networks.com>
In-Reply-To: <7c2429c5-55d0-1649-a442-ce543f2d46c2@holgerdanske.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com> <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com> <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com> <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com> <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com> <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com> <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com> <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com> <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com> <eda13374-48c1-1749-3a73-530370934eff@optiplex-networks.com> <CAFbbPujbyPHm2GO%2BFnR0G8rnsmpA3AxY2NzYOAAXetApiF8HVg@mail.gmail.com> <b4ac4aea-a051-fbfe-f860-cd7836e5a1bb@optiplex-networks.com> <7c2429c5-55d0-1649-a442-ce543f2d46c2@holgerdanske.com>


On 5/6/23 21:33, David Christensen wrote:
> I thought I sent this, but it never hit the list (?) -- David
>
>
> On 5/4/23 21:06, Kaya Saman wrote:
>
>> To start with this is the directory structure:
>>
>>
>> ls -lhR /tmp/test1
>> total 1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir2
>>
>> /tmp/test1/dupdir1:
>> total 1
>> -rw-r--r--  1 root  wheel     8B Apr 30 03:17 dup
>>
>> /tmp/test1/dupdir2:
>> total 1
>> -rw-r--r--  1 root  wheel     7B May  5 03:23 dup1
>>
>>
>> ls -lhR /tmp/test2
>> total 1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir2
>>
>> /tmp/test2/dupdir1:
>> total 1
>> -rw-r--r--  1 root  wheel     4B Apr 30 02:53 dup
>>
>> /tmp/test2/dupdir2:
>> total 1
>> -rw-r--r--  1 root  wheel     7B Apr 30 02:47 dup1
>>
>>
>> So what I want is for the script to recurse from the top-level
>> directories test1 and test2; the expected behavior is to remove
>> file dup1 and keep dup, as dup is different between the directories.
>
>
> My previous post missed the mark, but I have been watching this thread
> with interest (trepidation?).
>
>
> I think Tim already identified a tool that will safely get you close
> to your goal, if not all the way:
>
> On 5/4/23 09:28, Tim Daneliuk wrote:
>> I've never used it, but there is a port of fdupes in the ports tree.
>> Not sure if it does exactly what you want though.
>
>
> fdupes(1) is also available as a package:
>
> 2023-05-04 21:25:31 toor@vf1 ~
> # freebsd-version; uname -a
> 12.4-RELEASE-p2
> FreeBSD vf1.tracy.holgerdanske.com 12.4-RELEASE-p1 FreeBSD
> 12.4-RELEASE-p1 GENERIC  amd64
>
> 2023-05-04 21:25:40 toor@vf1 ~
> # pkg search fdupes
> fdupes-2.2.1,1                  Program for identifying or deleting
> duplicate files
>
>
> Looking at the man page:
>
> https://man.freebsd.org/cgi/man.cgi?query=fdupes&sektion=1&manpath=FreeBSD+13.2-RELEASE+and+Ports
>
>
>
> I am fairly certain that you will want to give the destination
> directory as the first argument and the source directories after that:
>
> $ fdupes --recurse /dir /dir_1 /dir_2 /dir_3
>
>
> The above will provide you with information, but not delete anything.
>
>
> Practice under /tmp to gain familiarity with fdupes(1) is a good idea.
>
>
> As you are using ZFS, I assume you know how to take snapshots and do
> rollbacks (?).  These could serve as backup and restore operations if
> things go badly.
>
>
> Given 12+ TB of data, you may want the --noprompt option when you do
> give the --delete option and actual arguments.
>
>
> David
>

Thanks David!


I tried using fdupes like this but didn't get any output, probably
because it took so long to run that it never completed. It does
have a -d flag that deletes files, but in my testing it deletes all
duplicates and doesn't let you choose which directory the duplicates
are deleted from, unless I misunderstood the man page.
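
For illustration only, the behaviour asked for in this thread (remove a
file from one tree only when a byte-identical file exists at the same
relative path in the other tree) could be sketched in plain sh roughly
as below. This is not Paul's Perl script and not how fdupes works; the
function name dedup_tree and its --delete switch are assumptions made
up for this example.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not the Perl script from the thread).
# Usage: dedup_tree KEEP_DIR PRUNE_DIR [--delete]
# Prints each file under PRUNE_DIR that is byte-identical to the file at the
# same relative path under KEEP_DIR. Files are removed only with --delete.
# Assumptions: directories are given without a trailing slash, and file
# names contain no newline characters.
dedup_tree() {
    keep=$1
    prune=$2
    mode=${3:-}
    # Walk every regular file under the tree we are allowed to prune.
    find "$prune" -type f | while IFS= read -r f; do
        rel=${f#"$prune"/}          # path relative to PRUNE_DIR
        other=$keep/$rel            # candidate twin in KEEP_DIR
        # cmp -s prints nothing and exits 0 only for identical contents.
        if [ -f "$other" ] && cmp -s "$f" "$other"; then
            if [ "$mode" = "--delete" ]; then
                rm -- "$f"
            fi
            printf '%s\n' "$f"
        fi
    done
}
```

Calling dedup_tree /tmp/test1 /tmp/test2 first, without --delete, lists
the candidates so they can be checked before anything is removed;
nothing under the first directory is ever touched.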


At present the Perl script from Paul, in its last iteration, solved my
problem and was pretty fast at the same time.


Of course, I first tested it on my test dirs in /tmp, then took ZFS
snapshots of the actual working dirs, and finally ran the script. It
worked flawlessly.


Regards,


Kaya



