Date:      Mon, 15 May 2023 01:43:38 -0700
From:      David Christensen <dpchrist@holgerdanske.com>
To:        questions@freebsd.org
Subject:   Re: Tool to compare directories and delete duplicate files from one directory
Message-ID:  <818813a2-8ab0-df54-3c59-0e1ba9ce743d@holgerdanske.com>
In-Reply-To: <c1699605-fa7f-71da-db06-dfcfb43618d6@holgerdanske.com>
References:  <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com> <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com> <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com> <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com> <CAFbbPuhoMOM=wp26yZ42e9xnRP%2BtJ6B30y8%2BBa3eCBh2v66Grw@mail.gmail.com> <fd9aa7d3-f6a7-2274-f970-d4421d187855@optiplex-networks.com> <CAFbbPujpPPrm-axMC9S5OnOiYn2oPuQbkRjnQY4tp=5L7TiVSg@mail.gmail.com> <eda13374-48c1-1749-3a73-530370934eff@optiplex-networks.com> <CAFbbPujbyPHm2GO%2BFnR0G8rnsmpA3AxY2NzYOAAXetApiF8HVg@mail.gmail.com> <b4ac4aea-a051-fbfe-f860-cd7836e5a1bb@optiplex-networks.com> <7c2429c5-55d0-1649-a442-ce543f2d46c2@holgerdanske.com> <6a0aba81-485a-8985-d20d-6da58e9b5580@optiplex-networks.com> <347612746.1721811.1683912265841@fidget.co-bxl> <08804029-03de-e856-568b-74494dfc81cf@holgerdanske.com> <126434505.494354.1684104532813@ichabod.co-bxl> <c1699605-fa7f-71da-db06-dfcfb43618d6@holgerdanske.com>

On 5/15/23 01:29, David Christensen wrote:
> On 5/14/23 15:48, Sysadmin Lists wrote:
>> #!/bin/sh -e
>> # remove or report duplicate files: $0 [-n] dir[1] dir[2] ... dir[n]
>> if [ "X$1" = "X-n" ]; then n=1; shift; fi
>>
>> echo "Building files list from: ${@}"
>>
>> find "${@}" -xdev -type f |
>> awk -v n=$n 'BEGIN { cmd = "stat -f %z "
>> for (x = 1; x < ARGC; x++) args = args ? args "|" ARGV[x] : ARGV[x]; ARGC = 0 }
>>       { files[$0] = match($0, "(" args ")/?") + RLENGTH }
>> END  { for (i in ARGV) sub("/*$", "/", ARGV[i])
>>         print "Comparing files ..."
>>         for (i = 1; i < x; i++) for (file in files) if (file ~ "^" ARGV[i]) {
>>             for (j = i +1; j < x; j++)
>>                 if (ARGV[j] substr(file, files[file]) in files) {
>>                     dup = ARGV[j] substr(file, files[file])
>>                     cmd "\"" file "\"" | getline fil_s; close(cmd "\"" file "\"")
>>                     cmd "\"" dup  "\"" | getline dup_s; close(cmd "\"" dup  "\"")
>>                     if (dup_s == fil_s) act("dup")
>>                     else act("diff") }
>>             delete files[file]
>>       } }
>> function act(message) {
>>      print ((message == "dup") ? "duplicates:" : "difference:"), dup, file
>>      if (!n) system("rm -vi \"" dup "\" </dev/tty")
>> }' "${@}"

> Your script does not appear to do anything (?):
> 
> 2023-05-15 01:19:00 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo
> Building files list from: foo
> Comparing files ...
> 
> 2023-05-15 01:19:33 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
>        26      24      82
> 
> 2023-05-15 01:19:35 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo
> Building files list from: foo
> Comparing files ...
> 
> 2023-05-15 01:19:48 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
>        26      24      82


It looks like your script only finds duplicates when the subpath is 
identical (?):

2023-05-15 01:38:20 dpchrist@vf1 /vf1zpool1/dpchrist
$ cp -Ra foo bar

2023-05-15 01:39:18 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h

2023-05-15 01:39:41 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
       26      24      82

2023-05-15 01:39:44 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
       26      24      82

2023-05-15 01:40:10 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h

2023-05-15 01:40:22 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
       26      24      82

2023-05-15 01:40:29 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
       26      24      82

2023-05-15 01:40:34 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
remove bar/1/2/a? n
duplicates: bar/1/i-j foo/1/i-j
remove bar/1/i-j? n
duplicates: bar/1/2/e foo/1/2/e
remove bar/1/2/e? n
duplicates: bar/1/a-b foo/1/a-b
remove bar/1/a-b? n
duplicates: bar/1/g foo/1/g
remove bar/1/g? n
duplicates: bar/1/2/i foo/1/2/i
remove bar/1/2/i? n
duplicates: bar/q-r foo/q-r
remove bar/q-r? n
duplicates: bar/m-n foo/m-n
remove bar/m-n? n
duplicates: bar/1/2/m foo/1/2/m
remove bar/1/2/m? n
duplicates: bar/c foo/c
remove bar/c? n
duplicates: bar/e-f foo/e-f
remove bar/e-f? n
duplicates: bar/1/s foo/1/s
remove bar/1/s? n
duplicates: bar/k foo/k
remove bar/k? n
duplicates: bar/o foo/o
remove bar/o? n
duplicates: bar/q foo/q
remove bar/q? n
duplicates: bar/1/c-d foo/1/c-d
remove bar/1/c-d? n
duplicates: bar/1/2/s-t foo/1/2/s-t
remove bar/1/2/s-t? n
duplicates: bar/1/2/o-p foo/1/2/o-p
remove bar/1/2/o-p? n
duplicates: bar/1/2/k-l foo/1/2/k-l
remove bar/1/2/k-l? n
duplicates: bar/g-h foo/g-h
remove bar/g-h? n
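
For comparison, here is a minimal sketch of a check that does not depend 
on the subpath. It is not the method used in the quoted script, which 
matches files by relative path and compares sizes from "stat -f %z". The 
sketch groups files by the CRC and size that cksum(1) prints, then 
confirms each candidate with cmp(1). The function name find_dupes is 
hypothetical, and the sketch assumes file names contain no newlines and 
no trailing whitespace. It only reports; it removes nothing.

```shell
#!/bin/sh
# Sketch only: report files with identical content anywhere under the
# given directories, regardless of subpath. Nothing is removed.
# Assumption: file names contain no newlines or trailing whitespace.
# find_dupes is a hypothetical name, not part of the quoted script.
find_dupes() {
    prev=''
    first=''
    # cksum prints "CRC SIZE PATH"; sorting puts equal CRC/size pairs
    # on adjacent lines.
    find "$@" -xdev -type f -exec cksum {} + |
    sort -k1,1n -k2,2n |
    while read -r crc size path; do
        if [ "$crc $size" != "$prev" ]; then
            # New CRC/size group: remember its first file.
            prev="$crc $size"
            first=$path
        elif cmp -s "$first" "$path"; then
            # Same CRC and size, and a byte-for-byte match with the
            # first file of the group.
            printf 'duplicates: %s %s\n' "$path" "$first"
        fi
    done
}
```

It would be called as "find_dupes foo bar". The CRC is only a cheap 
pre-filter, so a file whose CRC and size collide with the first file of 
a group but whose content differs is skipped rather than reported.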


David



