Date: Fri, 5 May 2023 02:32:28 +0100
From: Kaya Saman <kayasaman@optiplex-networks.com>
To: Paul Procacci <pprocacci@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <7747f587-f33e-f39c-ac97-fe4fe19e0b76@optiplex-networks.com>
In-Reply-To: <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com>
References: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com>
 <CAFbbPugfhXGPfscKpx6B0ue=DcF_qssL6P-0GgB1CWKtm3U=tQ@mail.gmail.com>
 <344b29c6-3d69-543d-678d-c2433dbf7152@optiplex-networks.com>
 <CAFbbPuiNqYLLg8wcg8S_3=y46osb06%2BduHqY9f0n=OuRgGVY=w@mail.gmail.com>
 <ef0328b0-caab-b6a2-5b33-1ab069a07f80@optiplex-networks.com>
 <CAFbbPujUALOS%2BsUxsp=54vxVAHe_jkvi3d-CksK78c7rxAVoNg@mail.gmail.com>
On 5/5/23 01:13, Paul Procacci wrote:
> #!/bin/sh
>
> #
> # dir_1, dir_2, and dir_3 are the directories I want to search through.
> for i in dir_1 dir_2 dir_3;
> do
>   # Retrieve the filenames within each of those directories
>   ls $i/ | while read file;
>   do
>     # If the file doesn't exist in the base dir, copy it and continue with the top of the loop.
>     [ ! -f dir_base/$file ] && cp $i/$file dir_base/ && continue
>
>     #
>     # Getting to this point means the file exists in both locations.
>     #
>
>     # Get the file size as it is in the dir_base
>     ref=`stat -f '%z' dir_base/$file`
>
>     # Get the file size as it is in $i
>     src=`stat -f '%z' $i/$file`
>
>     # If the sizes are the same, remove the file from the source directory
>     [ $ref -eq $src ] && rm -f $i/file
>
>   done
> done

Thanks so much!

Just a quick question... you have dir_base written in the script. Do I
need to define this or is this part of the shell language itself?

Right now I have modified the script to make it non-destructive so that
it doesn't do any copying or removing yet... call it a test instance if
you like. I personally prefer doing things like this so I don't have any
accidents and lose things in the meantime...

So my initial modification is this:

> #!/bin/sh
>
> #
> # dir_1, dir_2, and dir_3 are the directories I want to search through.
> for i in /dir_1 /dir_2 /dir_3;
> do
>   # Retrieve the filenames within each of those directories
>   ls $i/ | while read file;
>   do
>     # If the file doesn't exist in the base dir, copy it and continue with the top of the loop.
>     [ ! -f dir_base/$file ] && ls $i/$file && continue
>
>     #
>     # Getting to this point means the file exists in both locations.
>     #
>
>     # Get the file size as it is in the dir_base
>     ref=`stat -f '%z' dir_base/$file`
>
>     # Get the file size as it is in $i
>     src=`stat -f '%z' $i/$file`
>
>     # If the sizes are the same, remove the file from the source directory
>     [ $ref -nq $src ] && ls $i/file > /tmp/file
>
>   done
> done

If this works, it should just output the files that differ into a file
called "file" under /tmp.

OK, this didn't work at all.... it just listed a whole bunch of
top-level folders and didn't recurse through them :-(

I ran it on the assumption that I needed to run the script under /dir
and that dir_base was a shell function which would essentially be /dir/.

[EDIT]

Currently, I have managed to get it partly running by changing ls to
ls -R, *but* I think that the 'stat' statements don't allow for
recursion?

The script is running as I type this, but it's most likely just
outputting a whole bunch of ls information... as I see many 'stat'
errors in the shell output.
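
For reference, here is a minimal sketch of a non-destructive, recursive variant of the test run described above. It is not Paul's script and not a definitive fix. A few things it assumes or changes, all of which should be checked against the real layout: dir_base in the quoted script is not part of the shell language, just a relative directory path, so here the base directory is passed in explicitly; the function name dry_run_dedupe and the COPY/SAME/DIFF labels are made up for illustration; find is used instead of ls -R, because ls -R prints directory headers and bare file names that stat cannot resolve; wc -c is swapped in for stat -f '%z' only so the sketch also runs outside FreeBSD; and -ne (not -nq) is the "not equal" operator for test. Equal size does not prove equal content, and file names containing newlines are not handled.

```sh
#!/bin/sh
# Hypothetical helper: report what WOULD happen, change nothing.
# Usage: dry_run_dedupe BASE_DIR SRC_DIR...
# Pass directories without a trailing slash.
dry_run_dedupe() {
    base=$1
    shift
    for src in "$@"; do
        # find descends into subdirectories and prints full paths.
        find "$src" -type f | while IFS= read -r path; do
            # Path relative to the source directory, e.g. sub/a.txt
            rel=${path#"$src"/}

            # Not present in the base dir: the real script would copy it.
            if [ ! -f "$base/$rel" ]; then
                echo "COPY $path"
                continue
            fi

            # Sizes in bytes; $(( )) strips any padding wc may emit.
            ref=$(( $(wc -c < "$base/$rel") ))
            cur=$(( $(wc -c < "$path") ))

            if [ "$ref" -eq "$cur" ]; then
                # Same size: the real script would remove the source copy.
                echo "SAME $path"
            else
                echo "DIFF $path"
            fi
        done
    done
}
```

Something like `dry_run_dedupe /dir/base /dir_1 /dir_2 /dir_3 > /tmp/report` (paths are placeholders) would then give a list to review before any cp or rm is enabled.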
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?7747f587-f33e-f39c-ac97-fe4fe19e0b76>