From: David Christensen
Date: Thu, 4 May 2023 14:21:23 -0700
To: questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <4efcf1a8-45d9-adb4-f148-5e9a43817990@holgerdanske.com>
In-Reply-To: <9887a438-95e7-87cc-a162-4ad7a70d744f@optiplex-networks.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

On 5/4/23 08:53, Kaya Saman wrote:
> Hi,
>
> I'm wondering if anyone knows of a tool like diff or so that can also
> delete files based on name and size from either left/right or
> source/destination directory?
>
> Basically what I have done is performed an rsync without using the
> --remove-source-files option onto a newly bought and created disk pool
> (yes zpool) that i am trying to consolidate my data - as it's currently
> spread out over multiple pools with the same folder name.
>
> The issue I am facing mainly is that I perform another rsync and use the
> --remove-source-files option, rsync will delete files based on name
> while there are some files that have the same name but not same size and
> I would like to retain these files.
>
> Right now I have looked at many different options in both rsync and
> other tools but found nothing suitable. I even tested using a few test
> dirs and files that I put into /tmp and whatever I tried, the files of
> different size either got transferred or deleted.
>
> How would be a good way to approach this problem?
>
> Even if I create some kind of shell script and use diff, I think it will
> only compare names and not file sizes.
>
> I'm really lost here....
>
> Regards,
>
> Kaya

Mounting the source file system and destination file system on the same
host will simplify matters. sshfs(1) works, but is not fast. Samba is
fast. I have never used NFS, but it should be fast.

I know of several programs that can copy and detect destination file
name collisions (and/or destination content collisions), but AIUI their
collision resolution is limited to cancel or overwrite (perhaps
conditionally, such as when the source mtime is newer; e.g. GNU cp(1)
--update).

I would approach the problem by writing a program or script that does
the copy and collision detection, plus has the collision resolution I
want. For example: compare the source and destination contents. If the
contents are the same, do not copy. If the contents differ, copy to a
destination file name that is a unique variant of the source file name.

The challenge then becomes finding a unique destination file name.
Inserting an encoded (e.g. hexadecimal, base32, base64) secure hash
(e.g. SHA-1, SHA-256) of the file contents into the destination file
name should make it very unlikely that two source files with the same
name, but differing contents, would have colliding variant names.

In addition, it would be good to include a --directory=DIR option
(similar to tar(1)).

David
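
P.S. A minimal sketch of the approach described above, in Python. This
is one possible reading of it, not a finished tool: the function names
(sha256_hex, variant_name, merge_file, merge_tree) and the
"name.<hash>.ext" naming scheme are assumptions made for illustration.
It uses SHA-256 in hexadecimal because hex digits are always safe in
file names. It only copies; it never overwrites and never deletes
anything from the source.

```python
import filecmp
import hashlib
import shutil
from pathlib import Path


def sha256_hex(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def variant_name(dst: Path, src: Path) -> Path:
    """Insert the source content hash before the suffix: a.txt -> a.<hash>.txt."""
    return dst.with_name(f"{dst.stem}.{sha256_hex(src)}{dst.suffix}")


def merge_file(src: Path, dst: Path) -> str:
    """Copy src to dst without overwriting; return the action taken."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    if not dst.exists():
        shutil.copy2(src, dst)  # no collision: plain copy, keeps mtime
        return "copied"
    if filecmp.cmp(src, dst, shallow=False):  # byte-by-byte comparison
        return "skipped"  # same name, same contents: nothing to do
    alt = variant_name(dst, src)
    if alt.exists():
        # Assumption: a file whose name embeds this hash has these contents.
        return "skipped"
    shutil.copy2(src, alt)  # same name, different contents: keep both
    return "renamed"


def merge_tree(src_dir: Path, dst_dir: Path) -> dict:
    """Apply merge_file to every regular file under src_dir."""
    actions = {}
    for src in sorted(src_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(src_dir)
            actions[str(rel)] = merge_file(src, dst_dir / rel)
    return actions
```

Calling merge_tree(Path("/old/pool/data"), Path("/new/pool/data")) (both
paths hypothetical) returns a mapping from each relative path to
"copied", "skipped" or "renamed". Removing source files is deliberately
left out; that step could be added once the returned actions have been
reviewed.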