From: Kai Gallasch
Date: Wed, 18 Jul 2012 08:43:57 +0200
To: CH
Cc: freebsd-fs@freebsd.org
Subject: Re: Can you list internal checksums of a ZFS filesystem?

On 18.07.2012 at 00:26, CH wrote:

> Hello list,
>
> I'm moving data to a ZFS filesystem, and it's a ton of big files (more
> than 3 terabytes). I don't trust the network copy command completely,
> and so I'd like to compare checksums. I'm not looking forward to it,
> since it's going to be a slow process, especially if I can't run the
> command on the server.

You could use rsync for transferring the data.

According to its man page, rsync computes a whole-file checksum for every
file it transfers and verifies it on the receiving side. With -c it also
compares checksums on the sending and receiving side before deciding which
files to transfer:

http://www.freebsd.org/cgi/man.cgi?query=rsync&apropos=0&sektion=0&manpath=FreeBSD+Ports&arch=default&format=html

    -c, --checksum
        This changes the way rsync checks if the files have been changed
        and are in need of a transfer. Without this option, rsync uses a
        "quick check" that (by default) checks if each file's size and
        time of last modification match between the sender and receiver.
        This option changes this to compare a 128-bit checksum for each
        file that has a matching size. Generating the checksums means
        that both sides will expend a lot of disk I/O reading all the
        data in the files in the transfer (and this is prior to any
        reading that will be done to transfer changed files), so this
        can slow things down significantly.

        The sending side generates its checksums while it is doing the
        file-system scan that builds the list of the available files.
        The receiver generates its checksums when it is scanning for
        changed files, and will checksum any file that has the same size
        as the corresponding sender's file: files with either a changed
        size or a changed checksum are selected for transfer.

        Note that rsync always verifies that each transferred file was
        correctly reconstructed on the receiving side by checking a
        whole-file checksum that is generated as the file is
        transferred, but that automatic after-the-transfer verification
        has nothing to do with this option's before-the-transfer "Does
        this file need to be updated?" check.

        For protocol 30 and beyond (first supported in 3.0.0), the
        checksum used is MD5. For older protocols, the checksum used is
        MD4.

So running rsync first without the -c switch, and then a second time with
-c, should be quite sufficient to make sure the data has not changed after
being transferred.
(Unless, of course, the underlying filesystem layers lie about this to the
application, or MD5 is wrongly implemented in rsync :-)

Also, rsync makes it possible to transfer the data in several runs, at
times most convenient to you (or your network).

It also supports a switch for limiting bandwidth usage (--bwlimit).

Have a nice day,
Kai.
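
P.S. To make the difference between the "quick check" and -c from the man
page excerpt more concrete, here is a minimal Python sketch of the selection
logic as the excerpt describes it. It is an illustration only, not rsync's
actual implementation: the function names md5_of_file and needs_transfer are
made up for this example, comparing modification times at whole-second
resolution is an assumption, and MD5 is used because the excerpt names it
for protocol 30 and later.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(src, dst, checksum=False):
    """Decide whether src would be selected for transfer to dst.

    checksum=False imitates the "quick check": size and modification time.
    checksum=True imitates -c: size first, then a whole-file MD5.
    (Hypothetical helper; whole-second mtime comparison is an assumption.)
    """
    if not os.path.exists(dst):
        return True  # nothing on the receiving side yet
    src_stat, dst_stat = os.stat(src), os.stat(dst)
    if src_stat.st_size != dst_stat.st_size:
        return True  # a changed size selects the file in both modes
    if checksum:
        # Same size: both sides must read the whole file to compare digests.
        return md5_of_file(src) != md5_of_file(dst)
    return int(src_stat.st_mtime) != int(dst_stat.st_mtime)
```

A file whose contents differ while its size and timestamp still match is
skipped by needs_transfer(src, dst) but selected by needs_transfer(src, dst,
checksum=True). That case is the reason for the second run with -c, and the
extra read of every file on both sides is why that run is slow.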