Date: Wed, 18 Jul 2012 07:57:54 -0700
From: CH
To: Kai Gallasch
Cc: freebsd-fs@freebsd.org
Subject: Re: Can you list internal checksums of a ZFS filesystem?
Message-ID: <20120718075754.4908266b@kirk.lan>
In-Reply-To: <6D778EEA-5B8F-4F59-B198-E5B098F3AE2C@free.de>

On Wed, 18 Jul 2012 08:43:57 +0200
Kai Gallasch wrote:

> On 18.07.2012 at 00:26, CH wrote:
>
> > Hello list,
> >
> > I'm moving data to a ZFS filesystem, and it's a ton of big files
> > (more than 3 terabytes).
> > I don't trust the network copy command completely, and so I'd like
> > to compare checksums. I'm not looking forward to it, since it's
> > going to be a slow process, especially if I can't run the command
> > on the server.
>
> You could use rsync for transferring the data.
>
> According to its man page rsync calculates checksums for transferred
> files and on its initial run compares checksums on the sending and
> receiving side for each file:
>
> http://www.freebsd.org/cgi/man.cgi?query=rsync&apropos=0&sektion=0&manpath=FreeBSD+Ports&arch=default&format=html
>
> So running rsync first without the -c switch and then a second time
> with -c should be quite sufficient to make sure the data has not
> changed after being transferred. (Except, of course, if the
> underlying filesystem layers lie about this to the application, or
> MD5 is wrongly implemented in rsync :-)
>
> Also rsync makes it possible to transfer the data in several runs,
> at times most convenient to you (or your network). It also supports a
> switch for limiting bandwidth usage.
>
> Have a nice day,
> Kai.

Actually, I did use rsync for the initial transfers, and it had to be
restarted a couple of times for reasons that were not its fault (source
computer rebooted, ssh connection lost, etc.).

However, after it finished copying everything (i.e., exited normally),
I ran it again, and it found more stuff to copy. This shouldn't have
happened, since nothing was added to the source computer, so now I
distrust its results and want to check them independently. In
particular, I don't trust its directory-walking algorithm: some files
may have been missed, and may continue to be missed in future runs of
rsync, with or without -c.

The method I was going to use was

    find . -type f -print0 | xargs -0 md5sum > my.big.md5sum.file

on both source and destination, but if I can harvest the ZFS checksums
(file or block), it would cut the CPU workload in half and save a
tree's worth of energy.
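
For the comparison step itself, here is a rough sketch of how the two
md5sum listings could be checked against each other, including for
files that the copy missed entirely. It assumes both listings were
produced by the find/md5sum command above, run from the matching
top-level directory on each side, and that file names are valid UTF-8
without newlines or backslashes (md5sum escapes those, which this
sketch does not handle). The names load_manifest and compare_manifests
are made up for illustration.

```python
import sys


def load_manifest(path):
    """Parse md5sum output ('<digest>  <name>') into a {name: digest} dict."""
    entries = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            # md5sum writes a 32-character hex digest, one space, a mode
            # character (' ' or '*'), and then the file name.
            digest, name = line[:32], line[34:]
            entries[name] = digest
    return entries


def compare_manifests(source, dest):
    """Return (missing, extra, changed) lists of file names, each sorted."""
    missing = sorted(n for n in source if n not in dest)
    extra = sorted(n for n in dest if n not in source)
    changed = sorted(n for n in source if n in dest and source[n] != dest[n])
    return missing, extra, changed


# Hypothetical usage: python compare_md5.py source.md5 dest.md5
if __name__ == "__main__" and len(sys.argv) == 3:
    src = load_manifest(sys.argv[1])
    dst = load_manifest(sys.argv[2])
    missing, extra, changed = compare_manifests(src, dst)
    for label, names in (("MISSING", missing),
                         ("EXTRA", extra),
                         ("CHANGED", changed)):
        for name in names:
            print(label, name)
```

Keying on the file name rather than sorting and diffing the two files
means a file skipped by the directory walk shows up as MISSING instead
of shifting every later line out of step.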