Date:      Wed, 18 Jul 2012 08:43:57 +0200
From:      Kai Gallasch <gallasch@free.de>
To:        CH <freebsd-fs@ch.pkts.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Can you list internal checksums of a ZFS filesystem?
Message-ID:  <6D778EEA-5B8F-4F59-B198-E5B098F3AE2C@free.de>
In-Reply-To: <20120717152629.42e0641e@fedora14-x86-64.shechinah.mi.microbiology.ubc.ca>
References:  <20120717152629.42e0641e@fedora14-x86-64.shechinah.mi.microbiology.ubc.ca>

On 18.07.2012 at 00:26, CH wrote:
>
> Hello list,
>
> I'm moving data to a ZFS filesystem, and it's a ton of big files (more
> than 3 terabytes).  I don't trust the network copy command completely,
> and so I'd like to compare checksums.  I'm not looking forward to it,
> since it's going to be a slow process, especially if I can't run the
> command on the server.

You could use rsync for transferring the data.

According to its man page, rsync calculates checksums for the files it
transfers and, on its initial run, compares the checksums from the
sending and receiving side for each file:

http://www.freebsd.org/cgi/man.cgi?query=rsync&apropos=0&sektion=0&manpath=FreeBSD+Ports&arch=default&format=html


       -c, --checksum
              This changes the way rsync checks if the files have been changed
              and are in need of a transfer.  Without this option, rsync uses
              a "quick check" that (by default) checks if each file's size and
              time of last modification match between the sender and receiver.
              This option changes this to compare a 128-bit checksum for each
              file that has a matching size.  Generating the checksums means
              that both sides will expend a lot of disk I/O reading all the
              data in the files in the transfer (and this is prior to any
              reading that will be done to transfer changed files), so this
              can slow things down significantly.

              The sending side generates its checksums while it is doing the
              file-system scan that builds the list of the available files.
              The receiver generates its checksums when it is scanning for
              changed files, and will checksum any file that has the same size
              as the corresponding sender's file:  files with either a changed
              size or a changed checksum are selected for transfer.

              Note that rsync always verifies that each transferred file was
              correctly reconstructed on the receiving side by checking a
              whole-file checksum that is generated as the file is
              transferred, but that automatic after-the-transfer verification
              has nothing to do with this option's before-the-transfer "Does
              this file need to be updated?" check.

              For protocol 30 and beyond (first supported in 3.0.0), the
              checksum used is MD5.  For older protocols, the checksum used is
              MD4.
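
To make the difference between the "quick check" and -c concrete, here is a rough Python sketch of the decision the man page describes. It is only an approximation for illustration, not rsync's actual code: the names md5_of and needs_transfer are made up, both files are assumed to be readable locally, and modification times are compared in whole seconds.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_transfer(src, dst, use_checksum=False):
    """Approximate the 'does this file need updating?' decision.

    Hypothetical helper: mirrors the man page description only.
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        # A changed size always selects the file for transfer.
        return True
    if use_checksum:
        # -c behaviour: same size, so compare whole-file checksums.
        return md5_of(src) != md5_of(dst)
    # Quick check: same size, so compare the modification time.
    return int(s.st_mtime) != int(d.st_mtime)
```

A file with the same size and timestamp but different contents passes the quick check unnoticed and is only caught when use_checksum is set, which is why the second pass with -c matters.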


So running rsync without the -c switch first, and then a second time
with -c, should be quite sufficient to make sure the data has not
changed after being transferred. (Unless, of course, the underlying
filesystem layers lie about this to the application, or MD5 is wrongly
implemented in rsync :-)
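
As a sketch of those two runs, the small Python helper below just assembles the two command lines. The source and destination paths are placeholders, and adding -n (dry run) and -i (itemize changes) to the second run is my own assumption, so that the run only reports what it would update instead of copying anything again.

```python
import shlex

SRC = "/data/source/"        # hypothetical source directory
DEST = "server:/tank/dest/"  # hypothetical destination on the ZFS host


def first_pass(src=SRC, dest=DEST):
    """Initial copy using rsync's default quick check (size and mtime)."""
    return ["rsync", "-a", src, dest]


def verify_pass(src=SRC, dest=DEST):
    """Second run: -c compares checksums, -n makes it a dry run,
    -i itemizes whatever rsync would update."""
    return ["rsync", "-a", "-c", "-n", "-i", src, dest]


if __name__ == "__main__":
    for cmd in (first_pass(), verify_pass()):
        print(shlex.join(cmd))
```

If the second command lists no files, rsync found nothing it would update. Drop -n if you want it to re-send whatever differs.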

rsync also makes it possible to transfer the data in several runs, at
times most convenient to you (or your network).
It also supports a switch for limiting bandwidth usage (--bwlimit).
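
A throttled run that can be interrupted and restarted might be assembled like this. It is a sketch under assumptions: the helper name and the paths are made up, --bwlimit takes a rate in KBytes per second, and --partial (which keeps partially transferred files so a later run can continue them) is my addition rather than something mentioned above.

```python
def throttled_pass(src, dest, kbytes_per_sec):
    """Build an rsync command limited to kbytes_per_sec KBytes/s.

    Hypothetical helper; a non-positive rate is rejected here because
    rsync treats --bwlimit=0 as 'no limit'.
    """
    if kbytes_per_sec <= 0:
        raise ValueError("kbytes_per_sec must be positive")
    return [
        "rsync", "-a",
        "--partial",                      # keep partial files between runs
        f"--bwlimit={kbytes_per_sec}",    # cap the transfer rate
        src, dest,
    ]


if __name__ == "__main__":
    # Example: cap at roughly 5 MBytes/s (placeholder paths).
    print(" ".join(throttled_pass("/data/source/", "server:/tank/dest/", 5000)))
```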

Have a nice day,
 Kai.


