From: Wojciech Puchar
To: Julien Cigar
Cc: freebsd-questions@freebsd.org
Date: Thu, 21 Jun 2012 16:44:29 +0200 (CEST)
Subject: Re: Is ZFS production ready?

> One interesting feature of ZFS is its block checksum: all reads and writes
> include a block checksum, so it can easily detect situations where, for
> example, data is quietly corrupted by RAM.

You may be shocked, but you are sometimes wrong.
I already demonstrated it: checksumming doesn't report any errors, and ZFS does write wrong data with right checksums :) It's quite easy to explain if one understands the hardware details.

Checksumming will protect you from:

- a failed SATA/SAS port, or an on-disk controller that returns bad data as good. This is actually a really rare case. I have never seen it, but maybe it happens.
- some types of DRAM failure, but not all. Actually just a small fraction, because a DRAM failure like that would crash your system so quickly that you are unlikely to get much data corruption.

The common case with DRAM is that after you write to it, it keeps the right data for some time and RARELY flips a bit later, in spite of refresh. With this type of failure you may run your machine for hours, even days or longer. And ZFS will calculate a proper checksum of the wrong data and write it to disk.

This is the reason I keep a few failed DIMMs: for testing how different software behaves on a broken machine.

UFS ended up with a few corrupted files after half a day of heavy work and 4 crashes. fsck always recovered things well (of course with "unexpected softupdate inconsistency....").

ZFS survived 2 crashes. After the third it panicked on startup. Of course, there is no zfs_fsck. And there is no possibility of making a really good zfs_fsck because of the data layout, at least not easily.

> This feature is very important for databases.

Is data integrity not important for the rest? :)

Still, the disks themselves perform quite heavy ECC, as do both SATA and SAS ports.
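
The point about a proper checksum of wrong data can be shown with a small sketch. This is not ZFS code: the function names (write_block, verify_block, flip_bit) are made up for illustration, and CRC32 from Python's zlib stands in for whatever checksum the filesystem really uses. It only assumes that the checksum is computed over whatever sits in the RAM buffer at write time.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated bit flip)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def write_block(buffer: bytes) -> tuple[bytes, int]:
    """Hypothetical write path: checksum whatever is in the buffer right now."""
    return buffer, zlib.crc32(buffer)


def verify_block(stored: bytes, checksum: int) -> bool:
    """Hypothetical read path: recompute the checksum and compare."""
    return zlib.crc32(stored) == checksum


original = b"important database record"

# Case 1: the bit flips in RAM BEFORE the checksum is computed.
# The checksum is calculated over the already-wrong data, so it matches.
in_ram = flip_bit(original, 5)
stored, stored_sum = write_block(in_ram)
corrupted_before_passes = verify_block(stored, stored_sum)
data_is_wrong = stored != original

# Case 2: the bit flips AFTER the checksum is computed
# (bad port, bad controller). Verification catches this one.
stored2, sum2 = write_block(original)
on_disk = flip_bit(stored2, 5)
corrupted_after_detected = not verify_block(on_disk, sum2)

print("flip before checksum passes verification:", corrupted_before_passes)
print("stored data differs from original:", data_is_wrong)
print("flip after checksum is detected:", corrupted_after_detected)
```

Case 1 is the failed-DIMM situation: verification succeeds although the stored data is wrong. Case 2 is the kind of error checksumming does protect against.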