From: Brooks Davis
To: Scott Long
Cc: freebsd-fs@freebsd.org, ticso@cicely.de, Tz-Huan Huang
Date: Tue, 8 Jan 2008 12:58:57 -0600
Subject: Re: ZFS i/o errors - which disk is the problem?
Message-ID: <20080108185857.GA5601@lor.one-eyed-alien.net>
In-Reply-To: <47839386.8020203@samsco.org>

On Tue, Jan 08, 2008 at 08:15:18AM -0700, Scott Long wrote:
> Bernd Walter wrote:
>> On Mon, Jan 07, 2008 at 10:36:00PM -0700, Scott Long wrote:
>>> Bernd Walter wrote:
>>>> On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
>>>>> 2008/1/4, Brooks Davis :
>>>> The data is corrupted by controller and/or disk subsystem.
>>>> You have no other data sources for the broken data, so it is lost.
>>>> The only guaranteed way is to get it back from backup.
>>>> Maybe older snapshots/clones are still readable - I don't know.
>>>> Nevertheless data is corrupted and that's the purpose for alternative
>>>> data sources such as raidz/mirror and at last backup.
>>>> You shouldn't have ignored those errors at first, because you are
>>>> running with faulty hardware.
>>>> Without ZFS checksumming the system would just process the broken
>>>> data with unpredictable results.
>>>> If all those errors are fresh then you likely used a broken RAID
>>>> controller below ZFS, which silently corrupted synchronicity and then
>>>> blew up when disk state changed.
>>>> Unfortunately many RAID controllers are broken and therefore useless.
>>>>
>>> Huh? Could you be any more vague? Which controllers are broken? Have
>>> you contacted anyone about the breakage? Can you describe the breakage?
>>> I call bullshit, pure and simple.
>> Just go back a few mails in the same thread where someone fixed CRC
>> errors by updating the RAID controller firmware.
>> I'm amazed how often I read something like this lately.
>> And if you read the whole thread then you will notice that we are
>> currently talking about another person who has corrupted data on
>> a RAID disk - not sure if this is the controller, a drive or the
>> drivers, but something is faulty here and I wouldn't be surprised
>> if it is the controller.
>> And then there are so many RAID controllers without backed memory or
>> other mechanism to guarantee synchronicity for the disks, which I call
>> broken by design.
>> You know yourself how important synchronicity is for RAID, especially
>> when it comes to parity based RAID and you know how fragile it is
>> when it comes to power failure.
>
> Your argument is complete hearsay and poorly formed opinion. That's
> fine, just be honest about it and don't mislead others into thinking
> that you know what you're talking about when it comes to RAID.

We saw ZFS CRC errors on one system running Solaris x86 with a 16-port
Areca controller (I don't have the model number handy) until we did a
firmware upgrade after contacting Areca. The controller was running in
JBOD mode.

-- Brooks
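
The point made earlier in the thread, that checksumming only detects
corruption while a second copy (mirror/raidz) is needed to repair it, can
be illustrated with a toy model. This is a minimal sketch and not how ZFS
is implemented: the class name MirroredBlock, the corrupt() helper and the
use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum kept separately from the data it protects (illustrative)."""
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Hypothetical toy model: one logical block stored as N identical copies."""

    def __init__(self, data: bytes, copies: int = 2):
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]

    def corrupt(self, index: int) -> None:
        # Simulate a controller or disk silently flipping one bit.
        self.copies[index][0] ^= 0x01

    def read(self) -> bytes:
        good = None
        bad = []
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.expected:
                if good is None:
                    good = bytes(copy)
            else:
                bad.append(i)
        if good is None:
            # No alternative data source: the error is detected, not repaired.
            raise IOError("checksum mismatch on every copy; restore from backup")
        for i in bad:
            # Rewrite damaged copies from the copy that verified.
            self.copies[i] = bytearray(good)
        return good


if __name__ == "__main__":
    mirror = MirroredBlock(b"important data", copies=2)
    mirror.corrupt(0)
    print(mirror.read())  # the intact copy is returned and copy 0 is rewritten

    single = MirroredBlock(b"important data", copies=1)
    single.corrupt(0)
    try:
        single.read()
    except IOError as err:
        print("unrecoverable:", err)
```

With two copies the read succeeds and the damaged copy is rewritten; with
a single copy the corruption is reported but the data cannot be recovered,
which matches the "get it back from backup" advice quoted above.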