From: "David Schwartz" <davids@webmaster.com>
To: <tom@samplonius.org>
Date: Sat, 25 Aug 2007 02:26:42 -0700
Cc: freebsd-stable@freebsd.org
Subject: RE: A little story of failed raid5 (3ware 8000 series)


>   This isn't really accurate.  First of all, if the RAID
> controller isn't confirming checksums before giving the data to
> the OS, what is the checksum for exactly?

The checksum is used to recover the data in the event one piece of the
data is lost. With all of the data but one piece, and the checksum, the
data can be recovered. Confirming the checksum on every read would be a
waste of time since the individual drives already check the data for
errors.
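
To make the recovery role concrete, here is a minimal sketch in Python.
It assumes the RAID 5 checksum is a plain byte-wise XOR across the
blocks of one stripe; the block contents and the helper name
xor_blocks are made up for illustration and are not taken from any
controller's firmware.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks of one stripe, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the middle block; rebuild it from the surviving blocks plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Here rebuilt equals the lost block b"BBBB". Note that nothing in the
sketch can tell which block is wrong if one is silently corrupt; it can
only fill in a block already known to be missing.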

> It is supposed to be
> for detecting data corruption, so if the card isn't using the
> checksum, its kinda of useless.

You are confused. Checking for data corruption is done by checking
whether the *DATA* is corrupt. This does not require looking at the
RAID 5 checksum, since the data has its own checksum.

> I know some RAID systems do fake
> their checksums, as they don't actually validate data against the
> checksums during normal reads because they don't have the
> processing power.  I'd stay away from these type of systems
> (cough ... Blue Arc ... cough).

It has nothing to do with processing power. It's simply a waste. The
RAID 5 checksum isn't for verifying the data, it's for recovering the
data if it can't be read.

> Second, most RAID systems don't use their own checksums
> anymore.  Netapp is quite famous for their ZCS (zone checksum)
> drives, and still uses a variation of this technology on their
> newer systems (which are using 512 sectors).  But most RAID
> vendors just rely on the drives own error detection and
> correction systems (hamming code based usually, which is actually
> pretty solid).  I'm pretty sure that that 3ware doesn't use any
> checksums.

You are seriously confused. You are confusing the RAID 5 checksum with
the drive data checksum. We are talking about making sure the RAID 5
checksums are readable so that, if a drive fails, the data can be
reconstructed from the checksum.

> However, in this particular case, validating checksums would
> have been unhelpful, since the disk was unreadable.  diskcheckd
> would have detected this issue.  It would probably have prevented
> the problem, if it had been running previously.

No, it would have saved him. The problem was he lost a drive, and
checksums *ON* *OTHER* *DRIVES* were unreadable. Quite possibly they had
been unreadable for months, but were never checked, since they are only
*needed* to reconstruct the data.

> ZFS is also a good option.  It has file level checksumming.
> ZFS never trusts the disks, and is super paranoid.  And ZFS can
> do background scrubbing too.  I can't wait for ZFS in FreeBSD 7,
> because ZFS in software is going to 10 x better than anything 3ware
> has.

That would not have helped him one bit. When the drive failed, the RAID
5 checksums on the other drives still would not have been scrubbed. The
RAID 5 checksum (technically an XOR) is only needed to recover the RAID
5 array if a drive (or sector) fails.
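
The failure mode described above can be sketched roughly as follows.
This is a toy model, not how any real controller works: a stripe is a
Python list with one block per drive, None stands for a block the drive
can no longer read, and the names read_data, rebuild and
UnreadableSector are hypothetical.

```python
from functools import reduce

class UnreadableSector(Exception):
    """Raised when a drive cannot return a block (hypothetical error type)."""

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def read_data(stripe, drive):
    """Normal read: touches only the requested data block, never the parity."""
    block = stripe[drive]
    if block is None:
        raise UnreadableSector(f"drive {drive}")
    return block

def rebuild(stripe, failed_drive):
    """A rebuild needs every other block in the stripe, parity included."""
    survivors = []
    for drive, block in enumerate(stripe):
        if drive == failed_drive:
            continue
        if block is None:
            raise UnreadableSector(f"drive {drive}")
        survivors.append(block)
    return xor_blocks(survivors)

# Drives 0 and 1 hold data; drive 2 holds this stripe's parity, which has
# silently become unreadable (modelled as None).
stripe = [b"AAAA", b"BBBB", None]

# Everyday reads keep succeeding, so nothing looks wrong.
first = read_data(stripe, 0)
second = read_data(stripe, 1)

# Drive 0 dies. Only now is the parity block needed, and it cannot be read.
try:
    rebuild(stripe, failed_drive=0)
    rebuild_ok = True
except UnreadableSector:
    rebuild_ok = False
```

In this model the bad parity block is invisible until the rebuild, which
is the point at which it is too late to do anything about it.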

The only thing that will fix this is specifically verifying the RAID 5
checksum blocks. If a controller provides no way to do this, it is badly
broken. If a verify operation does not verify the checksum blocks, it is
broken.
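
As a rough illustration of what such a verify pass has to do, here is a
sketch under the same toy assumptions (XOR parity, one block per drive
per stripe, None for an unreadable block). The function name scrub and
the report format are invented for the example; a real controller's
verify command will look nothing like this.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def scrub(stripes):
    """Read every block of every stripe, parity included, and list problems.

    Each stripe is a list of equal-length byte blocks (data plus parity);
    None models a block the drive cannot read.
    """
    problems = []
    for index, stripe in enumerate(stripes):
        unreadable = [d for d, block in enumerate(stripe) if block is None]
        if unreadable:
            problems.append((index, f"unreadable block on drive(s) {unreadable}"))
        elif any(xor_blocks(stripe)):
            # Data XOR parity must come out as all zero bytes.
            problems.append((index, "parity mismatch"))
    return problems

good = [b"AAAA", b"BBBB", xor_blocks([b"AAAA", b"BBBB"])]
latent = [b"CCCC", b"DDDD", None]        # parity block unreadable
stale = [b"EEEE", b"FFFF", b"\x00" * 4]  # parity does not match the data

report = scrub([good, latent, stale])
```

The important part is that the loop reads the parity blocks at all. A
pass that only reads the data blocks would report the second stripe as
healthy.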

DS