From: "David Schwartz" <davids@webmaster.com>
Date: Sat, 25 Aug 2007 02:26:42 -0700
Cc: freebsd-stable@freebsd.org
Subject: RE: A little story of failed raid5 (3ware 8000 series)

> This isn't really accurate.
> First of all, if the RAID controller isn't confirming checksums before giving the data to the OS, what is the checksum for exactly?

The checksum is used to recover the data in the event that one piece of the data is lost. With all of the data but one piece, plus the checksum, the missing piece can be recovered. Confirming the checksum on every read would be a waste of time, since the individual drives already check the data for errors.

> It is supposed to be for detecting data corruption, so if the card isn't using the checksum, it's kind of useless.

You are confused. Checking for data corruption is done by checking whether the *DATA* is corrupt. This does not require looking at the RAID 5 checksum, since the data has its own checksum on the drive.

> I know some RAID systems do fake their checksums, as they don't actually validate data against the checksums during normal reads because they don't have the processing power. I'd stay away from these types of systems (cough ... Blue Arc ... cough).

It has nothing to do with processing power. It's simply a waste. The RAID 5 checksum isn't for verifying the data; it's for recovering the data if it can't be read.

> Second, most RAID systems don't use their own checksums anymore. NetApp is quite famous for their ZCS (zone checksum) drives, and still uses a variation of this technology on their newer systems (which are using 512-byte sectors). But most RAID vendors just rely on the drives' own error detection and correction systems (usually Hamming code based, which is actually pretty solid). I'm pretty sure that 3ware doesn't use any checksums.

You are seriously confused. You are confusing the RAID 5 checksum with the drive's data checksum. We are talking about making sure the RAID 5 checksums are readable so that, if a drive fails, the data can be reconstructed from the checksum.
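To make the "all of the data but one piece, plus the checksum" point concrete, here is a minimal sketch of RAID 5 parity for a single stripe, assuming toy in-memory byte blocks. The function names are hypothetical and this is not how 3ware or any other controller is implemented; real arrays also rotate the parity block across drives, which is ignored here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity(data_blocks):
    """The RAID 5 'checksum' for one stripe: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)

def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])

# One stripe over three data drives (toy 4-byte blocks) plus its parity block.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
p = parity(stripe)
print(p.hex())  # bb99ff99

# Drive 1 fails. A normal read of the other blocks never touches p, but the
# rebuild needs p, so p must still be readable at this moment.
rebuilt = reconstruct([stripe[0], stripe[2]], p)
print(rebuilt == stripe[1])  # True
```

Note that nothing in the normal read path consults the parity block; it only matters during a rebuild, which is exactly why it can rot unnoticed.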
> However, in this particular case, validating checksums would have been unhelpful, since the disk was unreadable. diskcheckd would have detected this issue. It would probably have prevented the problem, if it had been running previously.

No, it would have saved him. The problem was that he lost a drive, and checksums *ON* *OTHER* *DRIVES* were unreadable. Quite possibly they had been unreadable for months but were never checked, since they are only *needed* to reconstruct the data.

> ZFS is also a good option. It has file-level checksumming. ZFS never trusts the disks, and is super paranoid. And ZFS can do background scrubbing too. I can't wait for ZFS in FreeBSD 7, because ZFS in software is going to be 10x better than anything 3ware has.

That would not have helped him one bit. When the drive failed, the RAID 5 checksums on the other drives still would not have been scrubbed. The RAID 5 checksum (technically an XOR) is only needed to recover the RAID 5 array if a drive (or sector) fails.

The only thing that will fix this is specifically verifying the RAID 5 checksum blocks. If a controller provides no way to do this, it is badly broken. If a verify operation does not verify the checksum blocks, it is broken.

DS
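The failure described above (a drive dies while a parity block on another drive has silently become unreadable) can be modeled with a toy stripe. This is a minimal sketch under stated assumptions: `UNREADABLE`, `normal_read`, `scrub`, and `rebuild_after_failure` are hypothetical names, not a 3ware, diskcheckd, or ZFS interface, and a latent sector error is simulated by storing `None` in place of a block.

```python
from functools import reduce

UNREADABLE = None  # hypothetical marker for a block hit by a latent sector error

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def normal_read(stripe, index):
    """A normal read returns the requested data block and never touches parity."""
    return stripe["data"][index]

def scrub(stripe):
    """Read every block, parity included, and report what is wrong."""
    problems = [f"data block {i} unreadable"
                for i, block in enumerate(stripe["data"]) if block is UNREADABLE]
    if stripe["parity"] is UNREADABLE:
        problems.append("parity block unreadable")
    elif not problems and xor_blocks(stripe["data"]) != stripe["parity"]:
        problems.append("parity mismatch")
    return problems

def rebuild_after_failure(stripe, failed):
    """Rebuild data block `failed`; needs every other data block AND the parity."""
    survivors = [b for i, b in enumerate(stripe["data"]) if i != failed]
    if stripe["parity"] is UNREADABLE or any(b is UNREADABLE for b in survivors):
        raise OSError("stripe unrecoverable: a block needed for the rebuild is unreadable")
    return xor_blocks(survivors + [stripe["parity"]])

data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
stripe = {"data": data, "parity": UNREADABLE}  # parity went bad long ago

print(normal_read(stripe, 1) == b"\x10\x20")  # True: normal reads never notice
print(scrub(stripe))  # ['parity block unreadable']: a parity-verifying scrub does

try:
    rebuild_after_failure(stripe, 0)  # drive 0 dies later, with no scrub in between
except OSError as err:
    print(err)
```

Once a scrub flags the bad parity block while all data blocks are still readable, the parity can be regenerated as `xor_blocks(data)` and rewritten, which is the repair that a verify operation skipping the checksum blocks never gets the chance to make.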