From: Chuck Swiger
Date: Thu, 06 Jul 2006 09:34:40 -0400
To: Jeremy Ehrhardt
Cc: freebsd-questions@freebsd.org
Subject: Re: vinum stability?
Message-ID: <44AD1170.1000205@mac.com>
In-Reply-To: <44AC5190.9070001@caltech.edu>

Jeremy Ehrhardt wrote:
> We've been testing this box as a
> file server, and it usually works fine, but smartd reported a few bad
> sectors on one of the drives, then a few days later it crashed while I
> was running chmod -R on a directory on "drugs" and had to be manually
> rebooted. I can't figure out exactly what happened, especially given
> that RAID 5 is supposed to be robust against single drive failures and
> that despite the bad blocks smartctl claims the drive is healthy.

As soon as you notice bad sectors appearing on a modern drive, it's time
to replace it. Modern drives already use spare sectors to replace
failing data areas transparently, and when that can no longer be done
because all of the spares have been used, the drive is likely to die
shortly thereafter.

RAID-5 provides protection against a single-drive failure, but once
errors are seen, the RAID volume is operating in degraded mode, which
involves a significant performance penalty, and you no longer have any
protection against data loss. If you have a problem with another disk
before the failing drive gets replaced, you're probably going to lose
the entire RAID volume and all data on it.

> I have three questions:
> 1: what's up with gvinum RAID 5? Does it crash randomly? Is it
> considered stable? Will it lose data?

Gvinum isn't supposed to crash randomly, and it is reasonably stable,
but it doesn't seem to be as reliable as either a hardware RAID setup or
the older vinum from FreeBSD-4 and earlier. As for losing data, see
above.

> 2: am I using a SATA controller that has serious problems or something
> like that? In other words, is this actually gvinum's fault?

If you had a failing drive, that's not gvinum's fault.
gvinum is supposed to handle a single-drive failure, but it's not clear
what actually went wrong... log messages or dmesg output might be
useful.

> 3: would I be better off using a different RAID 5 system on another OS?

Changing OSes won't make much difference; using hardware to implement
the RAID might be an improvement, rather than using gvinum's software
RAID. Of course, you'd have to adjust your config to fit within your
hardware controller's capabilities.

--
-Chuck
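
As a rough illustration of the single-drive protection discussed above:
RAID-5 keeps one parity block per stripe, computed as the XOR of the
data blocks, so any one missing block can be rebuilt from the survivors,
while two missing blocks cannot. The sketch below is a toy model in
Python, not gvinum's actual code; the function names and the tiny
4-byte "blocks" are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (toy stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, lost_index):
    """Rebuild the single lost block from every surviving block."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Three hypothetical data "drives" plus parity; pretend drive 1 has died.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
recovered = rebuild(stripe, 1)
```

Rebuilding needs every surviving block in the stripe, which is why reads
on a degraded volume are slow and why a second bad disk before the
replacement finishes rebuilding is usually fatal to the volume.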