From owner-freebsd-fs@FreeBSD.ORG Tue Jun 20 20:07:55 2006
Date: Tue, 20 Jun 2006 16:07:44 -0400 (EDT)
From: Ensel Sharon <user@dhp.com>
To: Scott Long
Cc: freebsd-fs@freebsd.org
In-Reply-To: <4496D0D8.8040705@samsco.org>
Subject: Adaptec 2820sa redux, and possible problems
List-Id: Filesystems <freebsd-fs@freebsd.org>

Scott, et al.,

> As others suggested, you need to experiment with simpler
> configurations. This will help us identify the cause and hopefully
> implement a fix. No one is asking you to throw away money or resources.
> Since you've already done the simple test with a single drive, could you
> do the following two tests:
>
> 1. RAID-5, full size (whatever >2TB value you were talking about).
> 2. RAID-6, <2TB.

Unfortunately, I was at a remote site and needed to get back on a plane.
Therefore, I was forced to take the 8 disks, create a mirror with the
first two, and a RAID-6 array with the remaining 6. It's not a great
solution, but I only lost 3/8 of the disks to RAID overhead (1 for the
mirror, 2 for RAID-6 parity) instead of 4/8. As far as your tests go,
this shows that a <2TB RAID-6 does indeed work, and that a non-RAID-6
array also works.
Here is the bad news:

- The system survived RAID creation; both arrays show optimal (although
  the card's kernel _did_ crash out at 2% ... I just rebooted and it
  picked up where it left off until build/verify was complete).

- The system survived FreeBSD installation and my own OS installs, port
  installs, etc.

- The system survived some very large, very long rsyncs (200+ GB, with
  hundreds of thousands of inodes).

HOWEVER:

- If I do any kind of massive data move between the two arrays, the
  screen fills up with aac0 command timeouts, and eventually the system
  just crashes and burns with:

      Warning! Controller is no longer running! code=0xbcef0100

I am running the latest stable firmware on this card, which I believe
is 9117.

Large array-to-array copies _are not_ something I need to do on this
system, and it can survive a pretty brutal rsync. I guess what I am
asking is: if I am willing to accept possible system instability on
rare occasions, am I in danger of data loss if I just keep running on
it and wait for better firmware (or whatever fix is developed)?

Comments?