From owner-freebsd-fs@FreeBSD.ORG Tue Jun 20 20:07:55 2006
Date: Tue, 20 Jun 2006 16:07:44 -0400 (EDT)
From: Ensel Sharon <user@dhp.com>
To: Scott Long
Cc: freebsd-fs@freebsd.org
In-Reply-To: <4496D0D8.8040705@samsco.org>
Subject: Adaptec 2820sa redux, and possible problems
List-Id: Filesystems <freebsd-fs@freebsd.org>

Scott, et al.,

> As others suggested, you need to experiment with simpler
> configurations. This will help us identify the cause and hopefully
> implement a fix. No one is asking you to throw away money or resources.
> Since you've already done the simple test with a single drive, could you
> do the following two tests:
>
> 1. RAID-5, full size (whatever >2TB value you were talking about).
> 2. RAID-6, <2TB.

Unfortunately, I was at a remote site and needed to get back on a plane.
Therefore, I was forced to take the 8 disks, create a mirror with the
first two, and a RAID-6 array with the remaining 6. It's not a great
solution, but I only lost 3/8 of the disks to RAID overhead (1 for the
mirror, 2 for RAID-6 parity) instead of 4/8. As far as your tests go,
this shows that a <2TB RAID-6 does indeed work, and that a non-RAID-6
array also works.
Here is the bad news:

- The system survived RAID creation; both arrays show optimal (although
  the card's kernel _did_ crash out at 2% ... I just rebooted and it
  picked up where it left off until build/verify was complete).

- The system survived FreeBSD installation and my own OS installs, port
  installs, etc.

- The system survived some very large, very long rsyncs (200+ GB, with
  hundreds of thousands of inodes).

HOWEVER:

- If I do any kind of massive data move between the two arrays, the
  screen fills up with aac0 command timeouts, and eventually the system
  just crashes and burns with:

      Warning! Controller is no longer running! code=0xbcef0100

I am running the latest stable firmware on this card, which I believe
is 9117.

Large array-to-array copies _are not_ something I need to do on this
system, and it can survive a pretty brutal rsync. I guess what I am
asking is: if I am willing to accept possible system instability on
rare occasions, am I in danger of data loss if I just keep running on
it and wait for better firmware (or whatever fix is developed)?

Comments?