From owner-freebsd-stable@FreeBSD.ORG  Sat Oct 18 19:03:12 2003
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 611B516A4B3
	for <stable@freebsd.org>; Sat, 18 Oct 2003 19:03:12 -0700 (PDT)
Received: from pit.databus.com (p70-227.acedsl.com [66.114.70.227])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 77F7943FBF
	for <stable@freebsd.org>; Sat, 18 Oct 2003 19:03:11 -0700 (PDT)
	(envelope-from barney@pit.databus.com)
Received: from pit.databus.com (localhost [127.0.0.1])
	by pit.databus.com (8.12.9p2/8.12.9) with ESMTP id h9J23AYL040695;
	Sat, 18 Oct 2003 22:03:10 -0400 (EDT)
	(envelope-from barney@pit.databus.com)
Received: (from barney@localhost)
	by pit.databus.com (8.12.9p2/8.12.9/Submit) id h9J23ACp040694;
	Sat, 18 Oct 2003 22:03:10 -0400 (EDT)
	(envelope-from barney)
Date: Sat, 18 Oct 2003 22:03:10 -0400
From: Barney Wolff <barney@databus.com>
To: Doug White <dwhite@gumbysoft.com>
Message-ID: <20031019020310.GA40618@pit.databus.com>
References: <20031018020939.GA24917@pit.databus.com>
	<20031018161424.X35407@carver.gumbysoft.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20031018161424.X35407@carver.gumbysoft.com>
User-Agent: Mutt/1.4.1i
X-Scanned-By: MIMEDefang 2.37
cc: stable@freebsd.org
Subject: Re: unintended ATARAIDDELETE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 19 Oct 2003 02:03:12 -0000

On Sat, Oct 18, 2003 at 04:14:53PM -0700, Doug White wrote:
> 
> > I've had a very odd problem with a -stable system on an Asus A7V333-raid,
> > which has a Promise raid controller on the motherboard.  For several days
> > in a row the system lost its raid0 array during the 3am daily run, leaving
> > it with no disk.  The raid was actually turned off in the bios, with
> > manual intervention required on reboot to turn it back on.  I suspected
> > hardware, but in desperation booted a -stable kernel from 10/3/03.  That
> > kernel survived the daily run, and reported the following:
> > Oct 14 14:41:43 192.168.24.4 /kernel.maybe.ok: ad6: hard error reading fsbn 133757952 of 0-127 (ad6 bn 133757952; cn 132696 tn 6 sn 6) trying PIO mode
> > (I should note that I added a script in /usr/local/etc/periodic/daily to
> > back up this system, so files are read that normally see no access.)
> 
> This usually means your disk is bad, which is why it keeps trashing the
> array.  Your system is trying to tell you something :-)

Well, of course the bad block is hardware.  But deleting a raid0 array
on a hard error is insane.  I can more or less understand why that might
be thought sensible for raid1, but a split raid0 is of no use for anything.
Nor could I find anywhere in the kernel that actually deletes the raid.
But for sure -stable from 9/24 behaved differently (i.e., sanely) on
getting the error than -stable from 10/13 or so.  I don't think that
difference is hardware.  Time will tell, perhaps.
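
To illustrate why half of a raid0 is useless on its own, here is a rough
sketch of how striping spreads logical blocks across the member disks.
The two-disk layout, the 128-sector stripe size and the function names
are assumptions made up for the example; they are not taken from the
ata driver or from the Promise BIOS.

```python
def raid0_locate(lba, n_disks=2, stripe_sectors=128):
    """Map a logical block on the array to (member disk, block on that disk).

    n_disks and stripe_sectors are illustrative assumptions, not values
    read from any real controller configuration.
    """
    stripe, within = divmod(lba, stripe_sectors)
    disk = stripe % n_disks                      # stripes rotate across disks
    disk_lba = (stripe // n_disks) * stripe_sectors + within
    return disk, disk_lba


def disks_touched(start_lba, n_sectors, n_disks=2, stripe_sectors=128):
    """Return the set of member disks holding any part of a contiguous extent."""
    first = start_lba // stripe_sectors
    last = (start_lba + n_sectors - 1) // stripe_sectors
    return {s % n_disks for s in range(first, last + 1)}


if __name__ == "__main__":
    # Anything longer than one stripe lands on both disks.
    print(raid0_locate(0), raid0_locate(128), raid0_locate(256))
    print(disks_touched(0, 256))
```

Under these assumptions any extent longer than one stripe lives on both
disks, so neither member holds a usable filesystem by itself.  With
raid1 each member is a complete copy, which is why dropping a failing
disk from the array can at least make sense there.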

-- 
Barney Wolff         http://www.databus.com/bwresume.pdf
I'm available by contract or FT, in the NYC metro area or via the 'Net.