From owner-freebsd-stable@FreeBSD.ORG  Fri Oct 17 19:09:41 2003
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Date: Fri, 17 Oct 2003 22:09:40 -0400
From: Barney Wolff <barney@databus.com>
To: stable@freebsd.org
Message-ID: <20031018020939.GA24917@pit.databus.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.1i
X-Scanned-By: MIMEDefang 2.37
Subject: unintended ATARAIDDELETE

I've had a very odd problem with a -stable system on an Asus A7V333-raid,
which has a Promise RAID controller on the motherboard.  For several days
in a row the system lost its RAID0 array during the 3am daily run, leaving
it with no disk.  The RAID was actually turned off in the BIOS, and manual
intervention was required on reboot to turn it back on.  I suspected
hardware, but in desperation booted a -stable kernel from 10/3/03.  That
kernel survived the daily run, and reported the following:
Oct 14 14:41:43 192.168.24.4 /kernel.maybe.ok: ad6: hard error reading fsbn 133757952 of 0-127 (ad6 bn 133757952; cn 132696 tn 6 sn 6) trying PIO mode
(I should note that I added a script in /usr/local/etc/periodic/daily to
back up this system, so files are read that normally see no access.)
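
To notice a recurrence, one could scan syslog for lines like the one above.
The following is only a rough sketch: the regular expression is inferred from
that single quoted line (the "writing" alternative is an assumption), the
function names are made up for illustration, and the 16-head, 63-sector
geometry is an assumption that happens to be consistent with the cn/tn/sn
values shown.

```python
import re

# Pattern inferred from the one log line quoted above; other kernels or
# drivers may word the message differently.
HARD_ERROR = re.compile(
    r"(?P<dev>ad\d+): hard error (?P<op>reading|writing) fsbn (?P<fsbn>\d+)"
    r".*?\(ad\d+ bn (?P<bn>\d+); cn (?P<cn>\d+) tn (?P<tn>\d+) sn (?P<sn>\d+)\)"
)


def parse_hard_error(line):
    """Return the fields of an ATA hard-error line as a dict, or None."""
    m = HARD_ERROR.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("fsbn", "bn", "cn", "tn", "sn"):
        fields[key] = int(fields[key])
    return fields


def chs_to_block(cn, tn, sn, heads=16, sectors=63):
    """Block number for a cylinder/track/sector triple (assumed geometry)."""
    return (cn * heads + tn) * sectors + sn


# The log line from the report, split only to keep source lines short.
SAMPLE = ("Oct 14 14:41:43 192.168.24.4 /kernel.maybe.ok: ad6: hard error "
          "reading fsbn 133757952 of 0-127 (ad6 bn 133757952; cn 132696 tn 6 "
          "sn 6) trying PIO mode")

if __name__ == "__main__":
    info = parse_hard_error(SAMPLE)
    print(info)
    # Cross-check the reported block number against the cn/tn/sn fields.
    print(chs_to_block(info["cn"], info["tn"], info["sn"]) == info["bn"])
```

Feeding each new syslog line to parse_hard_error and alerting on any
non-None result would flag the disk error itself, whether or not the
array disappears afterwards.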

I suspect that something in the newer -stable kernel reacted to this hard
error by issuing, intentionally or not, an ATARAIDDELETE ioctl.  Because
the bad block has since been remapped, I can't easily test this idea,
but thought I should report it in case it triggers a eureka moment
for a developer.

The syndrome appears only in response to a disk error; I've been running
a -stable kernel from 10/16/03 with no problems since the bad block
was remapped.  I added code to log calls to ata_raid_destroy and make it
a no-op, so I hope to notice if it ever happens again.

-- 
Barney Wolff         http://www.databus.com/bwresume.pdf
I'm available by contract or FT, in the NYC metro area or via the 'Net.