Date: Sat, 18 Oct 2003 22:03:10 -0400
From: Barney Wolff
To: Doug White
cc: stable@freebsd.org
Subject: Re: unintended ATARAIDDELETE
Message-ID: <20031019020310.GA40618@pit.databus.com>
References: <20031018020939.GA24917@pit.databus.com> <20031018161424.X35407@carver.gumbysoft.com>
In-Reply-To: <20031018161424.X35407@carver.gumbysoft.com>
List-Id: Production branch of FreeBSD source code

On Sat, Oct 18, 2003 at 04:14:53PM -0700, Doug White wrote:
>
> > I've had a very odd problem with a -stable system on an Asus A7V333-raid,
> > which has a Promise raid controller on the motherboard. For several days
> > in a row the system lost its raid0 array during the 3am daily run, leaving
> > it with no disk. The raid was actually turned off in the bios, with
> > manual intervention required on reboot to turn it back on. I suspected
> > hardware, but in desperation booted a -stable kernel from 10/3/03. That
> > kernel survived the daily run, and reported the following:
> > Oct 14 14:41:43 192.168.24.4 /kernel.maybe.ok: ad6: hard error reading fsbn 133757952 of 0-127 (ad6 bn 133757952; cn 132696 tn 6 sn 6) trying PIO mode
> > (I should note that I added a script in /usr/local/etc/periodic/daily to
> > back up this system, so files are read that normally see no access.)
>
> This usually means your disk is bad, which is why it keeps trashing the
> array. Your system is trying to tell you something :-)

Well of course the bad block is h/w. But deleting a raid0 on a hard
error is insane. I can more-or-less understand for raid1 why that
might be thought sensible, but a split raid0 is of no use for anything.
Nor could I find anywhere in the kernel that actually deletes the raid.

But for sure -stable from 9/24 behaved differently (ie, sanely) on
getting the error than -stable from 10/13 or so. I don't think that's
hardware. Time will tell, perhaps.

-- 
Barney Wolff         http://www.databus.com/bwresume.pdf
I'm available by contract or FT, in the NYC metro area or via the 'Net.
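
For anyone wanting to sanity-check the numbers in the quoted kernel message, here is a rough sketch that pulls the block and cylinder/track/sector fields out of the log line and checks that they agree with each other. It assumes the conventional translated ATA geometry of 16 heads and 63 sectors per track, which the message itself does not state, and a zero-based sector offset. The function names are made up for illustration and are not anything from the kernel.

```python
import re

# Log line as quoted above.
LOG = ("ad6: hard error reading fsbn 133757952 of 0-127 "
       "(ad6 bn 133757952; cn 132696 tn 6 sn 6) trying PIO mode")

# ASSUMPTION: translated geometry, not given anywhere in the message.
HEADS = 16
SECTORS_PER_TRACK = 63


def parse_error_line(line):
    """Return (bn, cn, tn, sn) from a 'hard error' message (hypothetical helper)."""
    m = re.search(r"bn (\d+); cn (\d+) tn (\d+) sn (\d+)", line)
    if m is None:
        raise ValueError("no bn/cn/tn/sn fields found")
    bn, cn, tn, sn = (int(x) for x in m.groups())
    return bn, cn, tn, sn


def chs_to_block(cn, tn, sn, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Linear block number from cylinder, track (head) and zero-based sector."""
    return (cn * heads + tn) * spt + sn


if __name__ == "__main__":
    bn, cn, tn, sn = parse_error_line(LOG)
    print("reported block:", bn)
    print("computed block:", chs_to_block(cn, tn, sn))
```

Under those assumptions the computed block equals the reported one, (132696 * 16 + 6) * 63 + 6 = 133757952, so the fields in the message are at least internally consistent about where on ad6 the read failed.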