Date: Fri, 17 Oct 2003 22:09:40 -0400
From: Barney Wolff
To: stable@freebsd.org
Subject: unintended ATARAIDDELETE
Message-ID: <20031018020939.GA24917@pit.databus.com>

I've had a very odd problem with a -stable system on an Asus A7V333-raid, which has a Promise raid controller on the motherboard. For several days in a row the system lost its raid0 array during the 3am daily run, leaving it with no disk. The raid was actually turned off in the bios, with manual intervention required on reboot to turn it back on. I suspected hardware, but in desperation booted a -stable kernel from 10/3/03.
That kernel survived the daily run, and reported the following:

Oct 14 14:41:43 192.168.24.4 /kernel.maybe.ok: ad6: hard error reading fsbn 133757952 of 0-127 (ad6 bn 133757952; cn 132696 tn 6 sn 6) trying PIO mode

(I should note that I added a script in /usr/local/etc/periodic/daily to back up this system, so files are read that normally see no access.)

I suspect that something in the newer -stable kernel reacted to this hard error by doing, intentionally or not, an ioctl ATARAIDDELETE. Since the error has since been remapped, I can't easily test this idea, but thought I should report it in case it triggers a eureka moment in a developer.

The syndrome appears only in response to a disk error; I've been running a -stable kernel from 10/16/03 with no problem after the bad block was remapped. I added code to log and nop ata_raid_destroy, so I hope to notice if it ever happens again.

-- 
Barney Wolff http://www.databus.com/bwresume.pdf
I'm available by contract or FT, in the NYC metro area or via the 'Net.