From: Paul Mather
To: freebsd-geom@freebsd.org
Date: Mon, 15 Nov 2004 13:04:20 -0500
Message-Id: <1100541860.31778.36.camel@zappa.Chelsea-Ct.Org>
Subject: Panic after trying to recover from drive failure with geom_vinum

I have a 5.3-STABLE system upgraded from a 5.2.1 system that used a
root-on-vinum mirrored setup. Both under 5.2.1 and 5.3, the system
periodically gets those "TIMEOUT - WRITE_DMA retrying" errors you
sometimes hear people mention. Usually, it is nothing, but it seems the
one that happened last night caused geom_vinum to mark the drive as down
and flag all its plexes and subdisks down, too:

Nov 15 04:34:14 handle kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=1581375
Nov 15 04:34:15 handle kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk swap.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex swap.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk var.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex var.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk usr.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex usr.p0 is down

Of course, the drive wasn't actually down, but how to tell geom_vinum
that? I tried "gvinum start laurel" (laurel is the name for the ad0
drive), but geom_vinum said it couldn't. So, I thought I'd try and start
the plexes individually. Unfortunately, "gvinum start root.p0" caused
the machine to reboot. (I was logged in via SSH so I couldn't see what
happened on the console; I'm presuming there was a panic followed by a
reboot.)

Luckily, when the system came back, "laurel" was now flagged as "up" and
so a "gvinum start" of each plex synchronised them and brought them all
back up.

My question is this: what would be a better way to recover from this in
the future, i.e., how to let geom_vinum know the drive was in fact "up"?
With classic vinum, "setstate" could have been used as a last resort.
I thought in retrospect that perhaps an "atacontrol detach" followed by
an "atacontrol attach" might have brought the drive's real state to
geom_vinum's attention. Does this sound likely?

I'm just trying to avoid another unnecessary panic+reboot in the future,
here. :-)

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa
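
For anyone retracing this, here is a minimal sketch (an illustration only, not something I ran on the affected machine) that pulls the downed objects out of the GEOM_VINUM log lines quoted above and prints the per-plex "gvinum start" commands that worked after the reboot. The names down_objects and start_commands are made up for this example, and the regular expression assumes the log wording is exactly as quoted. It only prints commands; it never runs gvinum.

```python
import re

# Log excerpt copied from the report above.
LOG = """\
Nov 15 04:34:14 handle kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=1581375
Nov 15 04:34:15 handle kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk swap.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex swap.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk var.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex var.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk usr.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex usr.p0 is down
"""

# Assumption: the kernel message format is exactly as quoted above.
DOWN_RE = re.compile(r"GEOM_VINUM: (subdisk|plex) (\S+) is down")


def down_objects(log_text):
    """Return the subdisks and plexes reported down, in log order."""
    found = {"subdisk": [], "plex": []}
    for line in log_text.splitlines():
        match = DOWN_RE.search(line)
        if match:
            kind, name = match.groups()
            if name not in found[kind]:
                found[kind].append(name)
    return found


def start_commands(log_text):
    """Build one 'gvinum start <plex>' command string per downed plex."""
    return ["gvinum start " + plex for plex in down_objects(log_text)["plex"]]


if __name__ == "__main__":
    for command in start_commands(LOG):
        print(command)
```

Bear in mind that, as described above, "gvinum start root.p0" rebooted the machine while the drive was still flagged down, so the printed commands are only the ones that worked once "laurel" was showing as "up" again.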