Date: Mon, 15 Nov 2004 13:04:20 -0500
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: freebsd-geom@freebsd.org
Subject: Panic after trying to recover from drive failure with geom_vinum
Message-ID: <1100541860.31778.36.camel@zappa.Chelsea-Ct.Org>
I have a 5.3-STABLE system upgraded from a 5.2.1 system that used a root-on-vinum mirrored setup. Both under 5.2.1 and 5.3, the system periodically gets those "TIMEOUT - WRITE_DMA retrying" errors you sometimes hear people mention. Usually, it is nothing, but it seems the one that happened last night caused geom_vinum to mark the drive as down and flag all its plexes and subdisks down, too:

Nov 15 04:34:14 handle kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=1581375
Nov 15 04:34:15 handle kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk swap.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex swap.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk var.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex var.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk usr.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex usr.p0 is down

Of course, the drive wasn't actually down, but how to tell geom_vinum that? I tried "gvinum start laurel" (laurel is the name for the ad0 drive), but geom_vinum said it couldn't. So, I thought I'd try and start the plexes individually. Unfortunately, "gvinum start root.p0" caused the machine to reboot. (I was logged in via SSH so I couldn't see what happened on the console; I'm presuming there was a panic followed by a reboot.)

Luckily, when the system came back, "laurel" was now flagged as "up" and so a "gvinum start" of each plex synchronised them and brought them all back up.

My question is this: what would be a better way to recover from this in the future, i.e., how to let geom_vinum know the drive was in fact "up"? With classic vinum, "setstate" could have been used as a last resort.
I thought in retrospect that perhaps an "atacontrol detach" followed by an "atacontrol attach" might have brought the drive's real state to geom_vinum's attention. Does this sound likely? I'm just trying to avoid another unnecessary panic+reboot in the future, here. :-)

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa
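As an aside for anyone hitting the same thing: the log excerpt above is regular enough that the affected plexes can be listed mechanically before deciding what to restart. What follows is only a minimal sketch in Python, not part of gvinum. The function name `downed_objects` and the `SAMPLE` text are made up for illustration, and the pattern assumes the kernel messages look exactly like the ones quoted in this post. It only prints candidate "gvinum start" lines and runs nothing; bear in mind that "gvinum start root.p0" is what preceded the reboot described above.

```python
import re

# Assumed message format, taken from the log lines quoted in this post, e.g.
#   "... kernel: GEOM_VINUM: plex root.p0 is down"
DOWN_RE = re.compile(r"GEOM_VINUM: (subdisk|plex) (\S+) is down")


def downed_objects(log_text):
    """Return the subdisks and plexes reported down, in order of appearance."""
    found = {"subdisk": [], "plex": []}
    for line in log_text.splitlines():
        m = DOWN_RE.search(line)
        # Skip duplicates so a repeated message is listed only once.
        if m and m.group(2) not in found[m.group(1)]:
            found[m.group(1)].append(m.group(2))
    return found


# Shortened copy of the log excerpt from the message (illustrative only).
SAMPLE = """\
Nov 15 04:34:15 handle kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk swap.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex swap.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down
"""

if __name__ == "__main__":
    # Print the commands for review; nothing is executed.
    for plex in downed_objects(SAMPLE)["plex"]:
        print("gvinum start " + plex)
```

Feeding it the real contents of /var/log/messages instead of `SAMPLE` would presumably work the same way, provided the message format matches.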