Date: Mon, 15 Nov 2004 13:04:20 -0500
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: freebsd-geom@freebsd.org
Subject: Panic after trying to recover from drive failure with geom_vinum
Message-ID: <1100541860.31778.36.camel@zappa.Chelsea-Ct.Org>

I have a 5.3-STABLE system, upgraded from a 5.2.1 system that used a
root-on-vinum mirrored setup. Under both 5.2.1 and 5.3, the system
periodically gets the "TIMEOUT - WRITE_DMA retrying" errors that people
sometimes mention. Usually they come to nothing, but it seems the one
that happened last night caused geom_vinum to mark the drive as down
and flag all of its plexes and subdisks down, too:
Nov 15 04:34:14 handle kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=1581375
Nov 15 04:34:15 handle kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk swap.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex swap.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk var.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex var.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk usr.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex usr.p0 is down

Of course, the drive wasn't actually down, but how do I tell geom_vinum
that? I tried "gvinum start laurel" (laurel is the name for the ad0
drive), but geom_vinum said it couldn't. So, I thought I'd try to
start the plexes individually. Unfortunately, "gvinum start root.p0"
caused the machine to reboot. (I was logged in via SSH, so I couldn't
see what happened on the console; I'm presuming there was a panic
followed by a reboot.)

Luckily, when the system came back up, "laurel" was flagged as "up", so
a "gvinum start" of each plex synchronised them and brought them all
back up.
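
The manual recovery above (one "gvinum start" per plex) can be sketched
as a small Python helper that scans syslog lines for plexes geom_vinum
reported as down and prints the matching commands. This is only a
sketch: the function names are hypothetical, the regular expression
assumes the message format shown in the log excerpt above, and the
commands are printed for review, never executed.

```python
import re

# Assumed message format, taken from the log excerpt above, e.g.
# "GEOM_VINUM: plex root.p0 is down"
DOWN_RE = re.compile(r"GEOM_VINUM: (subdisk|plex) (\S+) is down")


def down_objects(log_lines):
    """Return (kind, name) pairs for each object reported as down."""
    found = []
    for line in log_lines:
        match = DOWN_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def start_commands(log_lines):
    """Build one 'gvinum start <plex>' string per downed plex.

    Hypothetical helper: it only builds the command strings so they can
    be reviewed before anyone runs them by hand.
    """
    return [
        "gvinum start " + name
        for kind, name in down_objects(log_lines)
        if kind == "plex"
    ]


if __name__ == "__main__":
    sample = [
        "Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down",
        "Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down",
    ]
    for command in start_commands(sample):
        print(command)
```

Printing rather than running the commands is deliberate here, since
starting a plex while its drive was still flagged down is what
apparently triggered the reboot.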

My question is this: what would be a better way to recover from this in
the future, i.e., how can I let geom_vinum know that the drive is in
fact "up"? With classic vinum, "setstate" could have been used as a last
resort. In retrospect, I thought that perhaps an "atacontrol detach"
followed by an "atacontrol attach" might have brought the drive's real
state to geom_vinum's attention. Does this sound likely?
I'm just trying to avoid another unnecessary panic+reboot in the future,
here. :-)
Cheers,
Paul.
--
e-mail: paul@gromit.dlib.vt.edu
"Without music to decorate it, time is just a bunch of boring production
deadlines or dates by which bills must be paid."
--- Frank Vincent Zappa
