Date:      Tue, 18 Mar 2003 10:22:36 +1030
From:      Greg 'groggy' Lehey <grog@FreeBSD.org>
To:        Scott Mitchell <scott+freebsd@fishballoon.org>
Cc:        freebsd-questions@FreeBSD.ORG
Subject:   Re: Strange crash, possibly vinum-related
Message-ID:  <20030317235236.GH9422@wantadilla.lemis.com>
In-Reply-To: <20030317105828.GA23237@tuatara.fishballoon.org>
References:  <20030310231532.GD522@tuatara.fishballoon.org> <20030317105828.GA23237@tuatara.fishballoon.org>

On Monday, 17 March 2003 at 10:58:28 +0000, Scott Mitchell wrote:
> On Mon, Mar 10, 2003 at 11:15:32PM +0000, Scott Mitchell wrote:
>> Hi all,
>>
>> I wonder if anyone out there can shed any light on this:
>>
>> A drive failed on one of our Vinum-powered RAID-5 arrays over the weekend.
>> This morning, we swapped out the offending drive (hot-swappable SCSI
>> hardware), disklabel-ed it and restarted the offending subdisk.  Everything
>> seemed fine at this point, with vinum happily reviving the stale subdisk.
>>
>> However, twenty minutes later, with the revive 29% complete, I got this in
>> /var/log/messages:
>>
>> Mar 10 11:39:50 kokako vinum[12708]: can't revive raid.p0.s0: Invalid argument
>>
>> 'vinum list' was also showing an error message, which I foolishly didn't
>> capture, something along the lines of 'the revive process died'.  Lacking
>> any better ideas, I started the subdisk again.  The revival seemed to pick
>> up where it left off.
>>
>> Half an hour later, the box rebooted :-(  I wasn't actually watching it at
>> the time, so I don't know if it finished reviving the subdisk or not.
>> There's no indication in the logs as to what happened, but the timing of
>> the reboot is consistent with it happening around the time the subdisk
>> would have come back to life.
>>
>> Once the box came back up, I restarted the subdisk yet again (I had to
>> create the drive again first), with the RAID volume unmounted.  This time
>> the process finished without complaints and things seem to be working as
>> well as ever since then.
> [logs, etc. snipped...]
>
>
> No takers?

I've been intending to do so, but there's not much I can do based on
the information you've supplied.

> Maybe someone who's done this (replacing a failed Vinum drive on
> hot-swap SCSI hardware) before can at least tell me whether:
>
> 	- I should have done some camcontrol magic before rebuilding
> 	the drive?

I can't see anything in particular you would need to do, but then I
haven't seen the details.

> 	- Rebuilding the drive without unmounting the volume first was
> 	just asking for trouble?

There have been reports of this kind of problem, mainly from Vallo
Kallaste, who has also responded.  I haven't seen it myself, and I
haven't heard of panics as a result.  But yes, umounting is a good
precaution.

> 	- -hackers or even -stable is a better venue for this kind of problem?

-questions will do fine.

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply or reply to the original recipients.
For more information, see http://www.lemis.com/questions.html
See complete headers for address and phone numbers
