Date:      Thu, 25 Oct 2018 16:02:31 +0000
From:      bugzilla-noreply@freebsd.org
To:        geom@FreeBSD.org
Subject:   [Bug 232671] [gmirror] gmirror fails to recover from degraded mirror sets in some circumstances
Message-ID:  <bug-232671-14739-zS9D7eEtiI@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-232671-14739@https.bugs.freebsd.org/bugzilla/>
References:  <bug-232671-14739@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232671

--- Comment #3 from Conrad Meyer <cem@freebsd.org> ---
(In reply to Mark Johnston from comment #2)
Yep, I did this code inspection on CURRENT from yesterday-ish, so that revision
was present.

I'm not sure I want us to flip flop between STARTING and RUNNING in such a
case; it seems like both (1) we are allowed to remain in STARTING indefinitely
by just returning (as long as we can expect some future event to potentially
transition us to RUNNING), and (2) we have enough information at STARTING time
to know that RUNNING will fail.  I.e., I'd like to be slightly more
conservative about when we transition to RUNNING.

As for a particular code change for the root cause, adding a check for `if
(ndisks == 0) return;` right before the 'if (dirty == 0) {' check seems like it
*might* be sufficient to fix the correctness issue here (although not the
admin-introspection issue(s)).  After all, there is no point launching a
gmirror with only broken and synchronizing disks ;-).
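
To make the proposed check concrete, here is a minimal standalone sketch of
the decision. It is not the actual g_mirror code: the enums and the helpers
count_usable() and next_state() are hypothetical stand-ins, and it assumes
ndisks counts only disks that are neither broken nor synchronizing.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the kernel's types. */
enum disk_state { DISK_OK, DISK_BROKEN, DISK_SYNCHRONIZING };
enum dev_state { DEV_STARTING, DEV_RUNNING };

/* Count disks that are neither broken nor still synchronizing. */
static size_t
count_usable(const enum disk_state *disks, size_t n)
{
	size_t i, ndisks;

	ndisks = 0;
	for (i = 0; i < n; i++) {
		if (disks[i] == DISK_OK)
			ndisks++;
	}
	return (ndisks);
}

/*
 * Decide whether a device in STARTING may move to RUNNING.
 * With no usable disk we stay in STARTING and wait for a later event,
 * which models the proposed `if (ndisks == 0) return;` early return.
 */
static enum dev_state
next_state(const enum disk_state *disks, size_t n)
{

	if (count_usable(disks, n) == 0)
		return (DEV_STARTING);
	return (DEV_RUNNING);
}
```

In this model, a mirror whose disks are { DISK_BROKEN, DISK_SYNCHRONIZING }
stays in DEV_STARTING, while one with at least one DISK_OK moves on.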

Additionally, for administrability I'd like to record some information on the
mirror softc about *why* the state is what it is.  (Possibly at least two
formatted string buffers -- why we last transitioned, and why we haven't yet
transitioned to the next logical state.  If either is not relevant, "n/a" would
be ok.)  That way, when we time out or whatever, that is discoverable (and
ideally printed to console).
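
A rough userland sketch of the two-buffer idea follows. The struct name, the
field names, the REASON_LEN size, and the helper functions are all
hypothetical; nothing like this exists in the softc today, and a kernel
version would use the kernel's own printf-family routines.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define	REASON_LEN	128	/* arbitrary size chosen for this sketch */

/* Hypothetical fields that could be added to the mirror softc. */
struct mirror_reasons {
	char	last_transition[REASON_LEN];	/* why we last changed state */
	char	blocked[REASON_LEN];		/* why we have not advanced */
};

/* Both reasons start out as "n/a", meaning "not relevant". */
static void
reason_init(struct mirror_reasons *r)
{

	snprintf(r->last_transition, sizeof(r->last_transition), "n/a");
	snprintf(r->blocked, sizeof(r->blocked), "n/a");
}

/* Record a printf-style reason; output is truncated to fit the buffer. */
static void
reason_set(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(buf, len, fmt, ap);
	va_end(ap);
}

/* True if the buffer holds something other than the "n/a" placeholder. */
static int
reason_is_set(const char *buf)
{

	return (strcmp(buf, "n/a") != 0);
}
```

A caller would do something like reason_set(r.blocked, sizeof(r.blocked),
"only %d of %d disks present", 1, 2) at the point where it decides not to
transition, and print both buffers when the timeout fires.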

It might also make sense to do a similar thing for g_mirror_disks.  It'd also
be good to add gmirror disk id to almost all of these log messages, since daNN
devices can be enumerated in a different order between boots, and that was
super confusing for this sighting.
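
As an illustration only, a log line carrying both the provider name and the
disk id could be built as below. The function name, the message layout, and
the use of a 32-bit id are assumptions made for this sketch, not what the
driver currently prints.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a log message that names the disk by provider *and* by a stable
 * id, so the line stays meaningful even if daNN numbering changes between
 * boots.  Returns what snprintf() returns.
 */
static int
format_disk_msg(char *buf, size_t len, const char *provider,
    uint32_t disk_id, const char *msg)
{

	return (snprintf(buf, len, "disk %s (id %" PRIu32 "): %s",
	    provider, disk_id, msg));
}
```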

Certainly adding more test cases would be a good idea along with this revision,
thanks for the pointer.

I can't promise any time to work on this right now, sorry.

-- 
You are receiving this mail because:
You are the assignee for the bug.


