Date:      Mon, 30 Jan 2017 21:15:12 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        freebsd-geom@FreeBSD.org
Cc:        Miroslav Lachman <000.fbsd@quip.cz>, Alexander Motin <mav@FreeBSD.org>
Subject:   Re: gmirror and a flaky member
Message-ID:  <3952383e-e03a-1b27-f798-bfb1cf0b6007@FreeBSD.org>
In-Reply-To: <77c40117-35ab-2430-07f8-e1df6b87fe1c@FreeBSD.org>
References:  <7e4164bd-9804-02d5-5990-bc15354989e9@FreeBSD.org> <77c40117-35ab-2430-07f8-e1df6b87fe1c@FreeBSD.org>

On 06/01/2017 12:12, Andriy Gapon wrote:
> To add more substance, here is what gets logged when the disk disappears:
> 
> GEOM_MIRROR: Request failed (error=6). ada0p2[READ(offset=2517700608, length=4096)]
> GEOM_MIRROR: Device swap: provider ada0p2 disconnected.
> 
> And here's what gets logged when the disk reappears:
> GEOM_MIRROR: Component ada0p2 (device swap) broken, skipping.
> GEOM_MIRROR: Cannot add disk ada0p2 to swap (error=22).

I think I see a problem.
There are four places where the G_MIRROR_DISK_STATE_DISCONNECTED event is posted:
1. g_mirror_orphan() that is called when GEOM notifies us that a disk is gone
2. g_mirror_regular_request(), when we get an error writing or reading data
3. g_mirror_sync_request(), when we get an error writing data to a disk being
synchronized
4. g_mirror_write_metadata() when we get an error while writing (updating) the
metadata to a member's label

#1 happens when the disk disappears while there is no I/O.
If the disk disappears while there is some I/O, then we can get either #1 or #2.
We can get #3 during disk re-synchronization.
We can get #4 in "rare" cases when we update the metadata (e.g. change the
mirror configuration).

In case #1 the code sets G_MIRROR_BUMP_SYNCID flag before posting the event.
In cases #2, #3 and #4 the code sets G_MIRROR_BUMP_GENID flag.

I believe that the code should set G_MIRROR_BUMP_GENID only in case #4.
In that case the metadata becomes different between the mirror members and,
thus, there is no way for the code to automatically rebuild the mirror.

In cases #1, #2 and #3 only the data becomes stale on a member and, thus, there
should be a chance to re-synchronize that member.  In fact, in case #3 the
member is already being synchronized.
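
The flag choice described above can be sketched as two small lookup
functions. This is only an illustration, not the gmirror code: the enum and
function names are hypothetical, the numeric flag values are placeholders,
and the proposed variant assumes that cases #1, #2 and #3 would bump the
syncid instead, which the text above implies but does not state outright.

```c
/* Placeholder values for illustration; the real definitions are in the gmirror sources. */
#define G_MIRROR_BUMP_SYNCID	0x1
#define G_MIRROR_BUMP_GENID	0x2

/* Hypothetical labels for the four places that post the DISCONNECTED event. */
enum disconnect_source {
	SRC_ORPHAN = 1,		/* #1: g_mirror_orphan() */
	SRC_REGULAR_REQUEST,	/* #2: g_mirror_regular_request() */
	SRC_SYNC_REQUEST,	/* #3: g_mirror_sync_request() */
	SRC_WRITE_METADATA	/* #4: g_mirror_write_metadata() */
};

/* Behaviour as described: only #1 bumps the syncid, the rest bump the genid. */
int
current_bump_flag(enum disconnect_source src)
{
	return (src == SRC_ORPHAN ? G_MIRROR_BUMP_SYNCID : G_MIRROR_BUMP_GENID);
}

/*
 * Proposal: bump the genid only when a metadata write fails (#4).
 * Assumption: every other case falls back to bumping the syncid.
 */
int
proposed_bump_flag(enum disconnect_source src)
{
	return (src == SRC_WRITE_METADATA ? G_MIRROR_BUMP_GENID :
	    G_MIRROR_BUMP_SYNCID);
}
```

For a failed regular request (case #2), current_bump_flag() returns the
genid flag while proposed_bump_flag() returns the syncid flag; the two
functions agree only for cases #1 and #4.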

I could be missing something, of course.
So, any comments and corrections are very welcome.
Thanks!
-- 
Andriy Gapon


