Date:      Mon, 30 Jan 2017 21:22:56 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        freebsd-geom@FreeBSD.org
Cc:        Miroslav Lachman <000.fbsd@quip.cz>, Alexander Motin <mav@FreeBSD.org>
Subject:   Re: gmirror and a flaky member
Message-ID:  <69d4d61c-dcb3-a7ac-ecd3-e47facd19b2e@FreeBSD.org>
In-Reply-To: <3952383e-e03a-1b27-f798-bfb1cf0b6007@FreeBSD.org>
References:  <7e4164bd-9804-02d5-5990-bc15354989e9@FreeBSD.org> <77c40117-35ab-2430-07f8-e1df6b87fe1c@FreeBSD.org> <3952383e-e03a-1b27-f798-bfb1cf0b6007@FreeBSD.org>

On 30/01/2017 21:15, Andriy Gapon wrote:
> On 06/01/2017 12:12, Andriy Gapon wrote:
>> To add more substance, here is what gets logged when the disk disappears:
>>
>> GEOM_MIRROR: Request failed (error=6). ada0p2[READ(offset=2517700608, length=4096)]
>> GEOM_MIRROR: Device swap: provider ada0p2 disconnected.
>>
>> And here's what gets logged when the disk reappears:
>> GEOM_MIRROR: Component ada0p2 (device swap) broken, skipping.
>> GEOM_MIRROR: Cannot add disk ada0p2 to swap (error=22).
> 
> I think I see a problem.
> There are four places where the G_MIRROR_DISK_STATE_DISCONNECTED event is posted:
> 1. g_mirror_orphan() that is called when GEOM notifies us that a disk is gone
> 2. g_mirror_regular_request(), when we get an error writing or reading data
> 3. g_mirror_sync_request(), when we get an error writing data to a disk being
> synchronized
> 4. g_mirror_write_metadata() when we get an error while writing (updating) the
> metadata to a member's label
> 
> #1 is called when the disk disappears while there is no I/O.
> If the disk disappears while there is some I/O, then we can get either #1 or #2.
> We can get #3 during disk re-synchronization.
> We can get #4 in "rare" cases when we update the metadata (e.g. change the
> mirror configuration).
> 
> In case #1 the code sets G_MIRROR_BUMP_SYNCID flag before posting the event.
> In cases #2, #3 and #4 the code sets G_MIRROR_BUMP_GENID flag.
> 
> I believe that the code should set G_MIRROR_BUMP_GENID only in case #4.
> In that case the metadata becomes different between the mirror members and,
> thus, there is no way for the code to automatically rebuild the mirror.
> 
> In cases #1, #2 and #3 only the data becomes stale on a member and, thus, there
> should be a chance to re-synchronize that member.  In fact, in case #3 the
> member is already being synchronized.
> 
> I could be missing something, of course.
> So, any comments and corrections are very welcome.
> Thanks!
> 

At the very minimum I would like to change G_MIRROR_BUMP_GENID to
G_MIRROR_BUMP_SYNCID in g_mirror_regular_request() for ENXIO.
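
To make the two options concrete, here is a rough sketch of the flag selection
as a standalone C function. This is not the actual g_mirror.c code: the enum,
the two helper functions and the numeric flag values are hypothetical and exist
only for illustration. Only the flag names, the four call sites and ENXIO come
from the discussion above.

```c
#include <errno.h>

/* Placeholder values for illustration; the real definitions live in
 * the gmirror headers and may differ. */
#define G_MIRROR_BUMP_SYNCID	0x1
#define G_MIRROR_BUMP_GENID	0x2

/* Hypothetical names for the four places that post the
 * G_MIRROR_DISK_STATE_DISCONNECTED event. */
enum disconnect_source {
	SRC_ORPHAN,		/* #1: g_mirror_orphan() */
	SRC_REGULAR_REQUEST,	/* #2: g_mirror_regular_request() */
	SRC_SYNC_REQUEST,	/* #3: g_mirror_sync_request() */
	SRC_WRITE_METADATA	/* #4: g_mirror_write_metadata() */
};

/* Full proposal: bump genid only when the metadata itself may have
 * diverged (#4); in every other case only the data is stale, so bump
 * syncid and leave the member eligible for re-synchronization. */
static int
proposed_bump_flag(enum disconnect_source src)
{
	return (src == SRC_WRITE_METADATA ?
	    G_MIRROR_BUMP_GENID : G_MIRROR_BUMP_SYNCID);
}

/* Minimal change: keep the current behaviour everywhere, except that
 * ENXIO seen in a regular request is treated like a vanished disk. */
static int
minimal_bump_flag(enum disconnect_source src, int error)
{
	switch (src) {
	case SRC_ORPHAN:
		return (G_MIRROR_BUMP_SYNCID);
	case SRC_REGULAR_REQUEST:
		return (error == ENXIO ?
		    G_MIRROR_BUMP_SYNCID : G_MIRROR_BUMP_GENID);
	default:
		return (G_MIRROR_BUMP_GENID);
	}
}
```

With the minimal variant, the error=6 (ENXIO) read failure from the log above
would yield G_MIRROR_BUMP_SYNCID, while any other I/O error in a regular
request would still bump genid as it does today.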


-- 
Andriy Gapon


