From: Andriy Gapon
To: freebsd-geom@FreeBSD.org
Cc: Miroslav Lachman <000.fbsd@quip.cz>, Alexander Motin
Subject: Re: gmirror and a flaky member
Date: Mon, 30 Jan 2017 21:22:56 +0200

On 30/01/2017 21:15, Andriy Gapon wrote:
> On 06/01/2017 12:12, Andriy Gapon wrote:
>> To add more substance, here is what gets logged when the disk disappears:
>>
>> GEOM_MIRROR: Request failed (error=6). ada0p2[READ(offset=2517700608, length=4096)]
>> GEOM_MIRROR: Device swap: provider ada0p2 disconnected.
>>
>> And here is what gets logged when the disk reappears:
>>
>> GEOM_MIRROR: Component ada0p2 (device swap) broken, skipping.
>> GEOM_MIRROR: Cannot add disk ada0p2 to swap (error=22).
>
> I think I see a problem.
> There are four places where the G_MIRROR_DISK_STATE_DISCONNECTED event is
> posted:
> 1. g_mirror_orphan(), which is called when GEOM notifies us that a disk is gone
> 2. g_mirror_regular_request(), when we get an error reading or writing data
> 3. g_mirror_sync_request(), when we get an error writing data to a disk being
>    synchronized
> 4. g_mirror_write_metadata(), when we get an error while writing (updating)
>    the metadata to a member's label
>
> #1 is called when the disk disappears while there is no I/O.
> If the disk disappears while there is some I/O, then we can get either #1 or #2.
> We can get #3 during disk re-synchronization.
> We can get #4 in "rare" cases when we update the metadata (e.g. when changing
> the mirror configuration).
>
> In case #1 the code sets the G_MIRROR_BUMP_SYNCID flag before posting the event.
> In cases #2, #3 and #4 the code sets the G_MIRROR_BUMP_GENID flag.
>
> I believe that the code should set G_MIRROR_BUMP_GENID only in case #4.
> In that case the metadata becomes different between the mirror members and,
> thus, there is no way for the code to automatically rebuild the mirror.
>
> In cases #1, #2 and #3 only the data on a member becomes stale and, thus, it
> should be possible to re-synchronize that member. In fact, in case #3 the
> member is already being synchronized.
>
> I could be missing something, of course.
> So, any comments and corrections are very welcome.
> Thanks!
>

At the very minimum, I would like to change G_MIRROR_BUMP_GENID to
G_MIRROR_BUMP_SYNCID in g_mirror_regular_request() for ENXIO (the error=6 in
the log above).

-- 
Andriy Gapon