From owner-freebsd-geom@freebsd.org Mon Oct 9 07:19:15 2017 Return-Path: Delivered-To: freebsd-geom@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1995AE27BC5 for ; Mon, 9 Oct 2017 07:19:15 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 527D06D492 for ; Mon, 9 Oct 2017 07:19:13 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA28213 for ; Mon, 09 Oct 2017 10:19:06 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1e1SLC-0000LE-4w for freebsd-geom@FreeBSD.org; Mon, 09 Oct 2017 10:19:06 +0300 To: freebsd-geom@FreeBSD.org From: Andriy Gapon Subject: g_slice_orphan is unsafe? orphanization in general? Message-ID: <02411be8-b5c5-9e0d-c16d-8cdbdf480cf1@FreeBSD.org> Date: Mon, 9 Oct 2017 10:17:44 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Oct 2017 07:19:15 -0000 void g_slice_orphan(struct g_consumer *cp) { struct g_slicer *gsp; g_trace(G_T_TOPOLOGY, "g_slice_orphan(%p/%s)", cp, cp->provider->name); g_topology_assert(); /* XXX: Not good enough we leak the softc and its suballocations */ gsp = cp->geom->softc; cp->geom->softc = NULL; <--- I think that doing this is unsafe g_slice_free(gsp); <--- and this g_wither_geom(cp->geom, ENXIO); } I do not see how g_slice_orphan() - or, in fact, the whole orphanization process - is synchronized with g_io_request() and g_io_schedule_down(). I think that it is quite possible for a geom to become orphaned after g_io_check() succeeds and before geom->start is executed (or while it's being executed). I see that there is an attempt to prevent orphanization while a geom is in use. The code in one_event() does not call geom->orphan until pp->nstart == pp->nend. But there is no common lock to actually synchronize the check and updates of the variables. Atomic operations are not used either. So, in my opinion, the check is not really safe. It looks like r162200 attempted to fix the general problem but either it was not a complete solution or the I/O code was later changed: https://svnweb.freebsd.org/base?view=revision&revision=162200 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=102766 Back to g_slice, I see that g_slice_start() makes use of the softc. So, this is why I believe that destroying it right in g_slice_orphan is not correct. It seems that other GEOM classes go to greater lengths to ensure that there is no race between orphanization and I/O processing. Typically an internal lock and a flag seems to be used. I wonder if g_slice should follow that technique. Or maybe it's better to use a lock or atomic operations at GEOM infrastructure level to ensure that changes to pp->error, pp->flags, pp->nstart and pp->nend are really seen by other threads and in the correct order. -- Andriy Gapon