From owner-freebsd-arch@freebsd.org Mon Mar 12 17:11:42 2018 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8A61BF2C980 for ; Mon, 12 Mar 2018 17:11:42 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-it0-x22b.google.com (mail-it0-x22b.google.com [IPv6:2607:f8b0:4001:c0b::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1882D86B1A for ; Mon, 12 Mar 2018 17:11:42 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-it0-x22b.google.com with SMTP id u5-v6so12450465itc.1 for ; Mon, 12 Mar 2018 10:11:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=NC9/9+VsdTpO39szZttPOl2mb16vBPjCeI1+teosK9g=; b=x9AH66wYYEGuk4uF9zD9nWrIo7Zx4HidSqcIlmDq33UFZRDnWtvr8rHBbcdKpYsiGV U3vBhNgvWHhp1otRpgenmhq2W23sz785a4cJKoY5OfEOeQO+NMbdfv3ZG04q4J8g0GSD nFufGqLXQEdnT7mifrvUkedFUxVxExkw9qlwYdkq6BBb5Gx9qbu6/N9j6B9K+/EOogIi GkJPUNrykKrKzi7hg7RhPgPN2ERm/T1tdBNerv49m2H3FPitknZQ6xngXMwFZ0YBTFXD OgRpzS1M5bOcyp07H2vdfjmnszpZI/ONHBw0RhEjaEiAc2C3nzx50mB5ihLtVM+seJpq binw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=NC9/9+VsdTpO39szZttPOl2mb16vBPjCeI1+teosK9g=; b=UMiyQhOLw+zzWGd7TJLtG11eQytU6qFtj5YE8gs7Kr6i1nE/ncWg2l7cCT4ZbG7wTP rXNAdGABCe4kg3bTysD0i7XWgypTowM5zfIuYWhdvoqCREnu7aXundh1aq4PiHqwC51O kDVM0rOXnxI0RhZOhIZV5VaypJuK+TJm6tHzVvaZz4FU8cXkXx9JFr0ANf6NLP8ar2Zn ZPMAgS9gSwcJnpZGagPXfii8+RfeHO/unukThiecEjla4qVjl57V3gCHgAfFobpEYNDi s4qAgs9ZvLRFhxlvaIT2I91g86JSVFivy7svwRVVY7Ir8eHVrGvp36wst6hmSrJAtwuN oP5A== X-Gm-Message-State: AElRT7Ez6TDR9b7E3NgwcMOce7COT8bgyQUAJGlYvhxbUmUj8jK1L3Zh Nc07DFQOt4MDwkyKUkSK/hH7YrAbKTRAPmwZIF7SZQ== X-Google-Smtp-Source: AG47ELvbknfpMTWBnOUrSYKNunuJ3wNZNf2om76XMuedjgAsGyCh/ndyOBD7U0ZrnE/Y1XpfsmA6XF+G793AvmIVamo= X-Received: by 10.36.16.147 with SMTP id 141mr9583195ity.73.1520874701268; Mon, 12 Mar 2018 10:11:41 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.203.196 with HTTP; Mon, 12 Mar 2018 10:11:40 -0700 (PDT) X-Originating-IP: [2603:300b:6:5100:1052:acc7:f9de:2b6d] In-Reply-To: <809d9254-ee56-59d8-69a4-08838e985cea@FreeBSD.org> References: <809d9254-ee56-59d8-69a4-08838e985cea@FreeBSD.org> From: Warner Losh Date: Mon, 12 Mar 2018 11:11:40 -0600 X-Google-Sender-Auth: 8wO9USNQdf8tekqXZOBFR5dBO20 Message-ID: Subject: Re: geom->access problem and workaround To: Andriy Gapon Cc: freebsd-geom@freebsd.org, "freebsd-arch@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.25 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Mar 2018 17:11:42 -0000 On Mon, Mar 12, 2018 at 7:17 AM, Andriy Gapon wrote: > > According to Poul-Henning (phk@), the principal author of GEOM, a GEOM > class's > access method was intended to be a light-weight operation involving mostly > access counts. That is, it should be (have been) close in spirit to what > g_access() function does. The method is only called from g_access and it > is > always done under the GEOM topology lock (like with most GEOM "control > plane" > methods). The lock ensures that the method and the function operate on a > consistent state of the topology and all geoms in it. > > In reality, many classes have their access method do a lot more than just > checking and modifying access bits. And often, what the method does is > incompatible with the topology lock. > > Some examples. > g_mirror_access() has to drop and reacquire the topology lock to avoid a > LOR > (deadlock) because the method needs to use the class's internal sc_lock. > > zvol_geom_access() also has to drop and reacquire the topology lock when it > interacts with ZFS internals involving many locks. The main issue here is > that > ZFS is both above the GEOM when ZFS uses GEOM for the storage access and > it is > "below" the GEOM when ZFS is accessed through the ZVOL provider. > > g_disk_access() -> daopen(). In this case the topology lock is never > dropped, > but the operation issues multiple SCSI commands and waits for their > completion. > So, if something goes wrong and takes a long time to complete then the > whole > topology will be frozen for all that time. > [Perhaps doing the lock dance would be a better alternative] > > But, of course, dropping the lock does not come free. > It opens races where two (at least) sets of incompatible access counts may > get > granted. Or a special action, that should be done only on a first access > to a > geom, could be executed more than once. > > Bringing everything to conformance with the original design would be an > ideal > solution, but it will take a lot of work both in the individual > nonconforming > classes and in at least some of their consumers. It seems to require > moving all > the complex operations from access methods to the GEOM "data plane". E.g, > doing > those things upon the first I/O operation. Or having a new special > BIO_GETATTR > (kind of) operation that could be executed after g_access() but before the > actual I/O is allowed. > > I am proposing an interim solution, so really a workaround, for the > problem of > dropping the topology lock: > > https://reviews.freebsd.org/D14533 > > That workaround cannot guarantee, of course, the complete stability of the > topology, but it prevents concurrent calls to access methods. > The idea is very simple. Before calling a geom's access method the geom is > marked with a special flag unless the flag is already set in which case > the code > waits until the flag is cleared. The flag is cleared after the call, of > course. > The topology lock is released while waiting for the flag. > > I think that having this new flag may help to get more visibility into the > problem. > > P.S. > The workaround does not help daopen() at all. The storage layer generally doesn't expect higher-level locks around calls to it, and feels that it's free to sleep in the open routine for resources to become available. This is true across most 'open' routines (eg, tty will wait for the right signals, etc). In a world of removable media, I'm not sure that one can avoid this. But I'm not sure that calling open on the underlying device is at all compatible with the design goal of access being cheap. I think you can't have both: either you open the device, and cope with the fact that open may sleep, or it looks like you'll have broken code. Once we've updated the access counts, we can drop the topology lock to call open. If it succeeds, all is good. If it fails, then we have to reacquire it to "unaccess" the device after the failure... However, that doesn't help with the concurrent attempts to do first open for the device. g_disk_access will still have issues of sleeping indefinitely, which of course can lead to deadlock in complicated geometry situations (I say of course, but I'm not 100% sure). The whole reason that daopen may (but not always) sleep is that it may need to do I/O to the device to get it's media-status / size / SN if it' s a removable device... Just like with the RO flag, we'd want the open routine to fail if it can't reasonably access the device. Warner