From owner-freebsd-geom@freebsd.org Mon Mar 12 13:17:52 2018 Return-Path: Delivered-To: freebsd-geom@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C4796F45305; Mon, 12 Mar 2018 13:17:52 +0000 (UTC) (envelope-from agapon@gmail.com) Received: from mail-lf0-f49.google.com (mail-lf0-f49.google.com [209.85.215.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 512517AA0C; Mon, 12 Mar 2018 13:17:52 +0000 (UTC) (envelope-from agapon@gmail.com) Received: by mail-lf0-f49.google.com with SMTP id q69-v6so23124566lfi.10; Mon, 12 Mar 2018 06:17:52 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:subject:message-id:date:user-agent :mime-version:content-language:content-transfer-encoding; bh=5jBSZcD7KR50Fj82AkHGF6h336mA1tGGLVXNTXAcH5c=; b=i53biMxorStuutX2Bb08VH0JVq2ugjn4MmeNQzbtMrvT86JPrr0BsIzAzNMg9p07UQ dXZzpkrN4GoZOxJuoDDDEjeMEq92+BnQ2/4SW9pNt3+pr1h86jNmP5OKU2yCsFrnAQES knA/BuzdmthjvCgY7lOq0uTrhnAMFjPjqW/xKwwRPPE8UrgUwznZ5/eKABUEI9aE/FIm lfuYcwgKZIYehYC5+cbrnRkev+ViJmZmmpfJgluMO4TWhuwgqrRoOoqbQgwGT9WhwQLi p8rDDMuFOlT1teuWv57Wv2Otkr4yl84i1mr2P3Mt766OiDD5Y4admoBL0GfbuwVsXidF C6CQ== X-Gm-Message-State: AElRT7EGPL9QsYqZLcL26Qs8GyKw4LsufhuYc0GtRS+J+RskZirMPol+ DlMv9tdhHNzdM+30it5/WOPhKPBf X-Google-Smtp-Source: AG47ELseJmK9YZf2riyV4lybw+SdZX9VsFxeudKefuuUYpATjSKV5ixwNxlIF6Hm+RnyR3+7kKU5bA== X-Received: by 10.46.14.10 with SMTP id 10mr5377960ljo.64.1520860670389; Mon, 12 Mar 2018 06:17:50 -0700 (PDT) Received: from [192.168.0.88] (east.meadow.volia.net. [93.72.151.96]) by smtp.googlemail.com with ESMTPSA id v3sm1759720ljd.59.2018.03.12.06.17.49 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 12 Mar 2018 06:17:49 -0700 (PDT) To: freebsd-geom@FreeBSD.org, freebsd-arch@FreeBSD.ORG From: Andriy Gapon Subject: geom->access problem and workaround Message-ID: <809d9254-ee56-59d8-69a4-08838e985cea@FreeBSD.org> Date: Mon, 12 Mar 2018 15:17:48 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Mar 2018 13:17:53 -0000 According to Poul-Henning (phk@), the principal author of GEOM, a GEOM class's access method was intended to be a light-weight operation involving mostly access counts. That is, it should be (have been) close in spirit to what g_access() function does. The method is only called from g_access and it is always done under the GEOM topology lock (like with most GEOM "control plane" methods). The lock ensures that the method and the function operate on a consistent state of the topology and all geoms in it. In reality, many classes have their access method do a lot more than just checking and modifying access bits. And often, what the method does is incompatible with the topology lock. Some examples. g_mirror_access() has to drop and reacquire the topology lock to avoid a LOR (deadlock) because the method needs to use the class's internal sc_lock. zvol_geom_access() also has to drop and reacquire the topology lock when it interacts with ZFS internals involving many locks. The main issue here is that ZFS is both above the GEOM when ZFS uses GEOM for the storage access and it is "below" the GEOM when ZFS is accessed through the ZVOL provider. g_disk_access() -> daopen(). In this case the topology lock is never dropped, but the operation issues multiple SCSI commands and waits for their completion. So, if something goes wrong and takes a long time to complete then the whole topology will be frozen for all that time. [Perhaps doing the lock dance would be a better alternative] But, of course, dropping the lock does not come free. It opens races where two (at least) sets of incompatible access counts may get granted. Or a special action, that should be done only on a first access to a geom, could be executed more than once. Bringing everything to conformance with the original design would be an ideal solution, but it will take a lot of work both in the individual nonconforming classes and in at least some of their consumers. It seems to require moving all the complex operations from access methods to the GEOM "data plane". E.g, doing those things upon the first I/O operation. Or having a new special BIO_GETATTR (kind of) operation that could be executed after g_access() but before the actual I/O is allowed. I am proposing an interim solution, so really a workaround, for the problem of dropping the topology lock: https://reviews.freebsd.org/D14533 That workaround cannot guarantee, of course, the complete stability of the topology, but it prevents concurrent calls to access methods. The idea is very simple. Before calling a geom's access method the geom is marked with a special flag unless the flag is already set in which case the code waits until the flag is cleared. The flag is cleared after the call, of course. The topology lock is released while waiting for the flag. I think that having this new flag may help to get more visibility into the problem. P.S. The workaround does not help daopen() at all. -- Andriy Gapon