From: Andriy Gapon <avg@FreeBSD.org>
To: Konstantin Belousov
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Deadlock in nullfs/zfs somewhere
Date: Wed, 07 Aug 2013 11:47:44 +0300
Message-ID: <520209B0.1030402@FreeBSD.org>
In-Reply-To: <20130721161854.GC5991@kib.kiev.ua>

Kostik,

thank you for being patient with me and explaining the details of the
contract and the inner workings of VFS suspend.

As we discussed out of band, it unfortunately seems impossible to
implement the same contract for ZFS.  The reason is that ZFS
filesystems appear to VFS as many independent filesystems, while in
reality they share a common pool.  Suspending a single filesystem
therefore does not suspend the pool, which runs contrary to the
current VFS suspend concept.  Additionally, ZFS needs a "full"
suspend mechanism that would prevent both read and write access from
the VFS layer, whereas the current VFS suspend mechanism suspends
writes / modifications only.

I am not sure how to reconcile the differences...  Here are a number
of rough ideas; a sketch for each of them is appended at the end of
this message.  I would highly appreciate your opinions and
suggestions.

Idea #1.  Add a new suspend type to the VFS layer that would
correspond to the needs of ZFS.  This is quite laborious, as it would
require adding vn_start_read() calls in many places.  Also, making
the two kinds of VFS suspend play nice with each other could be
non-trivial.  (See sketch #1 below.)

Idea #2.  This is perhaps an ugly approach, but I already have it
implemented locally.  The idea is to re-use / abuse vnode locking as
a ZFS suspend barrier.  (This can be considered analogous to putting
vn_start_op() / vn_end_op() into vop_lock / vop_unlock.)  That is,
ZFS would override VOP_LOCK / VOP_UNLOCK to check for internal
suspension.  The necessary care would be taken to respect all locking
flags, including LK_NOWAIT, and recursive entry would have to be
supported too.  (See sketch #2 below.)
Idea #3.  Provide some other mechanism to expose the ZFS suspension
state to VFS, and then use that mechanism to avoid blocking on calls
into ZFS in strategic / sensitive places like vlrureclaim(),
vtryrecycle(), etc.  (See sketch #3 below.)
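Sketch #1.  To make idea #1 a bit more concrete, here is a very rough
sketch of a vn_start_read() modeled on vn_start_write().  To be
clear, nothing here exists today: the mnt_readopcount field, the
MNTK_RSUSPEND flag and both functions are made up purely for
illustration, and the interaction with the existing write suspension
is ignored.

	/*
	 * Hypothetical: block new read operations while the
	 * filesystem is "fully" suspended.  mnt_readopcount and
	 * MNTK_RSUSPEND are placeholders, not existing API.
	 */
	int
	vn_start_read(struct vnode *vp, struct mount **mpp, int flags)
	{
		struct mount *mp;

		mp = (vp != NULL) ? vp->v_mount : *mpp;
		if (mp == NULL)
			return (0);
		MNT_ILOCK(mp);
		while ((mp->mnt_kern_flag & MNTK_RSUSPEND) != 0) {
			if ((flags & V_NOWAIT) != 0) {
				MNT_IUNLOCK(mp);
				return (EWOULDBLOCK);
			}
			/* Sleep until the suspension is lifted. */
			msleep(&mp->mnt_readopcount, MNT_MTX(mp),
			    PUSER - 1, "rsuspfs", 0);
		}
		mp->mnt_readopcount++;
		MNT_IUNLOCK(mp);
		*mpp = mp;
		return (0);
	}

	void
	vn_end_read(struct mount *mp)
	{

		if (mp == NULL)
			return;
		MNT_ILOCK(mp);
		/* Wake up a pending suspender (if any) on the drain. */
		if (--mp->mnt_readopcount == 0)
			wakeup(&mp->mnt_readopcount);
		MNT_IUNLOCK(mp);
	}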
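Sketch #2.  A schematic of the VOP_LOCK override from idea #2.  This
is not the actual local patch, just the shape of the approach;
zfs_suspended() and zfs_suspend_wait() stand in for whatever
predicate ZFS would use to publish and wait on its internal
suspension state.

	static int
	zfs_lock(struct vop_lock1_args *ap)
	{
		struct vnode *vp = ap->a_vp;
		int flags = ap->a_flags;

		/*
		 * Recursive entry: if curthread already owns the lock
		 * exclusively, it must not block on the barrier, or
		 * the suspender itself could deadlock.
		 */
		if (VOP_ISLOCKED(vp) != LK_EXCLUSIVE &&
		    (flags & LK_INTERLOCK) == 0) {
			if ((flags & LK_NOWAIT) != 0) {
				if (zfs_suspended(vp->v_mount))
					return (EBUSY);
			} else {
				zfs_suspend_wait(vp->v_mount);
			}
		}
		/*
		 * N.B.: a real implementation would re-check the
		 * suspension state after the lock is granted and
		 * handle the LK_INTERLOCK case properly; both are
		 * glossed over here.
		 */
		return (vop_stdlock(ap));
	}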
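Sketch #3.  One possible mechanism for idea #3: ZFS publishes its
suspension state in mnt_kern_flag, and the vnode reclaiming code
skips over vnodes of a suspended filesystem instead of blocking on
them.  The flag name and value are, again, made up.

	/* Hypothetical "fully suspended" flag published by ZFS. */
	#define	MNTK_SUSPEND_ALL	0x08000000

	static int
	vn_mount_suspended(struct mount *mp)
	{

		return (mp != NULL &&
		    (mp->mnt_kern_flag & MNTK_SUSPEND_ALL) != 0);
	}

	/*
	 * Then, e.g. in vlrureclaim() / vtryrecycle(), before any
	 * attempt to lock the vnode:
	 */
	if (vn_mount_suspended(vp->v_mount))
		continue;	/* do not block on a suspended fs */

-- 
Andriy Gapon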