Date: Sun, 18 Aug 2013 18:04:26 +0300
From: Konstantin Belousov
To: Andriy Gapon
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Deadlock in nullfs/zfs somewhere
Message-ID: <20130818150426.GR4972@kib.kiev.ua>
In-Reply-To: <520209B0.1030402@FreeBSD.org>

On Wed, Aug 07, 2013 at 11:47:44AM +0300, Andriy Gapon wrote:
>
> Kostik,
>
> thank you for being patient with me and explaining details of the contract and
> inner workings of VFS suspend.
>
> As we discussed out-of-band, unfortunately, it seems that it is impossible to
> implement the same contract for ZFS.  The reason is that ZFS filesystems appear
> as many independent filesystems, but in reality they share a common pool.  So
> suspending a single filesystem does not suspend the pool and that is contrary to
> current VFS suspend concept.
>
> Additionally, ZFS needs a "full" suspend mechanism that would prevent both read
> and write access from VFS layer.  The current VFS suspend mechanism suspend
> writes / modifications only.
>
> I am not sure how to reconcile the differences...
> Here is a number of rough ideas.  I will highly appreciate your opinion and
> suggestions.
>
> Idea #1.
> Add a new suspend type to VFS layer that would correspond to the needs of ZFS.
> This is quite laborious as it would require adding vn_start_read calls in many
> places.  Also, making two kinds of VFS suspend play nice with each other could
> be non-trivial.

If you mean adding a 'full suspend' mechanism, as opposed to the existing
'write suspend', then yes, this is the correct approach, and it would
probably be useful outside ZFS as well.  Its immediate application is,
e.g., for unmounts.
It is indeed very laborious and probably quite non-trivial, since the
suspend lock should come before any filesystem-level blocking primitives,
probably including vfs_busy().

>
> Idea #2.
> This is perhaps an ugly approach, but I already have it implemented locally.
> The idea is to re-use / abuse vnode locking as a ZFS suspend barrier.
> (This can be considered to be analogous to putting vn_start_op() / vn_end_op()
> into vop_lock / vop_unlock).
> That is, ZFS would override VOP_LOCK/VOP_UNLOCK to check for internal
> suspension.  The necessary care would be taken to respect all locking flags
> including LK_NOWAIT.  Recursive entry would have to be supported too.

Please note that nandfs used a somewhat similar approach, which caused
obvious bugs last time I looked.  At least, lookups were knowingly
broken with regard to lock order.

Devfs uses an internal lock to protect the mount point, and that lock
comes after vnode locks.  Correcting the operation of dm_lock required
quite an effort; look at DEVFS_DMP_DROP etc. in the devfs code.

If this is constrained to zfs without any effect on VFS, I do not care.

>
> Idea #3.
> Provide some other mechanism to expose ZFS suspension state to VFS.  And then
> use that mechanism to avoid blocking on calls to ZFS in the strategic /
> sensitive places like vlrureclaim(), vtryrecycle(), etc.
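
For illustration only, the kind of suspension barrier discussed under
Idea #2 and Idea #3 can be modelled in userspace C.  This is a rough
sketch under stated assumptions, not kernel code and not the ZFS or VFS
implementation: struct suspend_gate, the gate_*() functions and
GATE_NOWAIT are hypothetical names, and pthreads stand in for kernel
locking primitives.  It shows a gate that blocks both readers and
writers while suspended, and a no-wait mode in the spirit of LK_NOWAIT
that fails instead of sleeping.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a "full suspend" gate. */
struct suspend_gate {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		suspended;	/* new entries must wait or fail */
	int		active;		/* operations currently inside */
};

#define	GATE_NOWAIT	0x1	/* hypothetical flag, similar in spirit to LK_NOWAIT */

static void
gate_init(struct suspend_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->suspended = false;
	g->active = 0;
}

/* Enter the gate; returns 0, or EBUSY if suspended and GATE_NOWAIT is set. */
static int
gate_enter(struct suspend_gate *g, int flags)
{
	int error = 0;

	pthread_mutex_lock(&g->mtx);
	while (g->suspended) {
		if (flags & GATE_NOWAIT) {
			error = EBUSY;
			break;
		}
		pthread_cond_wait(&g->cv, &g->mtx);
	}
	if (error == 0)
		g->active++;
	pthread_mutex_unlock(&g->mtx);
	return (error);
}

/* Leave the gate; wake a pending suspender once the last operation exits. */
static void
gate_exit(struct suspend_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	if (--g->active == 0)
		pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mtx);
}

/* Refuse new entries, then wait for in-flight operations to drain. */
static void
gate_suspend(struct suspend_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	g->suspended = true;
	while (g->active > 0)
		pthread_cond_wait(&g->cv, &g->mtx);
	pthread_mutex_unlock(&g->mtx);
}

/* Lift the suspension and wake all threads sleeping in gate_enter(). */
static void
gate_resume(struct suspend_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	g->suspended = false;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mtx);
}
```

In this model, a caller in a sensitive path (the analogue of a reclaim
loop) would pass GATE_NOWAIT and skip the object on EBUSY rather than
sleep.  The sketch deliberately does not handle recursive entry: a
thread that is already inside the gate and calls gate_enter() again
while a suspend is pending would deadlock, which is why recursion
support is listed as a requirement in Idea #2.  It also says nothing
about where such a gate belongs in the lock order relative to vnode
locks.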