From owner-freebsd-fs@FreeBSD.ORG Mon Apr 28 16:22:51 2008 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEB14106567B; Mon, 28 Apr 2008 16:22:51 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from anti-4.kiev.sovam.com (anti-4.kiev.sovam.com [62.64.120.202]) by mx1.freebsd.org (Postfix) with ESMTP id 631588FC1B; Mon, 28 Apr 2008 16:22:51 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from [212.82.216.226] (helo=skuns.kiev.zoral.com.ua) by anti-4.kiev.sovam.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.67) (envelope-from ) id 1JqW7t-000C1E-Fe; Mon, 28 Apr 2008 19:22:50 +0300 Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by skuns.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3SGMiaC048965 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 28 Apr 2008 19:22:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2) with ESMTP id m3SGMdL0057121; Mon, 28 Apr 2008 19:22:39 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.2/8.14.2/Submit) id m3SGMctY057120; Mon, 28 Apr 2008 19:22:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 28 Apr 2008 19:22:38 +0300 From: Kostik Belousov To: Daichi GOTO Message-ID: <20080428162238.GT18958@deviant.kiev.zoral.com.ua> References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> <4815E107.9030902@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="FJ2D5YQYG6NL2pc1" Content-Disposition: inline In-Reply-To: <4815E107.9030902@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: ClamAV version 0.91.2, clamav-milter version 0.91.2 on skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.4 X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on skuns.kiev.zoral.com.ua X-Scanner-Signature: 00e80b47c0af1091817fe870f3098f37 X-DrWeb-checked: yes X-SpamTest-Envelope-From: kostikbel@gmail.com X-SpamTest-Group-ID: 00000000 X-SpamTest-Info: Profiles 2737 [Apr 28 2008] X-SpamTest-Info: helo_type=3 X-SpamTest-Info: {received from trusted relay: not dialup} X-SpamTest-Method: none X-SpamTest-Method: Local Lists X-SpamTest-Rate: 0 X-SpamTest-Status: Not detected X-SpamTest-Status-Extended: not_detected X-SpamTest-Version: SMTP-Filter Version 3.0.0 [0255], KAS30/Release Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Apr 2008 16:22:51 -0000 --FJ2D5YQYG6NL2pc1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote: > Thanks for your response and explanation :) >=20 > Kostik Belousov wrote: > >On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: > >>Kostik Belousov wrote: > >>>On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: > >>>>Hi Konstantin :) > >>>> > >>>>To fix a unionfs issue of=20 > >>>>http://www.freebsd.org/cgi/query-pr.cgi?pr=3D109377, > >>>>we need to add new functions > >>>> > >>>> void vkernrele(struct vnode *vp); > >>>> void vkernref(struct vnode *vp); > >>>> > >>>>and one value > >>>> > >>>> int v_kernusecount; /* i ref count of kernel */ > >>>> > >>>>to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. > >>>> > >>>>Unionfs will be panic when lower fs layer is forced umounted by > >>>>"umount -f". So to avoid this issue, we've added > >>>>"v_kernusecount" value that means "a vnode count that kernel are > >>>>using". vkernrele() and vkernref() are functions that manage > >>>>"v_kernusecount" value. > >>>> > >>>>Please check those and give us an approve or some comments! > >>>There is already the vnode reference count. From your description, I= =20 > >>>cannot > >>>understand how the kernusecount would prevent the panic when forced=20 > >>>unmount > >>>is performed. Could you, please, show the actual code ? PR mentioned > >>>does not contain any patch. > >>Our patch realizes avoiding kernel panic by "umount -f" operation using= =20 > >>with > >>EBUSY process. > >> > >>On current implementation (not applied our patch), "umount -f" tries to > >>release vnode at any vnode reference count value. Since that, unionfs > >>and nullfs access invalid vnode and lead kernel panic. To prevent this > >>issue, we need a some kind of not-umount-accept-mechanism in invalid ca= se > >>(e.x. fs in unionfsed stack, it must be umounted in correct order) and > >>to realize that, current vnode reference count is not enough we are=20 > >>thinking. > >> > >>If you have any ideas to realize the same solution with current vnode > >>reference, would you please tell us your idea :) > >> > >>>The problem you described is common for the kernel code, and right way > >>>to handle it, for now, is to keep refcount _and_ check for the forced > >>>reclaim. > > > >Your patch in essence disables the forced unmount. I would object against > >such decision. >=20 > Oooooo.... OK. We understand. >=20 > >Even if taking this direction, I believe more cleaner solution would be > >to introduce a counter that disables the (forced) unmount into the > >struct mount, instead of the struct vnode. Having the counter in the > >vnode, the unmount -f behaviour is non-deterministic and depended on > >the presence of the cached vnodes of the upper layer. The mount counter > >would be incremented by unionfs cover mount. But, as I said above, this > >looks like a wrong solution. > > > >The right way to handle the forced reclaim with the current VFS is to > >add the explicit checks for the reclaimed vnodes where it is needed. The > >vnode cannot be reclaimed while the vnode lock is held. When obtaining > >the vnode lock, the reclamation can be detected. For instance, the > >vget() without LK_RETRY shall be checked for ENOENT. >=20 > At last, we want to check that vnode is released or not where > unionfs does not know. If we can do that check, our patch is > not needed for solving that issue. >=20 > Would you please give us the way to check that target vnode is > released or not before accessing it. The basic rules of our VFS are: 1. You _must_ hold the vnode unless the vnode is locked. Hold count prevents the vnode memory from being reused and guarantees the validity of the counters, v_vnlock, v_mount and vop (but please note that validity !=3D stability). E.g., v_mount may be NULLed and vop become the deadfs_vop due to reclamation. 2. The vnode lock is held when the vnode is vgone(9)'ed. In the other words, if you have a pointer to the non-reclaimed vnode that is locked, the vnode cannot be reclaimed until the lock is freed. 3. The verbs that lock a vnode (vget() and vn_lock(9)) have two mode of operations. - If you specify the LK_RETRY in the lock flags, you would get even the reclaimed vnode locked. - If you do not specified LK_RETRY, you would get ENOENT for the reclaimed vnode. [See the #1 for the reason why you must have a vnode held while calling vget() or vn_lock()]. 4. The reclaimed vnode has the VI_DOOMED flag set; you must have vnode interlock locked to check the context of the v_iflag. Most filesystems, as opposed to the VFS, use the other technique to detect the reclaimed vnode, if needed. They clear the v_data in the vop_reclaim, and verification of the (v_data !=3D NULL) is enough to check for reclamatio= n. Very good example of the practical usage of the rules above are the nullfs routines null_reclaim(), null_lock() and null_nodeget(). >=20 >=20 > >You said that that nullfs is vulnerable to the problem. Could you, > >please, point me to the corresponding stack trace ? At least, the nullfs > >vop_lock() seems to carefully check the possible problems. --FJ2D5YQYG6NL2pc1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkgV+c4ACgkQC3+MBN1Mb4haKACfXxdcHAicJTki0O0Iw60E3WmG 4y8An2qfC3GYLpvDljGmgrbxKqtJY8uS =2gda -----END PGP SIGNATURE----- --FJ2D5YQYG6NL2pc1--