Date: Mon, 28 Apr 2008 19:22:38 +0300 From: Kostik Belousov <kostikbel@gmail.com> To: Daichi GOTO <daichi@freebsd.org> Cc: fs@freebsd.org Subject: Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h Message-ID: <20080428162238.GT18958@deviant.kiev.zoral.com.ua> In-Reply-To: <4815E107.9030902@freebsd.org> References: <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> <4815E107.9030902@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--FJ2D5YQYG6NL2pc1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote: > Thanks for your response and explanation :) >=20 > Kostik Belousov wrote: > >On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote: > >>Kostik Belousov wrote: > >>>On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote: > >>>>Hi Konstantin :) > >>>> > >>>>To fix a unionfs issue of=20 > >>>>http://www.freebsd.org/cgi/query-pr.cgi?pr=3D109377, > >>>>we need to add new functions > >>>> > >>>> void vkernrele(struct vnode *vp); > >>>> void vkernref(struct vnode *vp); > >>>> > >>>>and one value > >>>> > >>>> int v_kernusecount; /* i ref count of kernel */ > >>>> > >>>>to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c. > >>>> > >>>>Unionfs will be panic when lower fs layer is forced umounted by > >>>>"umount -f". So to avoid this issue, we've added > >>>>"v_kernusecount" value that means "a vnode count that kernel are > >>>>using". vkernrele() and vkernref() are functions that manage > >>>>"v_kernusecount" value. > >>>> > >>>>Please check those and give us an approve or some comments! > >>>There is already the vnode reference count. From your description, I= =20 > >>>cannot > >>>understand how the kernusecount would prevent the panic when forced=20 > >>>unmount > >>>is performed. Could you, please, show the actual code ? PR mentioned > >>>does not contain any patch. > >>Our patch realizes avoiding kernel panic by "umount -f" operation using= =20 > >>with > >>EBUSY process. > >> > >>On current implementation (not applied our patch), "umount -f" tries to > >>release vnode at any vnode reference count value. Since that, unionfs > >>and nullfs access invalid vnode and lead kernel panic. To prevent this > >>issue, we need a some kind of not-umount-accept-mechanism in invalid ca= se > >>(e.x. fs in unionfsed stack, it must be umounted in correct order) and > >>to realize that, current vnode reference count is not enough we are=20 > >>thinking. > >> > >>If you have any ideas to realize the same solution with current vnode > >>reference, would you please tell us your idea :) > >> > >>>The problem you described is common for the kernel code, and right way > >>>to handle it, for now, is to keep refcount _and_ check for the forced > >>>reclaim. > > > >Your patch in essence disables the forced unmount. I would object against > >such decision. >=20 > Oooooo.... OK. We understand. >=20 > >Even if taking this direction, I believe more cleaner solution would be > >to introduce a counter that disables the (forced) unmount into the > >struct mount, instead of the struct vnode. Having the counter in the > >vnode, the unmount -f behaviour is non-deterministic and depended on > >the presence of the cached vnodes of the upper layer. The mount counter > >would be incremented by unionfs cover mount. But, as I said above, this > >looks like a wrong solution. > > > >The right way to handle the forced reclaim with the current VFS is to > >add the explicit checks for the reclaimed vnodes where it is needed. The > >vnode cannot be reclaimed while the vnode lock is held. When obtaining > >the vnode lock, the reclamation can be detected. For instance, the > >vget() without LK_RETRY shall be checked for ENOENT. >=20 > At last, we want to check that vnode is released or not where > unionfs does not know. If we can do that check, our patch is > not needed for solving that issue. >=20 > Would you please give us the way to check that target vnode is > released or not before accessing it. The basic rules of our VFS are: 1. You _must_ hold the vnode unless the vnode is locked. Hold count prevents the vnode memory from being reused and guarantees the validity of the counters, v_vnlock, v_mount and vop (but please note that validity !=3D stability). E.g., v_mount may be NULLed and vop become the deadfs_vop due to reclamation. 2. The vnode lock is held when the vnode is vgone(9)'ed. In the other words, if you have a pointer to the non-reclaimed vnode that is locked, the vnode cannot be reclaimed until the lock is freed. 3. The verbs that lock a vnode (vget() and vn_lock(9)) have two mode of operations. - If you specify the LK_RETRY in the lock flags, you would get even the reclaimed vnode locked. - If you do not specified LK_RETRY, you would get ENOENT for the reclaimed vnode. [See the #1 for the reason why you must have a vnode held while calling vget() or vn_lock()]. 4. The reclaimed vnode has the VI_DOOMED flag set; you must have vnode interlock locked to check the context of the v_iflag. Most filesystems, as opposed to the VFS, use the other technique to detect the reclaimed vnode, if needed. They clear the v_data in the vop_reclaim, and verification of the (v_data !=3D NULL) is enough to check for reclamatio= n. Very good example of the practical usage of the rules above are the nullfs routines null_reclaim(), null_lock() and null_nodeget(). >=20 >=20 > >You said that that nullfs is vulnerable to the problem. Could you, > >please, point me to the corresponding stack trace ? At least, the nullfs > >vop_lock() seems to carefully check the possible problems. --FJ2D5YQYG6NL2pc1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (FreeBSD) iEYEARECAAYFAkgV+c4ACgkQC3+MBN1Mb4haKACfXxdcHAicJTki0O0Iw60E3WmG 4y8An2qfC3GYLpvDljGmgrbxKqtJY8uS =2gda -----END PGP SIGNATURE----- --FJ2D5YQYG6NL2pc1--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080428162238.GT18958>