Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 28 Apr 2008 19:22:38 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Daichi GOTO <daichi@freebsd.org>
Cc:        fs@freebsd.org
Subject:   Re: Approval request of some additions to src/sys/kern/vfs_subr.c and src/sys/sys/vnode.h
Message-ID:  <20080428162238.GT18958@deviant.kiev.zoral.com.ua>
In-Reply-To: <4815E107.9030902@freebsd.org>
References:  <4811B0A0.8040702@freebsd.org> <20080426100116.GL18958@deviant.kiev.zoral.com.ua> <48159AC5.3030000@freebsd.org> <20080428132413.GS18958@deviant.kiev.zoral.com.ua> <4815E107.9030902@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--FJ2D5YQYG6NL2pc1
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Apr 28, 2008 at 11:36:55PM +0900, Daichi GOTO wrote:
> Thanks for your response and explanation :)
>=20
> Kostik Belousov wrote:
> >On Mon, Apr 28, 2008 at 06:37:09PM +0900, Daichi GOTO wrote:
> >>Kostik Belousov wrote:
> >>>On Fri, Apr 25, 2008 at 07:21:20PM +0900, Daichi GOTO wrote:
> >>>>Hi Konstantin :)
> >>>>
> >>>>To fix a unionfs issue of=20
> >>>>http://www.freebsd.org/cgi/query-pr.cgi?pr=3D109377,
> >>>>we need to add new functions
> >>>>
> >>>>  void vkernrele(struct vnode *vp);
> >>>>  void vkernref(struct vnode *vp);
> >>>>
> >>>>and one value
> >>>>
> >>>>  int	v_kernusecount; /* i ref count of kernel */
> >>>>
> >>>>to src/sys/sys/vnode.h and rc/sys/kern/vfs_subr.c.
> >>>>
> >>>>Unionfs will be panic when lower fs layer is forced umounted by
> >>>>"umount -f".  So to avoid this issue, we've added
> >>>>"v_kernusecount" value that means "a vnode count that kernel are
> >>>>using".  vkernrele() and vkernref() are functions that manage
> >>>>"v_kernusecount" value.
> >>>>
> >>>>Please check those and give us an approve or some comments!
> >>>There is already the vnode reference count. From your description, I=
=20
> >>>cannot
> >>>understand how the kernusecount would prevent the panic when forced=20
> >>>unmount
> >>>is performed. Could you, please, show the actual code ? PR mentioned
> >>>does not contain any patch.
> >>Our patch realizes avoiding kernel panic by "umount -f" operation using=
=20
> >>with
> >>EBUSY process.
> >>
> >>On current implementation (not applied our patch), "umount -f" tries to
> >>release vnode at any vnode reference count value. Since that, unionfs
> >>and nullfs access invalid vnode and lead kernel panic. To prevent this
> >>issue, we need a some kind of not-umount-accept-mechanism in invalid ca=
se
> >>(e.x. fs in unionfsed stack, it must be umounted in correct order) and
> >>to realize that, current vnode reference count is not enough we are=20
> >>thinking.
> >>
> >>If you have any ideas to realize the same solution with current vnode
> >>reference, would you please tell us your idea :)
> >>
> >>>The problem you described is common for the kernel code, and right way
> >>>to handle it, for now, is to keep refcount _and_ check for the forced
> >>>reclaim.
> >
> >Your patch in essence disables the forced unmount. I would object against
> >such decision.
>=20
> Oooooo....   OK. We understand.
>=20
> >Even if taking this direction, I believe more cleaner solution would be
> >to introduce a counter that disables the (forced) unmount into the
> >struct mount, instead of the struct vnode. Having the counter in the
> >vnode, the unmount -f behaviour is non-deterministic and depended on
> >the presence of the cached vnodes of the upper layer. The mount counter
> >would be incremented by unionfs cover mount. But, as I said above, this
> >looks like a wrong solution.
> >
> >The right way to handle the forced reclaim with the current VFS is to
> >add the explicit checks for the reclaimed vnodes where it is needed. The
> >vnode cannot be reclaimed while the vnode lock is held. When obtaining
> >the vnode lock, the reclamation can be detected. For instance, the
> >vget() without LK_RETRY shall be checked for ENOENT.
>=20
> At last, we want to check that vnode is released or not where
> unionfs does not know. If we can do that check, our patch is
> not needed for solving that issue.
>=20
> Would you please give us the way to check that target vnode is
> released or not before accessing it.

The basic rules of our VFS are:
1. You _must_ hold the vnode unless the vnode is locked. Hold count
   prevents the vnode memory from being reused and guarantees the
   validity of the counters, v_vnlock, v_mount and vop (but please note
   that validity !=3D stability). E.g., v_mount may be NULLed and vop
   become the deadfs_vop due to reclamation.
2. The vnode lock is held when the vnode is vgone(9)'ed. In the other
   words, if you have a pointer to the non-reclaimed vnode that
   is locked, the vnode cannot be reclaimed until the lock is freed.
3. The verbs that lock a vnode (vget() and vn_lock(9)) have two mode
   of operations.
   - If you specify the LK_RETRY in the lock flags, you would get
     even the reclaimed vnode locked.
   - If you do not specified LK_RETRY, you would get ENOENT for the
     reclaimed vnode.
   [See the #1 for the reason why you must have a vnode held while
    calling vget() or vn_lock()].
4. The reclaimed vnode has the VI_DOOMED flag set; you must have vnode
   interlock locked to check the context of the v_iflag. Most filesystems,
   as opposed to the VFS, use the other technique to detect the reclaimed
   vnode, if needed. They clear the v_data in the vop_reclaim, and
   verification of the (v_data !=3D NULL) is enough to check for reclamatio=
n.

Very good example of the practical usage of the rules above are the
nullfs routines null_reclaim(), null_lock() and null_nodeget().

>=20
>=20
> >You said that that nullfs is vulnerable to the problem. Could you,
> >please, point me to the corresponding stack trace ? At least, the nullfs
> >vop_lock() seems to carefully check the possible problems.

--FJ2D5YQYG6NL2pc1
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iEYEARECAAYFAkgV+c4ACgkQC3+MBN1Mb4haKACfXxdcHAicJTki0O0Iw60E3WmG
4y8An2qfC3GYLpvDljGmgrbxKqtJY8uS
=2gda
-----END PGP SIGNATURE-----

--FJ2D5YQYG6NL2pc1--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080428162238.GT18958>