Date: Tue, 09 Jun 2009 13:11:30 -0500 From: Robert Noland <rnoland@FreeBSD.org> To: John Baldwin <jhb@freebsd.org> Cc: Kostik Belousov <kostikbel@gmail.com>, Yuri Pankov <yuri.pankov@gmail.com>, freebsd-current@freebsd.org, Paul Saab <ps@freebsd.org>, Matthew Fleming <matthew.fleming@isilon.com> Subject: Re: panic: knlist not locked, but should be Message-ID: <1244571090.60347.1792.camel@balrog.2hip.net> In-Reply-To: <200906091325.08293.jhb@freebsd.org> References: <20090609163005.GD75569@deviant.kiev.zoral.com.ua> <06D5F9F6F655AD4C92E28B662F7F853E02CC8A29@seaxch09.desktop.isilon.com> <20090609170025.GE75569@deviant.kiev.zoral.com.ua> <200906091325.08293.jhb@freebsd.org>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --]
On Tue, 2009-06-09 at 13:25 -0400, John Baldwin wrote:
> On Tuesday 09 June 2009 1:00:25 pm Kostik Belousov wrote:
> > On Tue, Jun 09, 2009 at 09:45:49AM -0700, Matthew Fleming wrote:
> > >
> > > > This appears to be an interaction with the recent changes to use
> > > > shared vnode locks for writes on ZFS. Hmm, I think it may be ok to
> > > > use a shared vnode lock for kevents on vnodes though. The vnode
> > > > interlock should be sufficient locking for what little work the kevent
> > >
> > > > filters do. As a quick hack for now the MNT_SHARED_WRITES() stuff
> > > > could avoid using shared locks 'if (!VN_KNLIST_EMPTY(vp))', but I
> > > > think the longer term fix is to not use the vnode locks for vnode
> > > kevents, but use the interlock instead.
> > >
> > > I tried (briefly) using the interlock since Isilon's vnode lock is
> > > cluster wide (in our 6.1 based code we got away with using Giant). This
> > > got me a LOR report on the interlock:
> > >
> > > /*
> > > * kqueue/VFS interaction
> > > */
> > > { "kqueue", &lock_class_mtx_sleep },
> > > { "struct mount mtx", &lock_class_mtx_sleep },
> > > { "vnode interlock", &lock_class_mtx_sleep },
> > > { NULL, NULL },
> > >
> > > since knote() will take first the list->kl_lock and then the kqueue
> > > lock. I didn't spend any time on it, and switched to using the vnode
> > > v_lock for my purposes. But someone added that lock ordering (r166421)
> > > for a reason.
> >
> > That was me, I actually looked for the reversed order that was reported
> > several times on the list in 6.1-6.2 timeframe. Unfortunately, nothing
> > was found.
> >
> > I noted in the separate letter that read filter for vnodes needs
> > shared vnode lock anyway.
>
> So my current idea is to allow shared vnode locks and use the vnode interlock
> in the filt_vfs_* routines to protect access to the kn_* member variables.
>
> --- //depot/projects/smpng/sys/kern/vfs_subr.c 2009/06/09 15:15:22
> +++ //depot/user/jhb/lock/kern/vfs_subr.c 2009/06/09 17:20:33
> @@ -4103,8 +4103,10 @@
> vfs_knllocked(void *arg)
> {
> struct vnode *vp = arg;
> + int islocked;
>
> - return (VOP_ISLOCKED(vp) == LK_EXCLUSIVE);
> + islocked = VOP_ISLOCKED(vp);
> + return (islocked == LK_SHARED || islocked == LK_EXCLUSIVE);
> }
>
> int
> @@ -4114,6 +4116,7 @@
> struct knote *kn = ap->a_kn;
> struct knlist *knl;
>
> + ASSERT_VOP_ELOCKED(vp, "vfs_kqfilter");
> switch (kn->kn_filter) {
> case EVFILT_READ:
> kn->kn_fop = &vfsread_filtops;
> @@ -4147,6 +4150,7 @@
> {
> struct vnode *vp = (struct vnode *)kn->kn_hook;
>
> + ASSERT_VOP_ELOCKED(vp, "filt_vfsdetach");
> KASSERT(vp->v_pollinfo != NULL, ("Missing v_pollinfo"));
> knlist_remove(&vp->v_pollinfo->vpi_selinfo.si_note, kn, 0);
> }
> @@ -4157,48 +4161,65 @@
> {
> struct vnode *vp = (struct vnode *)kn->kn_hook;
> struct vattr va;
> + int retval;
>
> /*
> * filesystem is gone, so set the EOF flag and schedule
> * the knote for deletion.
> */
> if (hint == NOTE_REVOKE) {
> + VI_LOCK(vp);
> kn->kn_flags |= (EV_EOF | EV_ONESHOT);
> + VI_UNLOCK(vp);
> return (1);
> }
>
> if (VOP_GETATTR(vp, &va, curthread->td_ucred))
> return (0);
>
> + VI_LOCK(vp);
> kn->kn_data = va.va_size - kn->kn_fp->f_offset;
> - return (kn->kn_data != 0);
> + retval = (kn->kn_data != 0);
> + VI_UNLOCK(vp);
> + return (retval);
> }
>
> /*ARGSUSED*/
> static int
> filt_vfswrite(struct knote *kn, long hint)
> {
> + struct vnode *vp = (struct vnode *)kn->kn_hook;
> +
> /*
> * filesystem is gone, so set the EOF flag and schedule
> * the knote for deletion.
> */
> + VI_LOCK(vp);
> if (hint == NOTE_REVOKE)
> kn->kn_flags |= (EV_EOF | EV_ONESHOT);
>
> kn->kn_data = 0;
> + VI_UNLOCK(vp);
> return (1);
> }
>
> static int
> filt_vfsvnode(struct knote *kn, long hint)
> {
> + struct vnode *vp = (struct vnode *)kn->kn_hook;
> + int retval;
> +
> + VI_LOCK(vp);
> if (kn->kn_sfflags & hint)
> kn->kn_fflags |= hint;
> if (hint == NOTE_REVOKE) {
> kn->kn_flags |= EV_EOF;
> + VI_UNLOCK(vp);
> return (1);
> }
> - return (kn->kn_fflags != 0);
> + retval = (kn->kn_fflags != 0);
> + VI_UNLOCK(vp);
> + return (retval);
> }
>
> int
Getting this panic with the patch. I have core files if I can get you
anything.
(kgdb) #0 doadump () at pcpu.h:223
#1 0xffffffff80590503 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:419
#2 0xffffffff8059098c in panic (fmt=Variable "fmt" is not available.
)
at /usr/src/sys/kern/kern_shutdown.c:575
#3 0xffffffff80560115 in knlist_add (knl=0xffffff0067190718,
kn=0xffffff00671be528, islocked=0)
at /usr/src/sys/kern/kern_event.c:1656
#4 0xffffffff8061bc14 in vfs_kqfilter (ap=Variable "ap" is not
available.
)
at /usr/src/sys/kern/vfs_subr.c:4140
#5 0xffffffff808c1e97 in VOP_KQFILTER_APV (vop=0xffffffff80bc4580,
a=0xffffff803e661820) at vnode_if.c:1141
#6 0xffffffff8062a241 in vn_kqfilter (fp=0xffffff0056f5f820,
kn=0xffffff00671be528) at vnode_if.h:500
#7 0xffffffff80562084 in kqueue_register (kq=0xffffff0005adf000,
kev=0xffffff803e661920, td=0xffffff004054d720, waitok=1)
at /usr/src/sys/kern/kern_event.c:968
#8 0xffffffff805623f2 in kern_kevent (td=0xffffff004054d720,
fd=Variable "fd" is not available.
)
at /usr/src/sys/kern/kern_event.c:721
#9 0xffffffff80562e10 in kevent (td=0xffffff004054d720,
uap=0xffffff803e661c00) at /usr/src/sys/kern/kern_event.c:642
#10 0xffffffff80877e5d in syscall (frame=0xffffff803e661c90)
at /usr/src/sys/amd64/amd64/trap.c:984
#11 0xffffffff808527d0 in Xfast_syscall ()
at /usr/src/sys/amd64/amd64/exception.S:364
#12 0x0000000800e2672c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)
robert.
>
> --
> John Baldwin
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
--
Robert Noland <rnoland@FreeBSD.org>
FreeBSD
[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (FreeBSD)
iEYEABECAAYFAkoupdIACgkQM4TrQ4qfROP+LACeJLsW2lQNdYdTXedXmHtkfm93
9PYAn2oM1tFzKUWgQA1oJjMKgtIOTv4e
=mWRR
-----END PGP SIGNATURE-----
help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1244571090.60347.1792.camel>
