Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 2 Apr 2014 15:07:45 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Kohji Okuno <okuno.kohji@jp.panasonic.com>
Cc:        jmg@funkthat.com, freebsd-current@freebsd.org
Subject:   Re: kevent has bug?
Message-ID:  <20140402120745.GN21331@kib.kiev.ua>
In-Reply-To: <20140402.160616.1211219746022675269.okuno.kohji@jp.panasonic.com>
References:  <20140402.114516.1300054841784626892.okuno.kohji@jp.panasonic.com> <20140402061551.GB3270@funkthat.com> <20140402.160616.1211219746022675269.okuno.kohji@jp.panasonic.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--zvf7DYrKnARJlhgd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote:
> From: John-Mark Gurney <jmg@funkthat.com>
> Date: Tue, 1 Apr 2014 23:15:51 -0700
> > Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
> >> I think, kevent() has a bug.
> >> I tested sample programs by attached sources.
> >> This sample tests about EVFILT_SIGNAL.
> >>=20
> >> I build sample programs by the following commands.
> >> % gcc -O2 -o child child.c
> >> % gcc -O2 -o parent parent.c
> >>=20
> >> The expected result is the following.
> >> % ./parent
> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20=20
> >> OK
> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20=20
> >> OK
> >>=20
> >> But, sometimes the result was the following.
> >> % ./parent
> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19=20
> >>=20
> >> This result means the number of times the signal has occured was
> >> incorrect.
> >=20
> > I was able to reproduce this...
> >=20
> >> In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
> >> number of times the signal has occurred since the last call to
> >> kevent(). This `data' is recorded by filt_signal() (This is f_event in
> >> struct filterops).
> >>=20
> >> The system call kevent()'s events are processed by kqueue_scan() in
> >> kern_event.c. In kqueue_scan(), kn->kn_fop->f_event() is allways
> >> called after KN_INFLUX is set to kn->kn_status.
> >>=20
> >> On the other hand, kernel events are occured by knote() in
> >> kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
> >> in kern_sig.c.) In knote(), kn->kn_fop->f_event() is called only when
> >> KN_INFLUX is not set in kn->kn_status.
> >>=20
> >> In race condition between kqueue_scan() and knote(),
> >> kn->kn_fop->f_event() from knote() may not be called, I think.
> >=20
> > Considering that both are called w/ a lock, that cannot happen..
> > KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
> > by knote...
> >=20
> >> In knote(), because the context holds knlist's lock, the context can
> >> not sleep. So, KN_INFLUX should not be set on calling
> >> kn->kn_fop->f_event() in kqueue_scan(), I think.
> >=20
> > No, it needs to be set:
> >  * Setting the KN_INFLUX flag enables you to unlock the kq that this kn=
ote
> >  * is on, and modify kn_status as if you had the KQ lock.
> >=20
> > As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
> > the knote will disappear out from under you causing you to dereference
> > possibly free'd memory..
> >=20
> > If you just tried to lock the list lock w/o unlocking the KQ lock, you
> > could end up w/ a dead lock, as you aren't maintaining lock order
> > properly..  The correct lock order if knlist -> kq...
> >=20
> >> What do you think about this issue?
> >=20
> > This is a real issue, but not due to the race you described
> > above...
>=20
> I beleave it's the result of the race.
>=20
> Could you try to add printf() in knote()?
> Please refer to attached patch.
>=20
>=20
> > I have verified on my machine that it isn't because there is a knote
> > waiting that isn't getting woken up, and the knote on my hung process
> > has data =3D=3D 0, so it definately lost one of the signals:
> > (kgdb) print $14.kq_knhash[20].slh_first[0]
> > $20 =3D {kn_link =3D {sle_next =3D 0x0}, kn_selnext =3D {sle_next =3D 0=
x0},
> >   kn_knlist =3D 0xfffff8005a9c5840, kn_tqe =3D {tqe_next =3D 0xfffff801=
fdab4500,
> >     tqe_prev =3D 0xfffff8004bb10038}, kn_kq =3D 0xfffff8004bb10000, kn_=
kevent =3D {
> >     ident =3D 20, filter =3D -6, flags =3D 32, fflags =3D 0, data =3D 0=
, udata =3D 0x0},
> >   kn_status =3D 0, kn_sfflags =3D 0, kn_sdata =3D 0, kn_ptr =3D {
> >     p_fp =3D 0xfffff8005a9c54b8, p_proc =3D 0xfffff8005a9c54b8,
> >     p_aio =3D 0xfffff8005a9c54b8, p_lio =3D 0xfffff8005a9c54b8,
> >     p_v =3D 0xfffff8005a9c54b8}, kn_fop =3D 0xffffffff81405ef0, kn_hook=
 =3D 0x0,
> >   kn_hookid =3D 0}
> >=20
> > If you want to find this yourself, you can run kgdb on a live system,
> > switch to the thread of the parent (info threads, thread XXX), and
> > do:
> > frame 7
> >=20
> > (or a frame that has td, which is struct thread *), then:
> > print *(struct kqueue *)td->td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0]=
=2Ef_data
> >=20
> > This will give you the struct kqueue * of the parent, and then:
> > print $XX.kq_knhash[0]@63
> >=20
> > to figure out where the knote is in the hash, and then you can print
> > it out yourself...
> >=20
> > I'm going to take a look at this a bit more later... I'm thinking of
> > using dtrace to collect the stacks where filt_signal is called, and
> > match them up...  dtrace might even be able to get us the note's data
> > upon return helping to make sure things got tracked properly...
> >=20
> > Thanks for finding this bug!  Hopefully we can find a solution to it..
> >=20
> > --=20
> >   John-Mark Gurney				Voice: +1 415 225 5579
> >=20
> >      "All that I will do, has been done, All that I have, has not."

> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index b3fb23d..7791447 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_event.c
> @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags)
>  		if ((kn->kn_status & KN_INFLUX) !=3D KN_INFLUX) {
>  			KQ_LOCK(kq);
>  			if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) {
> +				if (hint & NOTE_SIGNAL)
> +					printf("Aee2\n");
>  				KQ_UNLOCK(kq);
>  			} else if ((lockflags & KNF_NOKQLOCK) !=3D 0) {
>  				kn->kn_status |=3D KN_INFLUX;
> @@ -1886,6 +1888,10 @@ knote(struct knlist *list, long hint, int lockflag=
s)
>  				KQ_UNLOCK(kq);
>  			}
>  		}
> +		else {
> +			if (hint & NOTE_SIGNAL)
> +				printf("Aee1\n");
> +		}
>  		kq =3D NULL;
>  	}
>  	if ((lockflags & KNF_LISTLOCKED) =3D=3D 0)

I think you are right, there is a race between setting influx before the
knote lock is owned in the kqueue_scan(), and knote() itself.  The issue
is that the influx from scan must not prevent knote() from waking up
the waiters.

Patch below fixed your test case for me, also tools/regression/kqueue did
not noticed a breakage.  I tried to describe the situation in the
comment in knote().  Also, I removed unlocked check for the KN_INFLUX
in knote, since it seems to be an optimization for rare case, and is
the race on its own.

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index b3fb23d..380f1ff 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid)
 			continue;
 		kq =3D kn->kn_kq;
 		KQ_LOCK(kq);
-		if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) {
+		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) =3D=3D KN_INFLUX) {
 			KQ_UNLOCK(kq);
 			continue;
 		}
@@ -1506,7 +1506,7 @@ retry:
 			KQ_LOCK(kq);
 			kn =3D NULL;
 		} else {
-			kn->kn_status |=3D KN_INFLUX;
+			kn->kn_status |=3D KN_INFLUX | KN_SCAN;
 			KQ_UNLOCK(kq);
 			if ((kn->kn_status & KN_KQUEUE) =3D=3D KN_KQUEUE)
 				KQ_GLOBAL_LOCK(&kq_global, haskqglobal);
@@ -1515,7 +1515,8 @@ retry:
 				KQ_LOCK(kq);
 				KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal);
 				kn->kn_status &=3D
-				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX);
+				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX |
+				    KN_SCAN);
 				kq->kq_count--;
 				KN_LIST_UNLOCK(kn);
 				influx =3D 1;
@@ -1545,7 +1546,7 @@ retry:
 			} else
 				TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
 		=09
-			kn->kn_status &=3D ~(KN_INFLUX);
+			kn->kn_status &=3D ~(KN_INFLUX | KN_SCAN);
 			KN_LIST_UNLOCK(kn);
 			influx =3D 1;
 		}
@@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags)
 	 */
 	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
 		kq =3D kn->kn_kq;
-		if ((kn->kn_status & KN_INFLUX) !=3D KN_INFLUX) {
+		KQ_LOCK(kq);
+		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) =3D=3D KN_INFLUX) {
+			/*
+			 * Do not process the influx notes, except for
+			 * the influx coming from the kq unlock in the
+			 * kqueue_scan().  In the later case, we do
+			 * not interfere with the scan, since the code
+			 * fragment in kqueue_scan() locks the knlist,
+			 * and cannot proceed until we finished.
+			 */
+			KQ_UNLOCK(kq);
+		} else if ((lockflags & KNF_NOKQLOCK) !=3D 0) {
+			kn->kn_status |=3D KN_INFLUX;
+			KQ_UNLOCK(kq);
+			error =3D kn->kn_fop->f_event(kn, hint);
 			KQ_LOCK(kq);
-			if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) {
-				KQ_UNLOCK(kq);
-			} else if ((lockflags & KNF_NOKQLOCK) !=3D 0) {
-				kn->kn_status |=3D KN_INFLUX;
-				KQ_UNLOCK(kq);
-				error =3D kn->kn_fop->f_event(kn, hint);
-				KQ_LOCK(kq);
-				kn->kn_status &=3D ~KN_INFLUX;
-				if (error)
-					KNOTE_ACTIVATE(kn, 1);
-				KQ_UNLOCK_FLUX(kq);
-			} else {
-				kn->kn_status |=3D KN_HASKQLOCK;
-				if (kn->kn_fop->f_event(kn, hint))
-					KNOTE_ACTIVATE(kn, 1);
-				kn->kn_status &=3D ~KN_HASKQLOCK;
-				KQ_UNLOCK(kq);
-			}
+			kn->kn_status &=3D ~KN_INFLUX;
+			if (error)
+				KNOTE_ACTIVATE(kn, 1);
+			KQ_UNLOCK_FLUX(kq);
+		} else {
+			kn->kn_status |=3D KN_HASKQLOCK;
+			if (kn->kn_fop->f_event(kn, hint))
+				KNOTE_ACTIVATE(kn, 1);
+			kn->kn_status &=3D ~KN_HASKQLOCK;
+			KQ_UNLOCK(kq);
 		}
-		kq =3D NULL;
 	}
 	if ((lockflags & KNF_LISTLOCKED) =3D=3D 0)
 		list->kl_unlock(list->kl_lockarg);=20
diff --git a/sys/sys/event.h b/sys/sys/event.h
index bad8c9e..3b765c0 100644
--- a/sys/sys/event.h
+++ b/sys/sys/event.h
@@ -207,6 +207,7 @@ struct knote {
 #define KN_MARKER	0x20			/* ignore this knote */
 #define KN_KQUEUE	0x40			/* this knote belongs to a kq */
 #define KN_HASKQLOCK	0x80			/* for _inevent */
+#define	KN_SCAN		0x100			/* flux set in kqueue_scan() */
 	int			kn_sfflags;	/* saved filter flags */
 	intptr_t		kn_sdata;	/* saved data field */
 	union {

--zvf7DYrKnARJlhgd
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJTO/2QAAoJEJDCuSvBvK1BrN8P+QEPTxtboOSVhO3kxVxV2ugF
HjQzu/n10ap8sOJK5SBPrzcXva4Q0S1nnLJcO6PSuYSYrwbLoh52Rdo5Q4B64YTo
914MpXrjULiy2Q9KBWx5NitJr19DDO3GZKosPuMQqVtZg3FduQ404uTzIeqcC4eL
LhJdL9uEW0tigxQBgkZbJr8RXRtBBpwpI84OPI36bNC8LgPBc/3M0oUMwCjs3Y2K
t3vmGfw231enyme350cttp6T4E11EEfeGqkfHFw6OrnLLHQOefXG70ZNz9JmkcOJ
djzsAUoHcKppqN84JK62271NmgJ0No0aLl2smx8qd5FsyVWc3Fv69+vau6sj1vJ+
UW8QMpII+/wDwRT+JBFk7/ZqnnmDGkfD6qXcjjYPr+BTA8Q+Oz8vMo6nOIIQm9ik
IWEDsuWXnFubXkfJm7F9Q5aQVdLODAgFMIr9vgAwwenT89Iv0LRHSuiRCO8n4vMA
Zz2lQ7PetbWLUqkBOXlWFGLywKAGC5q7c5A8rSHwwt2galS39Ihb3diNfFEdk8An
S6zeCDfxNeiq2m2nS0Zw9m0UwPSomjIOY4O4biVS5FR9jyUfoH0JuOUJTAJkqLxX
PlAKD4gkE1r2w9IdZ5R1j9Xve8btYmh1y8W/wqK8lAEYhgVLODFOeC7HS7eZZWOR
tu+dGo+zuG8sAbsHq+ki
=SmD6
-----END PGP SIGNATURE-----

--zvf7DYrKnARJlhgd--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140402120745.GN21331>