Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 2 Apr 2014 20:44:00 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Kohji Okuno <okuno.kohji@jp.panasonic.com>, freebsd-current@freebsd.org
Subject:   Re: kevent has bug?
Message-ID:  <20140402174400.GR21331@kib.kiev.ua>
In-Reply-To: <20140402164542.GC3270@funkthat.com>
References:  <20140402.114516.1300054841784626892.okuno.kohji@jp.panasonic.com> <20140402061551.GB3270@funkthat.com> <20140402.160616.1211219746022675269.okuno.kohji@jp.panasonic.com> <20140402120745.GN21331@kib.kiev.ua> <20140402164542.GC3270@funkthat.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--H25ow1gEc5p8sQPX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Apr 02, 2014 at 09:45:43AM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +030=
0:
> Well, it's not that its preventing waking up the waiter, but failing to
> register the event on the knote because of the _INFLUX flag...
Yes, I used the wrong terminology.

>=20
> > Patch below fixed your test case for me, also tools/regression/kqueue d=
id
> > not noticed a breakage.  I tried to describe the situation in the
> > comment in knote().  Also, I removed unlocked check for the KN_INFLUX
> > in knote, since it seems to be an optimization for rare case, and is
> > the race on its own.
>=20
> Comments below...
>=20
> > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> > index b3fb23d..380f1ff 100644
> > --- a/sys/kern/kern_event.c
> > +++ b/sys/kern/kern_event.c
>=20
> [...]
>=20
> > @@ -1506,7 +1506,7 @@ retry:
> >  			KQ_LOCK(kq);
> >  			kn =3D NULL;
> >  		} else {
> > -			kn->kn_status |=3D KN_INFLUX;
> > +			kn->kn_status |=3D KN_INFLUX | KN_SCAN;
> >  			KQ_UNLOCK(kq);
> >  			if ((kn->kn_status & KN_KQUEUE) =3D=3D KN_KQUEUE)
> >  				KQ_GLOBAL_LOCK(&kq_global, haskqglobal);
>=20
> Is there a reason you don't add the KN_SCAN to the other cases in
> kqueue_scan that set the _INFLUX flag?
Other cases in kqueue_scan() which set influx do the detach and drop,
so I do not see a need to ensure that note is registered.  Except I missed
one case, which you pointed out.

>=20
> [...]
>=20
> > @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockf=
lags)
> >  	 */
> >  	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
> >  		kq =3D kn->kn_kq;
> > -		if ((kn->kn_status & KN_INFLUX) !=3D KN_INFLUX) {
> > +		KQ_LOCK(kq);
> > +		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) =3D=3D KN_INFLUX) {
> > +			/*
> > +			 * Do not process the influx notes, except for
> > +			 * the influx coming from the kq unlock in the
> > +			 * kqueue_scan().  In the later case, we do
> > +			 * not interfere with the scan, since the code
> > +			 * fragment in kqueue_scan() locks the knlist,
> > +			 * and cannot proceed until we finished.
> > +			 */
>=20
> We might want to add a marker node, and reprocess the list from the
> marker node, because this might introduce other races in the code too...
> but the problem with that is that knote is expected to keep the list
> locked throughout the call if called w/ it already locked, and so we
> can't do that, without major work... :(
Why ? If the knlist lock is not dropped, I do not see a need for the
marker.  The patch does not introduce the sleep point for the KN_SCAN
knotes anyway.

>=20
> I added a similar comment in knote_fork:
>                  * XXX - Why do we skip the kn if it is _INFLUX?  Does th=
is
>                  * mean we will not properly wake up some notes?
>=20
> and it looks like it was true...
>=20
> So, upon reading the other _INFLUX cases, it looks like we should change
> _SCAN to be, _CHANGING or something similar, and any place we don't end
> up dropping the knote, we set this flag also...  Once such case is at
> the end of kqueue_register, just before the label done_ev_add, where we
> update the knote w/ new udata and other fields..  Or change the logic
> of the flag, and set it for all the cases we are about to drop the
> knote..
So do you prefer KN_CHANGING instead of KN_SCAN ?  I do not have any
objections against renaming the flag, but _CHANGING seems to not say
anything about the flag intent.  I would say that KN_STABLE is more
useful, or KN_INFLUX_NODEL, or whatever.

The done_ev_add case is indeed missed in my patch, thank you for noting.
The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the
race is possible just by the nature of adding the knote.

>=20
> > +			KQ_UNLOCK(kq);
> > +		} else if ((lockflags & KNF_NOKQLOCK) !=3D 0) {
> > +			kn->kn_status |=3D KN_INFLUX;
> > +			KQ_UNLOCK(kq);
> > +			error =3D kn->kn_fop->f_event(kn, hint);
> >  			KQ_LOCK(kq);
>=20
> I believe we can drop this unlock/lock pair as it's safe to hold the
> KQ lock over f_event, we do that in knote_fork...
The knote_fork() is for the special kinds of knote only, where we indeed kn=
ow
in advance that having the kqueue locked around f_event does not break thin=
gs.

Updated patch below.

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index b3fb23d..fadb8fd 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid)
 			continue;
 		kq =3D kn->kn_kq;
 		KQ_LOCK(kq);
-		if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) {
+		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) =3D=3D KN_INFLUX) {
 			KQ_UNLOCK(kq);
 			continue;
 		}
@@ -1174,7 +1174,7 @@ findkn:
 	 * but doing so will not reset any filter which has already been
 	 * triggered.
 	 */
-	kn->kn_status |=3D KN_INFLUX;
+	kn->kn_status |=3D KN_INFLUX | KN_SCAN;
 	KQ_UNLOCK(kq);
 	KN_LIST_LOCK(kn);
 	kn->kn_kevent.udata =3D kev->udata;
@@ -1197,7 +1197,7 @@ done_ev_add:
 	KQ_LOCK(kq);
 	if (event)
 		KNOTE_ACTIVATE(kn, 1);
-	kn->kn_status &=3D ~KN_INFLUX;
+	kn->kn_status &=3D ~(KN_INFLUX | KN_SCAN);
 	KN_LIST_UNLOCK(kn);
=20
 	if ((kev->flags & EV_DISABLE) &&
@@ -1506,7 +1506,7 @@ retry:
 			KQ_LOCK(kq);
 			kn =3D NULL;
 		} else {
-			kn->kn_status |=3D KN_INFLUX;
+			kn->kn_status |=3D KN_INFLUX | KN_SCAN;
 			KQ_UNLOCK(kq);
 			if ((kn->kn_status & KN_KQUEUE) =3D=3D KN_KQUEUE)
 				KQ_GLOBAL_LOCK(&kq_global, haskqglobal);
@@ -1515,7 +1515,8 @@ retry:
 				KQ_LOCK(kq);
 				KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal);
 				kn->kn_status &=3D
-				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX);
+				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX |
+				    KN_SCAN);
 				kq->kq_count--;
 				KN_LIST_UNLOCK(kn);
 				influx =3D 1;
@@ -1545,7 +1546,7 @@ retry:
 			} else
 				TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
 		=09
-			kn->kn_status &=3D ~(KN_INFLUX);
+			kn->kn_status &=3D ~(KN_INFLUX | KN_SCAN);
 			KN_LIST_UNLOCK(kn);
 			influx =3D 1;
 		}
@@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags)
 	 */
 	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
 		kq =3D kn->kn_kq;
-		if ((kn->kn_status & KN_INFLUX) !=3D KN_INFLUX) {
+		KQ_LOCK(kq);
+		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) =3D=3D KN_INFLUX) {
+			/*
+			 * Do not process the influx notes, except for
+			 * the influx coming from the kq unlock in the
+			 * kqueue_scan().  In the later case, we do
+			 * not interfere with the scan, since the code
+			 * fragment in kqueue_scan() locks the knlist,
+			 * and cannot proceed until we finished.
+			 */
+			KQ_UNLOCK(kq);
+		} else if ((lockflags & KNF_NOKQLOCK) !=3D 0) {
+			kn->kn_status |=3D KN_INFLUX;
+			KQ_UNLOCK(kq);
+			error =3D kn->kn_fop->f_event(kn, hint);
 			KQ_LOCK(kq);
-			if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) {
-				KQ_UNLOCK(kq);
-			} else if ((lockflags & KNF_NOKQLOCK) !=3D 0) {
-				kn->kn_status |=3D KN_INFLUX;
-				KQ_UNLOCK(kq);
-				error =3D kn->kn_fop->f_event(kn, hint);
-				KQ_LOCK(kq);
-				kn->kn_status &=3D ~KN_INFLUX;
-				if (error)
-					KNOTE_ACTIVATE(kn, 1);
-				KQ_UNLOCK_FLUX(kq);
-			} else {
-				kn->kn_status |=3D KN_HASKQLOCK;
-				if (kn->kn_fop->f_event(kn, hint))
-					KNOTE_ACTIVATE(kn, 1);
-				kn->kn_status &=3D ~KN_HASKQLOCK;
-				KQ_UNLOCK(kq);
-			}
+			kn->kn_status &=3D ~KN_INFLUX;
+			if (error)
+				KNOTE_ACTIVATE(kn, 1);
+			KQ_UNLOCK_FLUX(kq);
+		} else {
+			kn->kn_status |=3D KN_HASKQLOCK;
+			if (kn->kn_fop->f_event(kn, hint))
+				KNOTE_ACTIVATE(kn, 1);
+			kn->kn_status &=3D ~KN_HASKQLOCK;
+			KQ_UNLOCK(kq);
 		}
-		kq =3D NULL;
 	}
 	if ((lockflags & KNF_LISTLOCKED) =3D=3D 0)
 		list->kl_unlock(list->kl_lockarg);=20
diff --git a/sys/sys/event.h b/sys/sys/event.h
index bad8c9e..3b765c0 100644
--- a/sys/sys/event.h
+++ b/sys/sys/event.h
@@ -207,6 +207,7 @@ struct knote {
 #define KN_MARKER	0x20			/* ignore this knote */
 #define KN_KQUEUE	0x40			/* this knote belongs to a kq */
 #define KN_HASKQLOCK	0x80			/* for _inevent */
+#define	KN_SCAN		0x100			/* flux set in kqueue_scan() */
 	int			kn_sfflags;	/* saved filter flags */
 	intptr_t		kn_sdata;	/* saved data field */
 	union {

--H25ow1gEc5p8sQPX
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJTPExgAAoJEJDCuSvBvK1Bt2QP/2obx0PBss0sndCesNJfNfmC
CFU4Z/MepGSu8RcxIqPd5FJ0nLrr2ajglF88sBc+15bT/GnSJn40OqXNbADF8a8j
77vblQomryxhSMpcgQ0TX5hNZh2tFOwiPUV3n6sQ1e9frBtFvZIOvJnrebESWov/
baHe3Siku9Y7y4o5oNMsATjs27F/WbpnR2UlV/O5Rr1eh05jcwO3EeaRJegxPmIP
SCW55FE3iCxrWWnA/vkYBsRWkKxU2fI+BPImof6spzQPkKuf/uWeLX/0c8GRkWbH
Icxx+6mToR8l4reS9CLxwT7Zo+3fCsnqNmZmkkrsP2DVWCjdTrbmCL5BZf1pTyNI
V5/zBxpXi73/Lm/FDnlEsUGq9rl+YPAGmdtOxNS7NGgbmMc0mdIM+oYQjciY4uU8
xAPgdA+wNSBg8VIxqDOxRL+F3s1/gO11Byhkp35fJmpG8SvbdE2TQTcl/oUG+mcU
I9xlCvwtHnkKN+ks3n4fFb0Sr5Mz3xSVJbyFrXefUln0DzQNaJ0rmEbm/ouztbDd
T4MxoZI6L3Jh5EGSmDjcswFhUyDrrQPPspbxUC5IudQfW6AjXhNZqJW2IwcWO1RE
v1mK/ZdUVWZudCfX3i/8eaQfNM7puKCLbyiKqrGdn3wbBGywyEX8hL8//FHA+qy4
GE0JtHgSavwLH7SfYcDr
=ku4w
-----END PGP SIGNATURE-----

--H25ow1gEc5p8sQPX--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140402174400.GR21331>