From owner-freebsd-current@FreeBSD.ORG Wed Apr 2 12:07:58 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 48929BE0 for ; Wed, 2 Apr 2014 12:07:58 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C09D0833 for ; Wed, 2 Apr 2014 12:07:57 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.8/8.14.8) with ESMTP id s32C7kw9074582; Wed, 2 Apr 2014 15:07:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua s32C7kw9074582 Received: (from kostik@localhost) by tom.home (8.14.8/8.14.8/Submit) id s32C7j85074581; Wed, 2 Apr 2014 15:07:45 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 2 Apr 2014 15:07:45 +0300 From: Konstantin Belousov To: Kohji Okuno Subject: Re: kevent has bug? Message-ID: <20140402120745.GN21331@kib.kiev.ua> References: <20140402.114516.1300054841784626892.okuno.kohji@jp.panasonic.com> <20140402061551.GB3270@funkthat.com> <20140402.160616.1211219746022675269.okuno.kohji@jp.panasonic.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="zvf7DYrKnARJlhgd" Content-Disposition: inline In-Reply-To: <20140402.160616.1211219746022675269.okuno.kohji@jp.panasonic.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: jmg@funkthat.com, freebsd-current@freebsd.org X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Apr 2014 12:07:58 -0000 --zvf7DYrKnARJlhgd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote: > From: John-Mark Gurney > Date: Tue, 1 Apr 2014 23:15:51 -0700 > > Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900: > >> I think, kevent() has a bug. > >> I tested sample programs by attached sources. > >> This sample tests about EVFILT_SIGNAL. > >>=20 > >> I build sample programs by the following commands. > >> % gcc -O2 -o child child.c > >> % gcc -O2 -o parent parent.c > >>=20 > >> The expected result is the following. > >> % ./parent > >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20=20 > >> OK > >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20=20 > >> OK > >>=20 > >> But, sometimes the result was the following. > >> % ./parent > >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19=20 > >>=20 > >> This result means the number of times the signal has occured was > >> incorrect. > >=20 > > I was able to reproduce this... > >=20 > >> In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the > >> number of times the signal has occurred since the last call to > >> kevent(). This `data' is recorded by filt_signal() (This is f_event in > >> struct filterops). > >>=20 > >> The system call kevent()'s events are processed by kqueue_scan() in > >> kern_event.c. In kqueue_scan(), kn->kn_fop->f_event() is allways > >> called after KN_INFLUX is set to kn->kn_status. > >>=20 > >> On the other hand, kernel events are occured by knote() in > >> kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() > >> in kern_sig.c.) In knote(), kn->kn_fop->f_event() is called only when > >> KN_INFLUX is not set in kn->kn_status. > >>=20 > >> In race condition between kqueue_scan() and knote(), > >> kn->kn_fop->f_event() from knote() may not be called, I think. > >=20 > > Considering that both are called w/ a lock, that cannot happen.. > > KN_LIST_LOCK(kn) locks the same lock that is asserted that is held > > by knote... > >=20 > >> In knote(), because the context holds knlist's lock, the context can > >> not sleep. So, KN_INFLUX should not be set on calling > >> kn->kn_fop->f_event() in kqueue_scan(), I think. > >=20 > > No, it needs to be set: > > * Setting the KN_INFLUX flag enables you to unlock the kq that this kn= ote > > * is on, and modify kn_status as if you had the KQ lock. > >=20 > > As this comment says, _INFLUX allows you to unlock the KQ w/o fear that > > the knote will disappear out from under you causing you to dereference > > possibly free'd memory.. > >=20 > > If you just tried to lock the list lock w/o unlocking the KQ lock, you > > could end up w/ a dead lock, as you aren't maintaining lock order > > properly.. The correct lock order if knlist -> kq... > >=20 > >> What do you think about this issue? > >=20 > > This is a real issue, but not due to the race you described > > above... >=20 > I beleave it's the result of the race. >=20 > Could you try to add printf() in knote()? > Please refer to attached patch. >=20 >=20 > > I have verified on my machine that it isn't because there is a knote > > waiting that isn't getting woken up, and the knote on my hung process > > has data =3D=3D 0, so it definately lost one of the signals: > > (kgdb) print $14.kq_knhash[20].slh_first[0] > > $20 =3D {kn_link =3D {sle_next =3D 0x0}, kn_selnext =3D {sle_next =3D 0= x0}, > > kn_knlist =3D 0xfffff8005a9c5840, kn_tqe =3D {tqe_next =3D 0xfffff801= fdab4500, > > tqe_prev =3D 0xfffff8004bb10038}, kn_kq =3D 0xfffff8004bb10000, kn_= kevent =3D { > > ident =3D 20, filter =3D -6, flags =3D 32, fflags =3D 0, data =3D 0= , udata =3D 0x0}, > > kn_status =3D 0, kn_sfflags =3D 0, kn_sdata =3D 0, kn_ptr =3D { > > p_fp =3D 0xfffff8005a9c54b8, p_proc =3D 0xfffff8005a9c54b8, > > p_aio =3D 0xfffff8005a9c54b8, p_lio =3D 0xfffff8005a9c54b8, > > p_v =3D 0xfffff8005a9c54b8}, kn_fop =3D 0xffffffff81405ef0, kn_hook= =3D 0x0, > > kn_hookid =3D 0} > >=20 > > If you want to find this yourself, you can run kgdb on a live system, > > switch to the thread of the parent (info threads, thread XXX), and > > do: > > frame 7 > >=20 > > (or a frame that has td, which is struct thread *), then: > > print *(struct kqueue *)td->td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0]= =2Ef_data > >=20 > > This will give you the struct kqueue * of the parent, and then: > > print $XX.kq_knhash[0]@63 > >=20 > > to figure out where the knote is in the hash, and then you can print > > it out yourself... > >=20 > > I'm going to take a look at this a bit more later... I'm thinking of > > using dtrace to collect the stacks where filt_signal is called, and > > match them up... dtrace might even be able to get us the note's data > > upon return helping to make sure things got tracked properly... > >=20 > > Thanks for finding this bug! Hopefully we can find a solution to it.. > >=20 > > --=20 > > John-Mark Gurney Voice: +1 415 225 5579 > >=20 > > "All that I will do, has been done, All that I have, has not." > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c > index b3fb23d..7791447 100644 > --- a/sys/kern/kern_event.c > +++ b/sys/kern/kern_event.c > @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags) > if ((kn->kn_status & KN_INFLUX) !=3D KN_INFLUX) { > KQ_LOCK(kq); > if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) { > + if (hint & NOTE_SIGNAL) > + printf("Aee2\n"); > KQ_UNLOCK(kq); > } else if ((lockflags & KNF_NOKQLOCK) !=3D 0) { > kn->kn_status |=3D KN_INFLUX; > @@ -1886,6 +1888,10 @@ knote(struct knlist *list, long hint, int lockflag= s) > KQ_UNLOCK(kq); > } > } > + else { > + if (hint & NOTE_SIGNAL) > + printf("Aee1\n"); > + } > kq =3D NULL; > } > if ((lockflags & KNF_LISTLOCKED) =3D=3D 0) I think you are right, there is a race between setting influx before the knote lock is owned in the kqueue_scan(), and knote() itself. The issue is that the influx from scan must not prevent knote() from waking up the waiters. Patch below fixed your test case for me, also tools/regression/kqueue did not noticed a breakage. I tried to describe the situation in the comment in knote(). Also, I removed unlocked check for the KN_INFLUX in knote, since it seems to be an optimization for rare case, and is the race on its own. diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index b3fb23d..380f1ff 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid) continue; kq =3D kn->kn_kq; KQ_LOCK(kq); - if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) { + if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) =3D=3D KN_INFLUX) { KQ_UNLOCK(kq); continue; } @@ -1506,7 +1506,7 @@ retry: KQ_LOCK(kq); kn =3D NULL; } else { - kn->kn_status |=3D KN_INFLUX; + kn->kn_status |=3D KN_INFLUX | KN_SCAN; KQ_UNLOCK(kq); if ((kn->kn_status & KN_KQUEUE) =3D=3D KN_KQUEUE) KQ_GLOBAL_LOCK(&kq_global, haskqglobal); @@ -1515,7 +1515,8 @@ retry: KQ_LOCK(kq); KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal); kn->kn_status &=3D - ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX); + ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX | + KN_SCAN); kq->kq_count--; KN_LIST_UNLOCK(kn); influx =3D 1; @@ -1545,7 +1546,7 @@ retry: } else TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); =09 - kn->kn_status &=3D ~(KN_INFLUX); + kn->kn_status &=3D ~(KN_INFLUX | KN_SCAN); KN_LIST_UNLOCK(kn); influx =3D 1; } @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags) */ SLIST_FOREACH(kn, &list->kl_list, kn_selnext) { kq =3D kn->kn_kq; - if ((kn->kn_status & KN_INFLUX) !=3D KN_INFLUX) { + KQ_LOCK(kq); + if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) =3D=3D KN_INFLUX) { + /* + * Do not process the influx notes, except for + * the influx coming from the kq unlock in the + * kqueue_scan(). In the later case, we do + * not interfere with the scan, since the code + * fragment in kqueue_scan() locks the knlist, + * and cannot proceed until we finished. + */ + KQ_UNLOCK(kq); + } else if ((lockflags & KNF_NOKQLOCK) !=3D 0) { + kn->kn_status |=3D KN_INFLUX; + KQ_UNLOCK(kq); + error =3D kn->kn_fop->f_event(kn, hint); KQ_LOCK(kq); - if ((kn->kn_status & KN_INFLUX) =3D=3D KN_INFLUX) { - KQ_UNLOCK(kq); - } else if ((lockflags & KNF_NOKQLOCK) !=3D 0) { - kn->kn_status |=3D KN_INFLUX; - KQ_UNLOCK(kq); - error =3D kn->kn_fop->f_event(kn, hint); - KQ_LOCK(kq); - kn->kn_status &=3D ~KN_INFLUX; - if (error) - KNOTE_ACTIVATE(kn, 1); - KQ_UNLOCK_FLUX(kq); - } else { - kn->kn_status |=3D KN_HASKQLOCK; - if (kn->kn_fop->f_event(kn, hint)) - KNOTE_ACTIVATE(kn, 1); - kn->kn_status &=3D ~KN_HASKQLOCK; - KQ_UNLOCK(kq); - } + kn->kn_status &=3D ~KN_INFLUX; + if (error) + KNOTE_ACTIVATE(kn, 1); + KQ_UNLOCK_FLUX(kq); + } else { + kn->kn_status |=3D KN_HASKQLOCK; + if (kn->kn_fop->f_event(kn, hint)) + KNOTE_ACTIVATE(kn, 1); + kn->kn_status &=3D ~KN_HASKQLOCK; + KQ_UNLOCK(kq); } - kq =3D NULL; } if ((lockflags & KNF_LISTLOCKED) =3D=3D 0) list->kl_unlock(list->kl_lockarg);=20 diff --git a/sys/sys/event.h b/sys/sys/event.h index bad8c9e..3b765c0 100644 --- a/sys/sys/event.h +++ b/sys/sys/event.h @@ -207,6 +207,7 @@ struct knote { #define KN_MARKER 0x20 /* ignore this knote */ #define KN_KQUEUE 0x40 /* this knote belongs to a kq */ #define KN_HASKQLOCK 0x80 /* for _inevent */ +#define KN_SCAN 0x100 /* flux set in kqueue_scan() */ int kn_sfflags; /* saved filter flags */ intptr_t kn_sdata; /* saved data field */ union { --zvf7DYrKnARJlhgd Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (FreeBSD) iQIcBAEBAgAGBQJTO/2QAAoJEJDCuSvBvK1BrN8P+QEPTxtboOSVhO3kxVxV2ugF HjQzu/n10ap8sOJK5SBPrzcXva4Q0S1nnLJcO6PSuYSYrwbLoh52Rdo5Q4B64YTo 914MpXrjULiy2Q9KBWx5NitJr19DDO3GZKosPuMQqVtZg3FduQ404uTzIeqcC4eL LhJdL9uEW0tigxQBgkZbJr8RXRtBBpwpI84OPI36bNC8LgPBc/3M0oUMwCjs3Y2K t3vmGfw231enyme350cttp6T4E11EEfeGqkfHFw6OrnLLHQOefXG70ZNz9JmkcOJ djzsAUoHcKppqN84JK62271NmgJ0No0aLl2smx8qd5FsyVWc3Fv69+vau6sj1vJ+ UW8QMpII+/wDwRT+JBFk7/ZqnnmDGkfD6qXcjjYPr+BTA8Q+Oz8vMo6nOIIQm9ik IWEDsuWXnFubXkfJm7F9Q5aQVdLODAgFMIr9vgAwwenT89Iv0LRHSuiRCO8n4vMA Zz2lQ7PetbWLUqkBOXlWFGLywKAGC5q7c5A8rSHwwt2galS39Ihb3diNfFEdk8An S6zeCDfxNeiq2m2nS0Zw9m0UwPSomjIOY4O4biVS5FR9jyUfoH0JuOUJTAJkqLxX PlAKD4gkE1r2w9IdZ5R1j9Xve8btYmh1y8W/wqK8lAEYhgVLODFOeC7HS7eZZWOR tu+dGo+zuG8sAbsHq+ki =SmD6 -----END PGP SIGNATURE----- --zvf7DYrKnARJlhgd--