From owner-freebsd-current@FreeBSD.ORG Thu Apr 3 09:27:04 2014 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 41548C5D for ; Thu, 3 Apr 2014 09:27:04 +0000 (UTC) Received: from smtp.mei.co.jp (smtp.mei.co.jp [133.183.100.20]) by mx1.freebsd.org (Postfix) with ESMTP id E940D6D for ; Thu, 3 Apr 2014 09:27:03 +0000 (UTC) Received: from mail-gw.jp.panasonic.com ([157.8.1.157]) by smtp.mei.co.jp (8.12.11.20060614/3.7W/kc-maile12) with ESMTP id s339QuTJ020488; Thu, 3 Apr 2014 18:26:56 +0900 (JST) Received: from epochmail.jp.panasonic.com ([157.8.1.130]) by mail.jp.panasonic.com (8.11.6p2/3.7W/kc-maili11) with ESMTP id s339QuR17644; Thu, 3 Apr 2014 18:26:56 +0900 Received: by epochmail.jp.panasonic.com (8.12.11.20060308/3.7W/lomi17) id s339QuiA018866; Thu, 3 Apr 2014 18:26:56 +0900 Received: from localhost by lomi17.jp.panasonic.com (8.12.11.20060308/3.7W) with ESMTP id s339QuNs018849; Thu, 3 Apr 2014 18:26:56 +0900 Date: Thu, 03 Apr 2014 18:26:56 +0900 (JST) Message-Id: <20140403.182656.1696050559410663288.okuno.kohji@jp.panasonic.com> To: kostikbel@gmail.com Subject: Re: kevent has bug? From: Kohji Okuno In-Reply-To: <20140402174400.GR21331@kib.kiev.ua> References: <20140402120745.GN21331@kib.kiev.ua> <20140402164542.GC3270@funkthat.com> <20140402174400.GR21331@kib.kiev.ua> Organization: Panasonic Corporation X-Mailer: Mew version 6.5 on Emacs 24.3 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-current@freebsd.org, okuno.kohji@jp.panasonic.com X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Apr 2014 09:27:04 -0000 From: Konstantin Belousov Date: Wed, 2 Apr 2014 20:44:00 +0300 > On Wed, Apr 02, 2014 at 09:45:43AM -0700, John-Mark Gurney wrote: >> Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300: >> Well, it's not that its preventing waking up the waiter, but failing to >> register the event on the knote because of the _INFLUX flag... > Yes, I used the wrong terminology. > >> >> > Patch below fixed your test case for me, also tools/regression/kqueue did >> > not noticed a breakage. I tried to describe the situation in the >> > comment in knote(). Also, I removed unlocked check for the KN_INFLUX >> > in knote, since it seems to be an optimization for rare case, and is >> > the race on its own. >> >> Comments below... >> >> > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c >> > index b3fb23d..380f1ff 100644 >> > --- a/sys/kern/kern_event.c >> > +++ b/sys/kern/kern_event.c >> >> [...] >> >> > @@ -1506,7 +1506,7 @@ retry: >> > KQ_LOCK(kq); >> > kn = NULL; >> > } else { >> > - kn->kn_status |= KN_INFLUX; >> > + kn->kn_status |= KN_INFLUX | KN_SCAN; >> > KQ_UNLOCK(kq); >> > if ((kn->kn_status & KN_KQUEUE) == KN_KQUEUE) >> > KQ_GLOBAL_LOCK(&kq_global, haskqglobal); >> >> Is there a reason you don't add the KN_SCAN to the other cases in >> kqueue_scan that set the _INFLUX flag? > Other cases in kqueue_scan() which set influx do the detach and drop, > so I do not see a need to ensure that note is registered. Except I missed > one case, which you pointed out. > >> >> [...] >> >> > @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags) >> > */ >> > SLIST_FOREACH(kn, &list->kl_list, kn_selnext) { >> > kq = kn->kn_kq; >> > - if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) { >> > + KQ_LOCK(kq); >> > + if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { >> > + /* >> > + * Do not process the influx notes, except for >> > + * the influx coming from the kq unlock in the >> > + * kqueue_scan(). In the later case, we do >> > + * not interfere with the scan, since the code >> > + * fragment in kqueue_scan() locks the knlist, >> > + * and cannot proceed until we finished. >> > + */ >> >> We might want to add a marker node, and reprocess the list from the >> marker node, because this might introduce other races in the code too... >> but the problem with that is that knote is expected to keep the list >> locked throughout the call if called w/ it already locked, and so we >> can't do that, without major work... :( > Why ? If the knlist lock is not dropped, I do not see a need for the > marker. The patch does not introduce the sleep point for the KN_SCAN > knotes anyway. > >> >> I added a similar comment in knote_fork: >> * XXX - Why do we skip the kn if it is _INFLUX? Does this >> * mean we will not properly wake up some notes? >> >> and it looks like it was true... >> >> So, upon reading the other _INFLUX cases, it looks like we should change >> _SCAN to be, _CHANGING or something similar, and any place we don't end >> up dropping the knote, we set this flag also... Once such case is at >> the end of kqueue_register, just before the label done_ev_add, where we >> update the knote w/ new udata and other fields.. Or change the logic >> of the flag, and set it for all the cases we are about to drop the >> knote.. > So do you prefer KN_CHANGING instead of KN_SCAN ? I do not have any > objections against renaming the flag, but _CHANGING seems to not say > anything about the flag intent. I would say that KN_STABLE is more > useful, or KN_INFLUX_NODEL, or whatever. > > The done_ev_add case is indeed missed in my patch, thank you for noting. > The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the > race is possible just by the nature of adding the knote. > >> >> > + KQ_UNLOCK(kq); >> > + } else if ((lockflags & KNF_NOKQLOCK) != 0) { >> > + kn->kn_status |= KN_INFLUX; >> > + KQ_UNLOCK(kq); >> > + error = kn->kn_fop->f_event(kn, hint); >> > KQ_LOCK(kq); >> >> I believe we can drop this unlock/lock pair as it's safe to hold the >> KQ lock over f_event, we do that in knote_fork... > The knote_fork() is for the special kinds of knote only, where we indeed know > in advance that having the kqueue locked around f_event does not break things. > > Updated patch below. > > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c > index b3fb23d..fadb8fd 100644 > --- a/sys/kern/kern_event.c > +++ b/sys/kern/kern_event.c > @@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid) > continue; > kq = kn->kn_kq; > KQ_LOCK(kq); > - if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) { > + if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { > KQ_UNLOCK(kq); > continue; > } > @@ -1174,7 +1174,7 @@ findkn: > * but doing so will not reset any filter which has already been > * triggered. > */ > - kn->kn_status |= KN_INFLUX; > + kn->kn_status |= KN_INFLUX | KN_SCAN; > KQ_UNLOCK(kq); > KN_LIST_LOCK(kn); > kn->kn_kevent.udata = kev->udata; > @@ -1197,7 +1197,7 @@ done_ev_add: > KQ_LOCK(kq); > if (event) > KNOTE_ACTIVATE(kn, 1); > - kn->kn_status &= ~KN_INFLUX; > + kn->kn_status &= ~(KN_INFLUX | KN_SCAN); > KN_LIST_UNLOCK(kn); > > if ((kev->flags & EV_DISABLE) && > @@ -1506,7 +1506,7 @@ retry: > KQ_LOCK(kq); > kn = NULL; > } else { > - kn->kn_status |= KN_INFLUX; > + kn->kn_status |= KN_INFLUX | KN_SCAN; > KQ_UNLOCK(kq); > if ((kn->kn_status & KN_KQUEUE) == KN_KQUEUE) > KQ_GLOBAL_LOCK(&kq_global, haskqglobal); > @@ -1515,7 +1515,8 @@ retry: > KQ_LOCK(kq); > KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal); > kn->kn_status &= > - ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX); > + ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX | > + KN_SCAN); > kq->kq_count--; > KN_LIST_UNLOCK(kn); > influx = 1; > @@ -1545,7 +1546,7 @@ retry: > } else > TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); > > - kn->kn_status &= ~(KN_INFLUX); > + kn->kn_status &= ~(KN_INFLUX | KN_SCAN); > KN_LIST_UNLOCK(kn); > influx = 1; > } > @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags) > */ > SLIST_FOREACH(kn, &list->kl_list, kn_selnext) { > kq = kn->kn_kq; > - if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) { > + KQ_LOCK(kq); > + if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { > + /* > + * Do not process the influx notes, except for > + * the influx coming from the kq unlock in the > + * kqueue_scan(). In the later case, we do > + * not interfere with the scan, since the code > + * fragment in kqueue_scan() locks the knlist, > + * and cannot proceed until we finished. > + */ > + KQ_UNLOCK(kq); > + } else if ((lockflags & KNF_NOKQLOCK) != 0) { > + kn->kn_status |= KN_INFLUX; > + KQ_UNLOCK(kq); > + error = kn->kn_fop->f_event(kn, hint); > KQ_LOCK(kq); > - if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) { > - KQ_UNLOCK(kq); > - } else if ((lockflags & KNF_NOKQLOCK) != 0) { > - kn->kn_status |= KN_INFLUX; > - KQ_UNLOCK(kq); > - error = kn->kn_fop->f_event(kn, hint); > - KQ_LOCK(kq); > - kn->kn_status &= ~KN_INFLUX; > - if (error) > - KNOTE_ACTIVATE(kn, 1); > - KQ_UNLOCK_FLUX(kq); > - } else { > - kn->kn_status |= KN_HASKQLOCK; > - if (kn->kn_fop->f_event(kn, hint)) > - KNOTE_ACTIVATE(kn, 1); > - kn->kn_status &= ~KN_HASKQLOCK; > - KQ_UNLOCK(kq); > - } > + kn->kn_status &= ~KN_INFLUX; > + if (error) > + KNOTE_ACTIVATE(kn, 1); > + KQ_UNLOCK_FLUX(kq); > + } else { > + kn->kn_status |= KN_HASKQLOCK; > + if (kn->kn_fop->f_event(kn, hint)) > + KNOTE_ACTIVATE(kn, 1); > + kn->kn_status &= ~KN_HASKQLOCK; > + KQ_UNLOCK(kq); > } > - kq = NULL; > } > if ((lockflags & KNF_LISTLOCKED) == 0) > list->kl_unlock(list->kl_lockarg); > diff --git a/sys/sys/event.h b/sys/sys/event.h > index bad8c9e..3b765c0 100644 > --- a/sys/sys/event.h > +++ b/sys/sys/event.h > @@ -207,6 +207,7 @@ struct knote { > #define KN_MARKER 0x20 /* ignore this knote */ > #define KN_KQUEUE 0x40 /* this knote belongs to a kq */ > #define KN_HASKQLOCK 0x80 /* for _inevent */ > +#define KN_SCAN 0x100 /* flux set in kqueue_scan() */ > int kn_sfflags; /* saved filter flags */ > intptr_t kn_sdata; /* saved data field */ > union { Hi, I think, we should add KN_SCAN after knote_attach() in kqueue_register(), too. What do you think about this? Best regards, Kohji Okuno