Date: Wed, 1 Sep 2004 10:37:57 -0700 From: John-Mark Gurney <gurney_j@resnet.uoregon.edu> To: Igor Sysoev <is@rambler-co.ru> Cc: freebsd-current@freebsd.org Subject: Re: panic caused by EVFILT_SIGNAL detaching in rfork()ed thread Message-ID: <20040901173757.GF29902@funkthat.com> In-Reply-To: <20040901200709.H97970@is.park.rambler.ru> References: <20040901144705.K97970@is.park.rambler.ru> <20040901155304.GD29902@funkthat.com> <20040901200709.H97970@is.park.rambler.ru>
next in thread | previous in thread | raw e-mail | index | archive | help
--uxuisgdDHaNETlh8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Igor Sysoev wrote this message on Wed, Sep 01, 2004 at 20:17 +0400: > On Wed, 1 Sep 2004, John-Mark Gurney wrote: > > > Igor Sysoev wrote this message on Wed, Sep 01, 2004 at 14:47 +0400: > > > #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0) > > > at /usr/src/sys/kern/kern_event.c:1527 > > > 1527 knlist_remove_kq(knl, kn, islocked, 0); > > > (kgdb) p *knl > > > $1 = {kl_lock = 0x0, kl_list = {slh_first = 0x0}} > > > > > > > > > However, I do not know is it safe to test !SLIST_EMPTY(&p->p_klist) in > > > > It is possible to call SLIST_EMPTY, but you need to make you have proper > > locks held between the time you call SLIST_EMPTY, and knlist_remove... > > But I don't think that's the problem, the problem is else where... > > The problem is to test the empty p->p_klist before the calling > knlist_remove(). Nope.. The problem is that I was not testing the KN_DETACHED flag.. I haven't verified this fix for other processes, but I believe this should fix the problem... The KN_DETACHED flag implies if it is on the knlist of the object or not.. if it is not set, then the object is on the object, and needs to be removed... This fix should also mean that we can get rid of the test for KN_DETACHED in filt_procdetach... (which wasn't in filt_sigdetach).. Another "proper" fix to this panic, would be to add the if (kn->kn_status & KN_DETACHED) return; check to filt_sigdetach... There can be a situation where the list is not empty, but the knote is not on the list (already detached), and then we'd get another panic by falling off the end of the list... > > > filt_sigdetach() because in 5.3-BETA2 kqueue uses own mutex. Unfortunately, > > > I could not just now to write a small test case to allow everyone to > > > reproduce the panic but my user-level server always causes panic on exit on > > > unpatched 5.x and sometimes on unpatched 4.10. > > > > Could you print *kn? > > (kgdb) fr 14 > #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0) > at /usr/src/sys/kern/kern_event.c:1527 > 1527 knlist_remove_kq(knl, kn, islocked, 0); > (kgdb) p *kn > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, > kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc3c780ac}, > kn_kq = 0xc3c78080, kn_kevent = {ident = 64, filter = -6, flags = 32817, > fflags = 0, data = 1, udata = 0x0}, kn_status = 27, kn_sfflags = 0, 27 == 0x1b == 0b11011: KN_ACTIVE|KN_QUEUED|KN_DETACHED|KN_INFLUX [...] > > The problem is some how that the knote is being removed from the list > > (or was never on the list), but not being marked detached... > > > > Hmmm. what are the options you are using for rfork? > > The worker process starts two worker threads created by > rfork(RFPROC|RFTHREAD|RFMEM). Each thread opens kqueue and > adds the EVFILT_SIGNAL event. > > If you like I can send to you the source tarball (I do not distribute > the server right now, because it has not the documentation). The build > process is simple. Then you need to press ^C and you will get the panic. I believe this panic may be possible w/o rfork, but I'm not possitive.. It's probably an artifact of the fact that the kq was living longer than the proc that had the signal kevent associated with it, which normally does not happen... Attached is a patch.. And let me know if it fixes your panic... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." --uxuisgdDHaNETlh8 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="kq.detach.diff" Index: kern_event.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_event.c,v retrieving revision 1.79 diff -u -r1.79 kern_event.c --- kern_event.c 16 Aug 2004 03:08:38 -0000 1.79 +++ kern_event.c 1 Sep 2004 17:27:22 -0000 @@ -859,7 +859,8 @@ } else if (kev->flags & EV_DELETE) { kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); goto done; } @@ -1158,7 +1159,8 @@ * it _INFLUX. */ *kevp = kn->kn_kevent; - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); KQ_LOCK(kq); kn = NULL; @@ -1357,7 +1359,8 @@ ("KN_INFLUX set when not suppose to be")); kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); KQ_LOCK(kq); } @@ -1369,7 +1372,8 @@ ("KN_INFLUX set when not suppose to be")); kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); KQ_LOCK(kq); } @@ -1672,7 +1676,8 @@ } kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); influx = 1; KQ_LOCK(kq); --uxuisgdDHaNETlh8--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040901173757.GF29902>