From owner-freebsd-current@FreeBSD.ORG Wed Sep 1 17:38:01 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 959A716A4D6 for ; Wed, 1 Sep 2004 17:38:01 +0000 (GMT) Received: from mail4.speakeasy.net (mail4.speakeasy.net [216.254.0.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6151843D2F for ; Wed, 1 Sep 2004 17:38:01 +0000 (GMT) (envelope-from jmg@hydrogen.funkthat.com) Received: (qmail 24757 invoked from network); 1 Sep 2004 17:38:01 -0000 Received: from gate.funkthat.com (HELO hydrogen.funkthat.com) ([69.17.45.168]) (envelope-sender ) by mail4.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 1 Sep 2004 17:38:00 -0000 Received: from hydrogen.funkthat.com (dilcle@localhost.funkthat.com [127.0.0.1])i81HbxuU081877; Wed, 1 Sep 2004 10:38:00 -0700 (PDT) (envelope-from jmg@hydrogen.funkthat.com) Received: (from jmg@localhost) by hydrogen.funkthat.com (8.12.10/8.12.10/Submit) id i81HbvnV081876; Wed, 1 Sep 2004 10:37:57 -0700 (PDT) Date: Wed, 1 Sep 2004 10:37:57 -0700 From: John-Mark Gurney To: Igor Sysoev Message-ID: <20040901173757.GF29902@funkthat.com> Mail-Followup-To: Igor Sysoev , freebsd-current@freebsd.org References: <20040901144705.K97970@is.park.rambler.ru> <20040901155304.GD29902@funkthat.com> <20040901200709.H97970@is.park.rambler.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="uxuisgdDHaNETlh8" Content-Disposition: inline In-Reply-To: <20040901200709.H97970@is.park.rambler.ru> User-Agent: Mutt/1.4.1i X-Operating-System: FreeBSD 4.2-RELEASE i386 X-PGP-Fingerprint: B7 EC EF F8 AE ED A7 31 96 7A 22 B3 D8 56 36 F4 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html cc: freebsd-current@freebsd.org Subject: Re: panic caused by EVFILT_SIGNAL detaching in rfork()ed thread X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: John-Mark Gurney List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Sep 2004 17:38:01 -0000 --uxuisgdDHaNETlh8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Igor Sysoev wrote this message on Wed, Sep 01, 2004 at 20:17 +0400: > On Wed, 1 Sep 2004, John-Mark Gurney wrote: > > > Igor Sysoev wrote this message on Wed, Sep 01, 2004 at 14:47 +0400: > > > #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0) > > > at /usr/src/sys/kern/kern_event.c:1527 > > > 1527 knlist_remove_kq(knl, kn, islocked, 0); > > > (kgdb) p *knl > > > $1 = {kl_lock = 0x0, kl_list = {slh_first = 0x0}} > > > > > > > > > However, I do not know is it safe to test !SLIST_EMPTY(&p->p_klist) in > > > > It is possible to call SLIST_EMPTY, but you need to make you have proper > > locks held between the time you call SLIST_EMPTY, and knlist_remove... > > But I don't think that's the problem, the problem is else where... > > The problem is to test the empty p->p_klist before the calling > knlist_remove(). Nope.. The problem is that I was not testing the KN_DETACHED flag.. I haven't verified this fix for other processes, but I believe this should fix the problem... The KN_DETACHED flag implies if it is on the knlist of the object or not.. if it is not set, then the object is on the object, and needs to be removed... This fix should also mean that we can get rid of the test for KN_DETACHED in filt_procdetach... (which wasn't in filt_sigdetach).. Another "proper" fix to this panic, would be to add the if (kn->kn_status & KN_DETACHED) return; check to filt_sigdetach... There can be a situation where the list is not empty, but the knote is not on the list (already detached), and then we'd get another panic by falling off the end of the list... > > > filt_sigdetach() because in 5.3-BETA2 kqueue uses own mutex. Unfortunately, > > > I could not just now to write a small test case to allow everyone to > > > reproduce the panic but my user-level server always causes panic on exit on > > > unpatched 5.x and sometimes on unpatched 4.10. > > > > Could you print *kn? > > (kgdb) fr 14 > #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0) > at /usr/src/sys/kern/kern_event.c:1527 > 1527 knlist_remove_kq(knl, kn, islocked, 0); > (kgdb) p *kn > $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, > kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc3c780ac}, > kn_kq = 0xc3c78080, kn_kevent = {ident = 64, filter = -6, flags = 32817, > fflags = 0, data = 1, udata = 0x0}, kn_status = 27, kn_sfflags = 0, 27 == 0x1b == 0b11011: KN_ACTIVE|KN_QUEUED|KN_DETACHED|KN_INFLUX [...] > > The problem is some how that the knote is being removed from the list > > (or was never on the list), but not being marked detached... > > > > Hmmm. what are the options you are using for rfork? > > The worker process starts two worker threads created by > rfork(RFPROC|RFTHREAD|RFMEM). Each thread opens kqueue and > adds the EVFILT_SIGNAL event. > > If you like I can send to you the source tarball (I do not distribute > the server right now, because it has not the documentation). The build > process is simple. Then you need to press ^C and you will get the panic. I believe this panic may be possible w/o rfork, but I'm not possitive.. It's probably an artifact of the fact that the kq was living longer than the proc that had the signal kevent associated with it, which normally does not happen... Attached is a patch.. And let me know if it fixes your panic... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." --uxuisgdDHaNETlh8 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="kq.detach.diff" Index: kern_event.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_event.c,v retrieving revision 1.79 diff -u -r1.79 kern_event.c --- kern_event.c 16 Aug 2004 03:08:38 -0000 1.79 +++ kern_event.c 1 Sep 2004 17:27:22 -0000 @@ -859,7 +859,8 @@ } else if (kev->flags & EV_DELETE) { kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); goto done; } @@ -1158,7 +1159,8 @@ * it _INFLUX. */ *kevp = kn->kn_kevent; - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); KQ_LOCK(kq); kn = NULL; @@ -1357,7 +1359,8 @@ ("KN_INFLUX set when not suppose to be")); kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); KQ_LOCK(kq); } @@ -1369,7 +1372,8 @@ ("KN_INFLUX set when not suppose to be")); kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); KQ_LOCK(kq); } @@ -1672,7 +1676,8 @@ } kn->kn_status |= KN_INFLUX; KQ_UNLOCK(kq); - kn->kn_fop->f_detach(kn); + if (!(kn->kn_status & KN_DETACHED)) + kn->kn_fop->f_detach(kn); knote_drop(kn, td); influx = 1; KQ_LOCK(kq); --uxuisgdDHaNETlh8--