Date: Wed, 1 Sep 2004 20:17:20 +0400 (MSD)
From: Igor Sysoev <is@rambler-co.ru>
To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
Cc: freebsd-current@freebsd.org
Subject: Re: panic caused by EVFILT_SIGNAL detaching in rfork()ed thread
Message-ID: <20040901200709.H97970@is.park.rambler.ru>
In-Reply-To: <20040901155304.GD29902@funkthat.com>
References: <20040901144705.K97970@is.park.rambler.ru> <20040901155304.GD29902@funkthat.com>

On Wed, 1 Sep 2004, John-Mark Gurney wrote:

> Igor Sysoev wrote this message on Wed, Sep 01, 2004 at 14:47 +0400:
> > 5.3-BETA2 still may panic as described in
> > http://freebsd.rambler.ru/bsdmail/freebsd-hackers_2004/msg02732.html
>
> Well, between then and now, I have committed kqueue locking, though
> I can't say they are similar since you completely dropped the panic
> message from your email...

It's the same panic caused by the empty p->p_klist.
As for the panic message, I do not know how to get it from kgdb.

> > #11 0xc077631a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
> > #12 0xc0620018 in removechild (parent=0x0, child=0x5de)
> >     at /usr/src/sys/kern/subr_witness.c:1443
> > #13 0xc05e86ab in knlist_remove_kq (knl=0xc39724f4, kn=0x0,
> >     knlislocked=-1065428340, kqislocked=0)
> >     at /usr/src/sys/kern/kern_event.c:1502
> > #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0)
> >     at /usr/src/sys/kern/kern_event.c:1527
> > #15 0xc060451b in filt_sigdetach (kn=0x0) at /usr/src/sys/kern/kern_sig.c:2733
>
> I love how bt's can not find the value of some arguments.. :(
>
> > #16 0xc05e826a in kqueue_close (fp=0xc394ebb0, td=0xc3a22420)
> >     at /usr/src/sys/kern/kern_event.c:1372
> > #17 0xc05e5524 in fdrop_locked (fp=0xc394ebb0, td=0xc3a22420) at file.h:289
> > #18 0xc05e47b8 in fdrop (fp=0xc394ebb0, td=0xc3a22420)
> >     at /usr/src/sys/kern/kern_descrip.c:1897
> > #19 0xc05e478b in closef (fp=0xc394ebb0, td=0xc3a22420)
> >     at /usr/src/sys/kern/kern_descrip.c:1883
> > #20 0xc05e40e7 in fdfree (td=0xc3a22420)
> >     at /usr/src/sys/kern/kern_descrip.c:1610
> > #21 0xc05ea896 in exit1 (td=0xc3a22420, rv=0)
> >     at /usr/src/sys/kern/kern_exit.c:242
> > #22 0xc05ea494 in sys_exit (td=0xc3a22420, uap=0x0)
> >     at /usr/src/sys/kern/kern_exit.c:94
> > #23 0xc07881cf in syscall (frame=
> >       {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 2, tf_esi = 134873108, tf_ebp = -1077941784, tf_isp = -355476108, tf_ebx = 672658924, tf_edx = 10, tf_ecx = 672658608, tf_eax = 1, tf_trapno = 12, tf_err = 2, tf_eip = 672162923, tf_cs = 31, tf_eflags = 662, tf_esp = -1077941812, tf_ss = 47})
> >     at /usr/src/sys/i386/i386/trap.c:1004
> > #24 0xc077636f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:201
> >
> > [ ... ]
> >
> > (kgdb) fr 15
> > #15 0xc060451b in filt_sigdetach (kn=0x0) at /usr/src/sys/kern/kern_sig.c:2733
> > 2733            knlist_remove(&p->p_klist, kn, 0);
> > (kgdb) down
> > #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0)
> >     at /usr/src/sys/kern/kern_event.c:1527
> > 1527            knlist_remove_kq(knl, kn, islocked, 0);
> > (kgdb) p *knl
> > $1 = {kl_lock = 0x0, kl_list = {slh_first = 0x0}}
> >
> >
> > However, I do not know whether it is safe to test !SLIST_EMPTY(&p->p_klist) in
>
> It is possible to call SLIST_EMPTY, but you need to make sure you have proper
> locks held between the time you call SLIST_EMPTY, and knlist_remove...
> But I don't think that's the problem, the problem is elsewhere...

The problem is testing for an empty p->p_klist before calling knlist_remove().

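A minimal user-space sketch of the guard discussed above, not the actual kernel code. It assumes only the SLIST macros from <sys/queue.h>; `struct note`, `struct note_list` and `guarded_remove()` are hypothetical stand-ins for `struct knote`, `struct knlist` and the check proposed for filt_sigdetach(). The locking that must be held between the SLIST_EMPTY test and the removal is left out, and only the empty-list case is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the kernel's struct knote. */
struct note {
    int id;
    SLIST_ENTRY(note) link;
};

/* Hypothetical stand-in for the list head inside struct knlist. */
SLIST_HEAD(note_list, note);

/*
 * Remove n from list unless the list is empty.
 * Returns 1 if a removal was performed, 0 if it was skipped.
 *
 * SLIST_REMOVE walks the list from its first element, so calling it on
 * an empty list follows a NULL pointer. The SLIST_EMPTY test avoids
 * that one case. It does not help when the list is non-empty but does
 * not contain n, and a real kernel version would need the proper lock
 * held across both the test and the removal.
 */
static int
guarded_remove(struct note_list *list, struct note *n)
{
    assert(list != NULL && n != NULL);

    if (SLIST_EMPTY(list))
        return 0;

    SLIST_REMOVE(list, n, note, link);
    return 1;
}
```

With an empty list, `guarded_remove(&list, &n)` returns 0 and touches nothing; after `SLIST_INSERT_HEAD(&list, &n, link)` it returns 1 and leaves the list empty again.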
> > filt_sigdetach() because in 5.3-BETA2 kqueue uses its own mutex. Unfortunately,
> > I have not yet been able to write a small test case that would let everyone
> > reproduce the panic, but my user-level server always causes a panic on exit on
> > unpatched 5.x and sometimes on unpatched 4.10.
>
> Could you print *kn?

(kgdb) fr 14
#14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0)
    at /usr/src/sys/kern/kern_event.c:1527
1527            knlist_remove_kq(knl, kn, islocked, 0);
(kgdb) p *kn
$1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
  kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc3c780ac},
  kn_kq = 0xc3c78080, kn_kevent = {ident = 64, filter = -6, flags = 32817,
    fflags = 0, data = 1, udata = 0x0}, kn_status = 27, kn_sfflags = 0,
  kn_sdata = 0, kn_ptr = {p_fp = 0xc3972380, p_proc = 0xc3972380},
  kn_fop = 0xc084d320, kn_hook = 0x0}

> also in frame 16:
> print kq

(kgdb) fr 16
#16 0xc05e826a in kqueue_close (fp=0xc394ebb0, td=0xc3a22420)
    at /usr/src/sys/kern/kern_event.c:1372
1372            kn->kn_fop->f_detach(kn);
(kgdb) p kq
$2 = (struct kqueue *) 0xc3c78080

> print *kq

(kgdb) p *kq
$3 = {kq_lock = {mtx_object = {lo_class = 0xc084c4dc,
      lo_name = 0xc07eda71 "kqueue", lo_type = 0xc07eda71 "kqueue",
      lo_flags = 4390912, lo_list = {tqe_next = 0xc3c78000,
        tqe_prev = 0xc3b46c38}, lo_witness = 0xc08bd1c0}, mtx_lock = 4,
    mtx_recurse = 0}, kq_refcnt = 1, kq_list = {sle_next = 0x0}, kq_head = {
    tqh_first = 0xc3e1d154, tqh_last = 0xc3e1d160}, kq_count = 1, kq_sel = {
    si_thrlist = {tqe_next = 0x0, tqe_prev = 0x0}, si_thread = 0x0, si_note = {
      kl_lock = 0xc3c78080, kl_list = {slh_first = 0x0}}, si_flags = 0},
  kq_sigio = 0x0, kq_fdp = 0xc3b46c00, kq_state = 16, kq_knlistsize = 0,
  kq_knlist = 0x0, kq_knhashmask = 63, kq_knhash = 0xc3c6ac00, kq_task = {
    ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0,
    ta_func = 0xc05e7908 <kqueue_task>, ta_context = 0xc3c78080}}

> The problem is somehow that the knote is being removed from the list
> (or was never on the list), but not being marked detached...
>
> Hmmm. what are the options you are using for rfork?

The worker process starts two worker threads created by
rfork(RFPROC|RFTHREAD|RFMEM). Each thread opens a kqueue and adds
an EVFILT_SIGNAL event.

If you like, I can send you the source tarball (I am not distributing
the server right now because it has no documentation).
The build process is simple. Once the server is running, press ^C
and you will get the panic.


Igor Sysoev
http://sysoev.ru/en/