Date: Tue, 4 Sep 2007 10:58:52 GMT From: "Ralf S. Engelschall" <rse@FreeBSD.org> To: freebsd-gnats-submit@FreeBSD.org Subject: kern/116077: 6.2-STABLE panic during use of multi-cast networking client Message-ID: <200709041058.l84Awqsm035092@www.freebsd.org> Resent-Message-ID: <200709041100.l84B07pE070057@freefall.freebsd.org>
next in thread | raw e-mail | index | archive | help
>Number: 116077 >Category: kern >Synopsis: 6.2-STABLE panic during use of multi-cast networking client >Confidential: no >Severity: serious >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Tue Sep 04 11:00:07 GMT 2007 >Closed-Date: >Last-Modified: >Originator: Ralf S. Engelschall >Release: 6.2-STABLE >Organization: Engelschall >Environment: FreeBSD en1.engelschall.com 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Aug 30 11:31:47 CEST 2007 root@en1.engelschall.com:/var/obj/usr/src/sys/EN1 i386 >Description: The following kernel panic occurred twice yesterday evening under the latest 6.2-STABLE when I ran the (multi-cast using?) Elvin event notifcation consumer client ec(1) from libelvin 4.0.3 with "ec -e elvin://10.0.0.1 'Timeout > 0'" against an Avis server under 10.0.0.1. A second after stopping the ec(1) with CTRL-C the following panic occurs more or less reproducable: Panic ----- # kgdb /boot/kernel/kernel.debug vmcore.40 [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"] GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd". Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x615b2094 fault code = supervisor read, page not present instruction pointer = 0x20:0xc05e3e6f stack pointer = 0x28:0xeb4ecaf4 frame pointer = 0x28:0xeb4ecafc code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 58238 (ec) trap number = 12 panic: page fault cpuid = 1 KDB: stack backtrace: kdb_backtrace(100,c6e1f780,28,eb4ecab4,c,...) at 0xc056c82d = kdb_backtrace+0x29 panic(c07089bd,c072fc26,0,fffff,c7b6f89b,...) at 0xc0552bf0 = panic+0x114 trap_fatal(eb4ecab4,615b2094,c6e1f780,c79a1cb8,c,...) at 0xc06e6232 = trap_fatal+0x2ce trap_pfault(eb4ecab4,0,615b2094) at 0xc06e5f3b = trap_pfault+0x1d7 trap(c6c00008,eb4e0028,28,c0786920,ce75c8a0,...) at 0xc06e5b51 = trap+0x2fd calltrap() at 0xc06d0b1a = calltrap+0x5 --- trap 0xc, eip = 0xc05e3e6f, esp = 0xeb4ecaf4, ebp = 0xeb4ecafc --- in_delmulti(ce75c8a0) at 0xc05e3e6f = in_delmulti+0xb ip_freemoptions(ca45b880,c7bc3870,0,c7c912c8,eb4ecb38,...) at 0xc05ed8e9 = ip_freemoptions+0x21 in_pcbdetach(c7bc3870) at 0xc05e5d6c = in_pcbdetach+0x15c udp_detach(c7c912c8) at 0xc05fee96 = udp_detach+0xb6 soclose(c7c912c8) at 0xc058e710 = soclose+0xa8 soo_close(c7009ca8,c6e1f780) at 0xc057db3f = soo_close+0x63 fdrop_locked(c7009ca8,c6e1f780,ca23fb00,eb4ecc00,c052e263,...) at 0xc052fd9c = fdrop_locked+0xd0 fdrop(c7009ca8,c6e1f780,87,c0564d7f,0,...) at 0xc052fcc5 = fdrop+0x41 closef(c7009ca8,c6e1f780) at 0xc052e263 = closef+0x42f fdfree(c6e1f780) at 0xc052d253 = fdfree+0x5a3 exit1(c6e1f780,0,eb4ecd30,c06e6577,c6e1f780,...) at 0xc0536b4b = exit1+0x49b exit1(c6e1f780,eb4ecd04) at 0xc05366b0 = exit1 syscall(3b,3b,3b,83fbfc0,83e2000,...) at 0xc06e6577 = syscall+0x2bf Xint0x80_syscall() at 0xc06d0b6f = Xint0x80_syscall+0x1f --- syscall (1, FreeBSD ELF32, sys_exit), eip = 0x283ef0af, esp = 0xbfbfe44c, ebp = 0xbfbfe468 --- Uptime: 2d11h34m40s Dumping 2045 MB (7 chunks) chunk 0: 1MB (142 pages) ... ok chunk 1: 2030MB (519581 pages) 2014 1998 1982 1966 1950 1934 1918 1902 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678 16 chunk 2: 15MB (3708 pages) ... ok chunk 3: 1MB (159 pages) ... ok chunk 4: 1MB (122 pages) #0 doadump () at pcpu.h:165 165 pcpu.h: No such file or directory. in pcpu.h Details ------- (kgdb) bt #0 doadump () at pcpu.h:165 #1 0xc055293e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #2 0xc0552c95 in panic (fmt=0xc07089bd "%s") at /usr/src/sys/kern/kern_shutdown.c:565 #3 0xc06e6232 in trap_fatal (frame=0xeb4ecab4, eva=1633362068) at /usr/src/sys/i386/i386/trap.c:838 #4 0xc06e5f3b in trap_pfault (frame=0xeb4ecab4, usermode=0, eva=1633362068) at /usr/src/sys/i386/i386/trap.c:745 #5 0xc06e5b51 in trap (frame= {tf_fs = -960495608, tf_es = -347209688, tf_ds = 40, tf_edi = -1065850592, tf_esi = -831141728, tf_ebp = -347157764, tf_isp = -347157792, t #6 0xc06d0b1a in calltrap () at /usr/src/sys/i386/i386/exception.s:139 #7 0xc05e3e6f in in_delmulti (inm=0xce75c8a0) at /usr/src/sys/netinet/in.c:1063 #8 0xc05ed8e9 in ip_freemoptions (imo=0xca45b880) at /usr/src/sys/netinet/ip_output.c:2064 #9 0xc05e5d6c in in_pcbdetach (inp=0xc7bc3870) at /usr/src/sys/netinet/in_pcb.c:714 #10 0xc05fee96 in udp_detach (so=0x1) at /usr/src/sys/netinet/udp_usrreq.c:1071 #11 0xc058e710 in soclose (so=0xc7c912c8) at /usr/src/sys/kern/uipc_socket.c:459 #12 0xc057db3f in soo_close (fp=0xc7009ca8, td=0xc6e1f780) at /usr/src/sys/kern/sys_socket.c:317 #13 0xc052fd9c in fdrop_locked (fp=0xc7009ca8, td=0xc6e1f780) at file.h:296 #14 0xc052fcc5 in fdrop (fp=0xc7009ca8, td=0xc6e1f780) at /usr/src/sys/kern/kern_descrip.c:2113 #15 0xc052e263 in closef (fp=0xc7009ca8, td=0xc6e1f780) at /usr/src/sys/kern/kern_descrip.c:1933 #16 0xc052d253 in fdfree (td=0xc6e1f780) at /usr/src/sys/kern/kern_descrip.c:1651 #17 0xc0536b4b in exit1 (td=0xc6e1f780, rv=0) at /usr/src/sys/kern/kern_exit.c:273 #18 0xc05366b0 in sys_exit (td=0xc6e1f780, uap=0x1) at /usr/src/sys/kern/kern_exit.c:99 #19 0xc06e6577 in syscall (frame= {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 138395584, tf_esi = 138289152, tf_ebp = -1077943192, tf_isp = -347157148, tf_ebx = 675280672, #20 0xc06d0b6f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200 #21 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) [...] (kgdb) up #7 0xc05e3e6f in in_delmulti (inm=0xce75c8a0) at /usr/src/sys/netinet/in.c:1063 1063 ifp = inm->inm_ifp; Current language: auto; currently c (kgdb) list 1058 in_delmulti(inm) 1059 register struct in_multi *inm; 1060 { 1061 struct ifnet *ifp; 1062 1063 ifp = inm->inm_ifp; 1064 IFF_LOCKGIANT(ifp); 1065 IN_MULTI_LOCK(); 1066 in_delmulti_locked(inm, 0); 1067 IN_MULTI_UNLOCK(); (kgdb) print inm $1 = (struct in_multi *) 0xce75c8a0 (kgdb) print *inm $2 = {inm_link = {le_next = 0x0, le_prev = 0x11}, inm_addr = {s_addr = 1684566899}, inm_ifp = 0x615b203a, inm_ifma = 0x70656363, inm_timer = 1566860660, inm_state = 450304, inm_rti = 0x6d7b6} (kgdb) print inm->inm_ifp $3 = (struct ifnet *) 0x615b203a (kgdb) print *inm->inm_ifp Cannot access memory at address 0x615b203a >How-To-Repeat: Seems like some multi-cast related cleanups are broken. When in_delmulti() is called, the "inm" is still valid but the "inm->inm_ifp" already points to an invalid location. I'm not familiar with this network code, so cannot easily determine the root of the problem. Perhaps someone of us who knows the network stack good enough can look at this issue and figure out the root cause (and fix it ;-). For reproducing this one have to build http://distfiles-msn.opendarwin.org/libelvin-4.0.3.tar.gz and run its commands ec(1) and ep(1) against an Avis (see http://avsis.sf.net/) server. This is a little bit nasty as one needs Java for Avis, etc. So, sorry, I've currently no simple reproducing procedure. But perhaps the bug can already be found with the above information. I've also a corresponding "vmcore" (2GB) and a "kernel.debug" (16MB) file available in case someone wants to digg deeper. Or tell me the kgdb commands I should run against those for digging up additional information. >Fix: >Release-Note: >Audit-Trail: >Unformatted:
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200709041058.l84Awqsm035092>