From owner-freebsd-net@freebsd.org Fri Jun 16 11:39:33 2017 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BB0E3BFE26A for ; Fri, 16 Jun 2017 11:39:33 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from hz.grosbein.net (hz.grosbein.net [78.47.246.247]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4F7727BA70 for ; Fri, 16 Jun 2017 11:39:32 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from eg.sd.rdtc.ru (root@eg.sd.rdtc.ru [62.231.161.221]) by hz.grosbein.net (8.15.2/8.15.2) with ESMTPS id v5GBdKM8010695 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 16 Jun 2017 13:39:21 +0200 (CEST) (envelope-from eugen@grosbein.net) X-Envelope-From: eugen@grosbein.net X-Envelope-To: Received: from eg.sd.rdtc.ru (eugen@localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.15.2/8.15.2) with ESMTP id v5GBdG3C088414 for ; Fri, 16 Jun 2017 18:39:16 +0700 (+07) (envelope-from eugen@grosbein.net) To: freebsd-net@FreeBSD.org From: Eugene Grosbein Subject: stable/11: repeatable panic due to a race X-Enigmail-Draft-Status: N1110 Message-ID: <5943C364.9030001@grosbein.net> Date: Fri, 16 Jun 2017 18:39:16 +0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=3.1 required=5.0 tests=BAYES_00, DATE_IN_FUTURE_96_Q, LOCAL_FROM autolearn=no autolearn_force=no version=3.4.1 X-Spam-Report: * 2.8 DATE_IN_FUTURE_96_Q Date: is 4 days to 4 months after Received: date * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * 2.6 LOCAL_FROM From my domains X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on hz.grosbein.net X-Spam-Level: *** X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Jun 2017 11:39:33 -0000 Hi! I observe repeatable panics at netgraph level while doing stress test for net/mpd5 daemon under stable/11 r317184. It connects, uses and disconnects lots of ngXX interfaces and corresponding netgraph nodes and hooks. I have good crashdump for INVARIANTS-enabled custom kernel built with debugging symbols and COPFTFLAGS=-O0. It seems that netgraph ng_iface's node hook may get disconnected and removed in parallel with ng_iface_send(). I see no locking to prevent this in the source. Here is kgdb script. Note that netgraph node's "hook" points to memory area filled with "0xdeadc0de" pattern that means it has already been freed by UMA. The machine has two physical CPU cores. Command: kgdb kernel.debug /var/crash/vmcore.1 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0xffffffff8097f249 stack pointer = 0x28:0xfffffe0239542ec0 frame pointer = 0x28:0xfffffe0239542f00 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 28999 (ftpd) trap number = 9 panic: general protection fault cpuid = 0 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe0239542840 kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe0239542910 vpanic() at vpanic+0x249/frame 0xfffffe02395429e0 kproc_shutdown() at kproc_shutdown/frame 0xfffffe0239542a40 trap_fatal() at trap_fatal+0x60a/frame 0xfffffe0239542b70 trap() at trap+0x97c/frame 0xfffffe0239542dd0 trap_check() at trap_check+0x15/frame 0xfffffe0239542df0 calltrap() at calltrap+0x8/frame 0xfffffe0239542df0 --- trap 0x9, rip = 0xffffffff8097f249, rsp = 0xfffffe0239542ec0, rbp = 0xfffffe0239542f00 --- ng_address_hook() at ng_address_hook+0x59/frame 0xfffffe0239542f00 ng_iface_send() at ng_iface_send+0x108/frame 0xfffffe0239542f90 ng_iface_output() at ng_iface_output+0x447/frame 0xfffffe0239543060 ip_output() at ip_output+0x1864/frame 0xfffffe0239543300 tcp_output() at tcp_output+0x2602/frame 0xfffffe02395436a0 tcp_disconnect() at tcp_disconnect+0x18e/frame 0xfffffe02395436e0 tcp_usr_disconnect() at tcp_usr_disconnect+0xe6/frame 0xfffffe0239543710 sodisconnect() at sodisconnect+0x62/frame 0xfffffe0239543740 soclose() at soclose+0x95/frame 0xfffffe02395437b0 soo_close() at soo_close+0x4d/frame 0xfffffe02395437e0 fo_close() at fo_close+0x31/frame 0xfffffe0239543810 _fdrop() at _fdrop+0x46/frame 0xfffffe0239543840 closef() at closef+0x2d7/frame 0xfffffe02395438f0 closefp() at closefp+0xde/frame 0xfffffe0239543940 kern_close() at kern_close+0xe7/frame 0xfffffe0239543990 sys_close() at sys_close+0x1f/frame 0xfffffe02395439b0 syscallenter() at syscallenter+0x4ff/frame 0xfffffe0239543a80 amd64_syscall() at amd64_syscall+0x2a/frame 0xfffffe0239543bb0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0239543bb0 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x801a4033a, rsp = 0x7fffffffd0a8, rbp = 0x7fffffffd0d0 --- Uptime: 2h11m5s Dumping 544 out of 8156 MB:..3%..12%..21%..33%..42%..53%..62%..71%..83%..92% Reading symbols from /boot/modules/geom_mirror.ko...done. Loaded symbols for /boot/modules/geom_mirror.ko Reading symbols from /boot/modules/accf_http.ko...done. Loaded symbols for /boot/modules/accf_http.ko Reading symbols from /boot/modules/nvidia.ko...done. Loaded symbols for /boot/modules/nvidia.ko Reading symbols from /boot/modules/vboxdrv.ko...done. Loaded symbols for /boot/modules/vboxdrv.ko Reading symbols from /boot/modules/mmc.ko...done. Loaded symbols for /boot/modules/mmc.ko Reading symbols from /boot/modules/mmcsd.ko...done. Loaded symbols for /boot/modules/mmcsd.ko Reading symbols from /boot/modules/sdhci.ko...done. Loaded symbols for /boot/modules/sdhci.ko Reading symbols from /boot/modules/h_ertt.ko...done. Loaded symbols for /boot/modules/h_ertt.ko Reading symbols from /boot/modules/cc_chd.ko...done. Loaded symbols for /boot/modules/cc_chd.ko Reading symbols from /boot/modules/geom_sched.ko...done. Loaded symbols for /boot/modules/geom_sched.ko Reading symbols from /boot/modules/gsched_rr.ko...done. Loaded symbols for /boot/modules/gsched_rr.ko Reading symbols from /boot/modules/vboxnetflt.ko...done. Loaded symbols for /boot/modules/vboxnetflt.ko Reading symbols from /boot/modules/vboxnetadp.ko...done. Loaded symbols for /boot/modules/vboxnetadp.ko Reading symbols from /boot/modules/nullfs.ko...done. Loaded symbols for /boot/modules/nullfs.ko Reading symbols from /usr/local/modules/rtc.ko...done. Loaded symbols for /usr/local/modules/rtc.ko #0 doadump (textdump=1) at /data2/src/sys/kern/kern_shutdown.c:298 298 dumptid = curthread->td_tid; (kgdb) bt #0 doadump (textdump=1) at /data2/src/sys/kern/kern_shutdown.c:298 #1 0xffffffff807a0828 in kern_reboot (howto=260) at /data2/src/sys/kern/kern_shutdown.c:366 #2 0xffffffff807a125f in vpanic (fmt=0xffffffff80cf5311 "%s", ap=0xfffffe0239542a20) at /data2/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff807a12d0 in panic (fmt=0xffffffff80cf5311 "%s") at /data2/src/sys/kern/kern_shutdown.c:690 #4 0xffffffff80c06a0a in trap_fatal (frame=0xfffffe0239542e00, eva=0) at /data2/src/sys/amd64/amd64/trap.c:801 #5 0xffffffff80c0604c in trap (frame=0xfffffe0239542e00) at /data2/src/sys/amd64/amd64/trap.c:549 #6 0xffffffff80c07085 in trap_check (frame=0xfffffe0239542e00) at /data2/src/sys/amd64/amd64/trap.c:602 #7 0xffffffff80bdeba3 in calltrap () at /data2/src/sys/amd64/amd64/exception.S:236 #8 0xffffffff8097f249 in ng_address_hook (here=0x0, item=0xfffff8011ddd1f00, hook=0xfffff801f8232300, retaddr=0) at /data2/src/sys/netgraph/ng_base.c:3586 #9 0xffffffff80986548 in ng_iface_send (ifp=0xfffff801f8ef2800, m=0xfffff801f8526100, sa=2 '\002') at /data2/src/sys/netgraph/ng_iface.c:451 #10 0xffffffff80985c97 in ng_iface_output (ifp=0xfffff801f8ef2800, m=0xfffff801f8526100, dst=0xfffff8011db98720, ro=0xfffff8011db98700) at /data2/src/sys/netgraph/ng_iface.c:386 #11 0xffffffff809cae14 in ip_output (m=0xfffff801f8526100, opt=0x0, ro=0xfffff8011db98700, flags=0, imo=0x0, inp=0xfffff8011db98570) at /data2/src/sys/netinet/ip_output.c:655 #12 0xffffffff809e08d2 in tcp_output (tp=0xfffff801f8258410) at /data2/src/sys/netinet/tcp_output.c:1446 #13 0xffffffff809f5f4e in tcp_disconnect (tp=0xfffff801f8258410) at /data2/src/sys/netinet/tcp_usrreq.c:1946 #14 0xffffffff809f29a6 in tcp_usr_disconnect (so=0xfffff8011d8de360) at /data2/src/sys/netinet/tcp_usrreq.c:674 #15 0xffffffff80884072 in sodisconnect (so=0xfffff8011d8de360) at /data2/src/sys/kern/uipc_socket.c:1051 #16 0xffffffff808839b5 in soclose (so=0xfffff8011d8de360) at /data2/src/sys/kern/uipc_socket.c:869 #17 0xffffffff8084d67d in soo_close (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000) at /data2/src/sys/kern/sys_socket.c:334 #18 0xffffffff8072bee1 in fo_close (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000) at file.h:346 #19 0xffffffff80726b86 in _fdrop (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000) at /data2/src/sys/kern/kern_descrip.c:2849 #20 0xffffffff8072b1f7 in closef (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000) at /data2/src/sys/kern/kern_descrip.c:2430 #21 0xffffffff8072768e in closefp (fdp=0xfffff80007104000, fd=6, fp=0xfffff8011df23b40, td=0xfffff801f8c5d000, holdleaders=0) at /data2/src/sys/kern/kern_descrip.c:1191 #22 0xffffffff80728417 in kern_close (td=0xfffff801f8c5d000, fd=6) at /data2/src/sys/kern/kern_descrip.c:1239 #23 0xffffffff8072831f in sys_close (td=0xfffff801f8c5d000, uap=0xfffffe0239543b58) at /data2/src/sys/kern/kern_descrip.c:1218 #24 0xffffffff80c07b7f in syscallenter (td=0xfffff801f8c5d000, sa=0xfffffe0239543b48) at subr_syscall.c:135 #25 0xffffffff80c0741a in amd64_syscall (td=0xfffff801f8c5d000, traced=0) at /data2/src/sys/amd64/amd64/trap.c:902 #26 0xffffffff80bdee8b in Xfast_syscall () at /data2/src/sys/amd64/amd64/exception.S:396 #27 0x0000000801a4033a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) frame 8 #8 0xffffffff8097f249 in ng_address_hook (here=0x0, item=0xfffff8011ddd1f00, hook=0xfffff801f8232300, retaddr=0) at /data2/src/sys/netgraph/ng_base.c:3586 3586 NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) || (kgdb) l 3581 * that the peer is still connected (even if invalid,) we know 3582 * that the peer node is present, though maybe invalid. 3583 */ 3584 TOPOLOGY_RLOCK(); 3585 if ((hook == NULL) || NG_HOOK_NOT_VALID(hook) || 3586 NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) || 3587 NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) { 3588 NG_FREE_ITEM(item); 3589 TRAP_ERROR(); 3590 TOPOLOGY_RUNLOCK(); (kgdb) p *hook $1 = { hk_name = 0xfffff801f8232300 "ÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞP¯\003\201ÿÿÿÿÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­ÞÞÀ­Þ"..., hk_private = 0xdeadc0dedeadc0de, hk_flags = -559038242, hk_type = -559038242, hk_peer = 0xdeadc0dedeadc0de, hk_node = 0xdeadc0dedeadc0de, hk_hooks = {le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de}, hk_rcvmsg = 0xdeadc0dedeadc0de, hk_rcvdata = 0xdeadc0dedeadc0de, hk_refs = -559038242} (kgdb)