Date: Wed, 23 Sep 2015 00:01:47 +0200 From: Palle Girgensohn <girgen@FreeBSD.org> To: Julien Charbon <jch@freebsd.org> Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-net@freebsd.org, Hans Petter Selasky <hps@selasky.org> Subject: Re: Kernel panics in tcp_twclose Message-ID: <FE73FCDB-308E-4732-BD48-31C781B4D42F@FreeBSD.org> In-Reply-To: <5601CF2D.9030307@freebsd.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org> <5601CF2D.9030307@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
> 22 sep 2015 kl. 23:59 skrev Julien Charbon <jch@freebsd.org>: > > > Hi Palle, > > On 22/09/15 22:58, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <jch@freebsd.org>: >>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn >>>>> <girgen@FreeBSD.org>: >>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>>>> <girgen@FreeBSD.org>: >>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>>>> <jch@FreeBSD.org>: On 18/09/15 18:06, Konstantin Belousov >>>>>>> wrote: >>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon >>>>>>>> wrote: >>>>>>>>> [...] >>>>>>> - Second, if issue is still in stable/10, compile 10.2 >>>>>>> kernel with these options: >>>>>>> >>>>>>> options DDB options DEADLKRES options >>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>> options WITNESS_SKIPSPIN >>>>>>> >>>>>>> To see where the original fault is coming from. >>>>>> [...] >>>>>> >>>>>> I'll try stable/10 now. Would you suggest a "clean" >>>>>> stable/10, or could 287621 and 287780 help? >>>>>> >>>>>> I'll add the debugging suggested options right away. >>>>>> >>>>>> Palle >>>>> >>>>> I have a new core dump from ^/stable/10 with: >>>>> >>>>> options DDB options DEADLKRES options INVARIANTS >>>>> options INVARIANT_SUPPORT options WITNESS options >>>>> WITNESS_SKIPSPIN >>>> >>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] >>>> Copyright 2004 Free Software Foundation, Inc. GDB is free >>>> software, covered by the GNU General Public License, and you are >>>> welcome to change it and/or distribute copies of it under certain >>>> conditions. Type "show copying" to see the conditions. There is >>>> absolutely no warranty for GDB. Type "show warranty" for >>>> details. This GDB was configured as "amd64-marcel-freebsd"... >>>> >>>> Unread portion of the kernel message buffer: panic: tcp_detach: >>>> INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid = 16 KDB: stack >>>> backtrace: db_trace_self_wrapper() at >>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 >>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() >>>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() >>>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at >>>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at >>>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at >>>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at >>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at >>>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at >>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at >>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, >>>> FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp = >>>> 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic >>>> Uptime: 18h57m59s Dumping 23085 out of 98263 >>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>> >>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols >>>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/zfs.ko.symbols Reading symbols from >>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from >>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/netgraph.ko.symbols Reading symbols from >>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from >>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from >>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/accf_data.ko.symbols Reading symbols from >>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/accf_http.ko.symbols Reading symbols from >>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ums.ko.symbols Reading symbols from >>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from >>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=1) at >>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt #0 >>>> doadump (textdump=1) at pcpu.h:219 #1 0xffffffff8094b337 in >>>> kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 >>>> #2 0xffffffff8094b845 in vpanic (fmt=<value optimized out>, >>>> ap=<value optimized out>) at >>>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in >>>> kassert_panic (fmt=<value optimized out>) at >>>> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in >>>> tcp_usr_detach (so=<value optimized out>) at >>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in >>>> sofree (so=0xfffff801dd302000) at >>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>> soclose (so=<value optimized out>) at >>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>> _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8 >>>> 0xffffffff80901092 in closef (fp=0xfffff802a593db40, >>>> td=0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 >>>> #9 0xffffffff808feb5d in closefp (fdp=0xfffff80b20cce000, >>>> fd=<value optimized out>, fp=0xfffff802a593db40, >>>> td=0xfffff80eebc894a0, holdleaders=<value optimized out>) at >>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in >>>> amd64_syscall (td=0xfffff80eebc894a0, traced=0) at >>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at >>>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a >>>> in ?? () Previous frame inner to this frame (corrupt stack?) >>>> Current language: auto; currently minimal >>> >>> Thanks for the information. As I suspected the initial error was >>> elsewhere than tcp_twclose(), never got this assertion before: >>> >>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL >>> >>> from here: >>> >>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>> struct tcpcb *tp; >>> >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>> >>> KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp")); >>> KASSERT(inp->inp_socket == so, ("tcp_detach: inp_socket != so")); >>> >>> tp = intotcpcb(inp); >>> >>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & >>> INP_DROPPED) { KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT && " >>> "INP_DROPPED && tp != NULL")); >>> >>> Let me check if I could find a path that could lead to this >>> unexpected case. Unexpected because: INP_DROPPED and >>> inp->inp_ppcb is set to NULL are set at same time here: >>> >>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so; >>> struct inpcb *inp; >>> >>> inp = tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), >>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) == tw, >>> ("tcp_twclose: inp_ppcb != tw")); >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */ >>> INP_WLOCK_ASSERT(inp); >>> >>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb = NULL; >>> in_pcbdrop(inp); ... >>> >>> Interesting and by the way could you try: >>> >>> # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads >>> >>> To see if other thread are also in TCP stack at the same time, and >>> if one of this thread is referencing the same inp. >>> >> >> BTW, this backtrace looks quite different from the previous ones? Is >> it a different problem, or just a different way to reveal the same >> problem? > > Having a different backstrace is expected, the first backstrace was > quite deep in the stack, and kernel panic-ed quite late after the > original issue. With the kernel debug options, the kernel stopped at > the first suspicious fact. > > And I would say that I still does not understand how it is possible to > reach that state, it is like kernel exclusive lock stopped to work and > allow several thread to work on the same locked inp at same time. > (Stack overflow, VIMAGE memory corruption, UMA issue? ... weird). > > I will ask you the kernel and the core dump but off list to avoid too > much spam in -net. > > -- > Julien > Great, I'll send it you promptly. Palle
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FE73FCDB-308E-4732-BD48-31C781B4D42F>