Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 24 Sep 2015 00:05:07 +0200
From:      Palle Girgensohn <girgen@pingpong.net>
To:        Julien Charbon <jch@freebsd.org>
Cc:        George Neville-Neil <gnn@FreeBSD.org>, freebsd-net@freebsd.org
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net>
In-Reply-To: <5602F044.5010606@freebsd.org>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org> <5601CF2D.9030307@freebsd.org> <E09DF89D-AAC5-48FD-8B75-EEAB937A5C32@FreeBSD.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

Hi,

> 23 sep 2015 kl. 20:32 skrev Julien Charbon <jch@freebsd.org>:
> 
> 
> Hi,
> 
> On 23/09/15 20:26, Palle Girgensohn wrote:
>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon <jch@freebsd.org>:
>>> On 23/09/15 16:36, Palle Girgensohn wrote:
>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon <jch@freebsd.org>: On
>>>>> 22/09/15 22:58, Palle Girgensohn wrote:
>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <jch@freebsd.org>: 
>>>>>>> On 22/09/15 18:49, Palle Girgensohn wrote:
>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn 
>>>>>>>>> <girgen@FreeBSD.org>:
>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn 
>>>>>>>>>> <girgen@FreeBSD.org>:
>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon 
>>>>>>>>>>> <jch@FreeBSD.org>: On 18/09/15 18:06, Konstantin
>>>>>>>>>>> Belousov wrote:
>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien
>>>>>>>>>>>> Charbon wrote:
>>>>>>>>>>>>> [...]
>>>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2 
>>>>>>>>>>> kernel with these options:
>>>>>>>>>>> 
>>>>>>>>>>> options        DDB options        DEADLKRES options 
>>>>>>>>>>> INVARIANTS options        INVARIANT_SUPPORT options
>>>>>>>>>>> WITNESS options        WITNESS_SKIPSPIN
>>>>>>>>>>> 
>>>>>>>>>>> To see where the original fault is coming from.
>>>>>>>>>> [...]
>>>>>>>>>> 
>>>>>>>>>> I'll try stable/10 now. Would you suggest a "clean" 
>>>>>>>>>> stable/10, or could 287621 and 287780 help?
>>>>>>>>>> 
>>>>>>>>>> I'll add the debugging suggested options right away.
>>>>>>>>>> 
>>>>>>>>>> Palle
>>>>>>>>> 
>>>>>>>>> I have a new core dump from ^/stable/10 with:
>>>>>>>>> 
>>>>>>>>> options        DDB options        DEADLKRES options
>>>>>>>>> INVARIANTS options        INVARIANT_SUPPORT options WITNESS
>>>>>>>>> options WITNESS_SKIPSPIN
>>>>>>>> 
>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] 
>>>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free 
>>>>>>>> software, covered by the GNU General Public License, and you
>>>>>>>> are welcome to change it and/or distribute copies of it under
>>>>>>>> certain conditions. Type "show copying" to see the
>>>>>>>> conditions. There is absolutely no warranty for GDB.  Type
>>>>>>>> "show warranty" for details. This GDB was configured as
>>>>>>>> "amd64-marcel-freebsd"...
>>>>>>>> 
>>>>>>>> Unread portion of the kernel message buffer: panic:
>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid =
>>>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at 
>>>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 
>>>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame
>>>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame
>>>>>>>> 0xfffffe183d9e98d0 kassert_panic() at
>>>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940
>>>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame
>>>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame
>>>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame
>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame
>>>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame
>>>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame
>>>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at 
>>>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall()
>>>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall
>>>>>>>> (6, FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp = 
>>>>>>>> 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic 
>>>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263 
>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>>>>>> 
>>>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. 
>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading
>>>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded
>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from
>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols
>>>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from
>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for
>>>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for 
>>>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for
>>>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for 
>>>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for
>>>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for
>>>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for 
>>>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for
>>>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from 
>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for 
>>>>>>>> /boot/kernel/fdescfs.ko.symbols #0  doadump (textdump=1) at 
>>>>>>>> pcpu.h:219 219        __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt
>>>>>>>> #0 doadump (textdump=1) at pcpu.h:219 #1  0xffffffff8094b337
>>>>>>>> in kern_reboot (howto=260) at
>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845
>>>>>>>> in vpanic (fmt=<value optimized out>, ap=<value optimized
>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3
>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=<value optimized
>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4
>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=<value optimized
>>>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5
>>>>>>>> 0xffffffff809cd291 in sofree (so=0xfffff801dd302000) at 
>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6  0xffffffff809cdb00 in
>>>>>>>> soclose (so=<value optimized out>) at 
>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7  0xffffffff808fe659 in
>>>>>>>> _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8 
>>>>>>>> 0xffffffff80901092 in closef (fp=0xfffff802a593db40, 
>>>>>>>> td=0xfffff80eebc894a0) at
>>>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d
>>>>>>>> in closefp (fdp=0xfffff80b20cce000, fd=<value optimized out>,
>>>>>>>> fp=0xfffff802a593db40, td=0xfffff80eebc894a0,
>>>>>>>> holdleaders=<value optimized out>) at 
>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a
>>>>>>>> in amd64_syscall (td=0xfffff80eebc894a0, traced=0) at 
>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall ()
>>>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12
>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this
>>>>>>>> frame (corrupt stack?) Current language:  auto; currently
>>>>>>>> minimal
>>>>>>> 
>>>>>>> Thanks for the information.  As I suspected the initial error
>>>>>>> was elsewhere than tcp_twclose(), never got this assertion
>>>>>>> before:
>>>>>>> 
>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
>>>>>>> 
>>>>>>> from here:
>>>>>>> 
>>>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) {
>>>>>>> struct tcpcb *tp;
>>>>>>> 
>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp);
>>>>>>> 
>>>>>>> KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp")); 
>>>>>>> KASSERT(inp->inp_socket == so, ("tcp_detach: inp_socket !=
>>>>>>> so"));
>>>>>>> 
>>>>>>> tp = intotcpcb(inp);
>>>>>>> 
>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & 
>>>>>>> INP_DROPPED) { KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT
>>>>>>> && " "INP_DROPPED && tp != NULL"));
>>>>>>> 
>>>>>>> Let me check if I could find a path that could lead to this 
>>>>>>> unexpected case.  Unexpected because:  INP_DROPPED and 
>>>>>>> inp->inp_ppcb is set to NULL are set at same time here:
>>>>>>> 
>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket
>>>>>>> *so; struct inpcb *inp;
>>>>>>> 
>>>>>>> inp = tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), 
>>>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) == tw, 
>>>>>>> ("tcp_twclose: inp_ppcb != tw")); 
>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);      /* in_pcbfree() */ 
>>>>>>> INP_WLOCK_ASSERT(inp);
>>>>>>> 
>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb = NULL; 
>>>>>>> in_pcbdrop(inp); ...
>>>>>>> 
>>>>>>> Interesting [...]
>>>> 
>>>> Just a quick update. Julien is pursuing this off list with a core
>>>> dump and we are now waiting for a new core dump with the first
>>>> KASSERT removed. this is on a stable/10 kernel.
>>> 
>>> By the way Palle could you also run below Dtrace script to see where
>>> this tcp_close() in INP_TIMEWAIT comes from:
>>> 
>>> $ cat tcp-close-tw.d
>>> fbt::tcp_close:entry
>>> /args[0]->t_inpcb->inp_flags & 0x01000000/
>>> {
>>> @s1[stack()] = count()
>>> }
>>> 
>>> tick-1sec {
>>> printa(@s1);
>>> }
>>> $ sudo dtrace -s tcp-close-tw.d
>>> 
>>> And share any backtraces reported in this dtrace script output.
>>> 
>>> George, could you check if this dtrace script makes sense for you, and
>>> if you have any improvements to add, I am quite a rookie in Dtrace scripts.
>> 
>> Shall I let the dtrace script run continuously until the machine crashes? Or just run it once?
> 
> Continuously until the machine crashes.  You can report any backstrace
> outputs like:
> 
>              kernel`tcp_usr_close+0x86
>              kernel`soclose+0xe4
>              kernel`_fdrop+0x29
>              kernel`closef+0x237
>              kernel`closefp+0x95
>              kernel`amd64_syscall+0x357
>              kernel`0xffffffff80c83c4b
>                1
> 
> before the machine crashes.   But I expect the problematic case
> detection with Dtrace to be quickly followed by the crash.  Will see.
> 
> --
> Julien
> 

Kernels and userland are updated to 10.2-p3 with the patch removing the suspicous KASSERT. 

dtrace running continously redirecting to a log file.

now we're just waiting... :)

Palle






Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?54767991-9D3B-4ECB-A07E-CFA21A54BBDD>