Date: Thu, 24 Sep 2015 09:03:23 +0200 From: Julien Charbon <jch@freebsd.org> To: Palle Girgensohn <girgen@FreeBSD.org> Cc: freebsd-net@freebsd.org Subject: Re: Kernel panics in tcp_twclose Message-ID: <5603A03B.4060002@freebsd.org> In-Reply-To: <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org> <5601CF2D.9030307@freebsd.org> <E09DF89D-AAC5-48FD-8B75-EEAB937A5C32@FreeBSD.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Palle, On 24/09/15 08:55, Palle Girgensohn wrote: >> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn >> <girgen@pingpong.net>: >>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn >>> <girgen@pingpong.net>: >>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon <jch@freebsd.org>:=20 >>>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon >>>>>> <jch@freebsd.org>: On 23/09/15 16:36, Palle Girgensohn >>>>>> wrote: >>>>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon >>>>>>>> <jch@freebsd.org>: On 22/09/15 22:58, Palle Girgensohn >>>>>>>> wrote: >>>>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon >>>>>>>>>> <jch@freebsd.org>: On 22/09/15 18:49, Palle >>>>>>>>>> Girgensohn wrote: >>>>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>>>>>>> <girgen@FreeBSD.org>: >>>>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>>>>>>>>>>> <girgen@FreeBSD.org>: >>>>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>>>>>>>>>>> <jch@FreeBSD.org>: On 18/09/15 18:06, >>>>>>>>>>>>>> Konstantin Belousov wrote: >>>>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, >>>>>>>>>>>>>>> Julien Charbon wrote: >>>>>>>>>>>>>>>> [...] >>>>>>>>>>>>>> - Second, if issue is still in stable/10, >>>>>>>>>>>>>> compile 10.2 kernel with these options: >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> options DDB options DEADLKRES >>>>>>>>>>>>>> options INVARIANTS options >>>>>>>>>>>>>> INVARIANT_SUPPORT options WITNESS options >>>>>>>>>>>>>> WITNESS_SKIPSPIN >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> To see where the original fault is coming >>>>>>>>>>>>>> from. >>>>>>>>>>>>> [...] >>>>>>>>>>>>>=20 >>>>>>>>>>>>> I'll try stable/10 now. Would you suggest a >>>>>>>>>>>>> "clean" stable/10, or could 287621 and 287780 >>>>>>>>>>>>> help? >>>>>>>>>>>>>=20 >>>>>>>>>>>>> I'll add the debugging suggested options >>>>>>>>>>>>> right away. >>>>>>>>>>>>>=20 >>>>>>>>>>>>> Palle >>>>>>>>>>>>=20 >>>>>>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>>>>>>=20 >>>>>>>>>>>> options DDB options DEADLKRES >>>>>>>>>>>> options INVARIANTS options >>>>>>>>>>>> INVARIANT_SUPPORT options WITNESS options >>>>>>>>>>>> WITNESS_SKIPSPIN >>>>>>>>>>>=20 >>>>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 >>>>>>>>>>> [FreeBSD] Copyright 2004 Free Software >>>>>>>>>>> Foundation, Inc. GDB is free software, covered by >>>>>>>>>>> the GNU General Public License, and you are >>>>>>>>>>> welcome to change it and/or distribute copies of >>>>>>>>>>> it under certain conditions. Type "show copying" >>>>>>>>>>> to see the conditions. There is absolutely no >>>>>>>>>>> warranty for GDB. Type "show warranty" for >>>>>>>>>>> details. This GDB was configured as=20 >>>>>>>>>>> "amd64-marcel-freebsd"... >>>>>>>>>>>=20 >>>>>>>>>>> Unread portion of the kernel message buffer: >>>>>>>>>>> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && >>>>>>>>>>> tp !=3D NULL cpuid =3D 16 KDB: stack backtrace: >>>>>>>>>>> db_trace_self_wrapper() at=20 >>>>>>>>>>> db_trace_self_wrapper+0x2b/frame >>>>>>>>>>> 0xfffffe183d9e97e0 kdb_backtrace() at >>>>>>>>>>> kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>>>>>>>>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 >>>>>>>>>>> kassert_panic() at kassert_panic+0x139/frame >>>>>>>>>>> 0xfffffe183d9e9940 tcp_usr_detach() at >>>>>>>>>>> tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 >>>>>>>>>>> sofree() at sofree+0x1f1/frame 0xfffffe183d9e99a0 >>>>>>>>>>> soclose() at soclose+0x3a0/frame=20 >>>>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame=20 >>>>>>>>>>> 0xfffffe183d9e9a10 closef() at >>>>>>>>>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() >>>>>>>>>>> at closefp+0x9d/frame 0xfffffe183d9e9ae0 >>>>>>>>>>> amd64_syscall() at amd64_syscall+0x25a/frame >>>>>>>>>>> 0xfffffe183d9e9bf0 Xfast_syscall() at >>>>>>>>>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- >>>>>>>>>>> syscall (6, FreeBSD ELF64, sys_close), rip =3D >>>>>>>>>>> 0x801c8d94a, rsp =3D 0x7ffff91c8668, rbp =3D >>>>>>>>>>> 0x7ffff91c8680 --- KDB: enter: panic Uptime: >>>>>>>>>>> 18h57m59s Dumping 23085 out of 98263=20 >>>>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>=20 Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>>>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols >>>>>>>>>>> Reading symbols from >>>>>>>>>>> /boot/kernel/zfs.ko.symbols...done. Loaded=20 >>>>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading >>>>>>>>>>> symbols from=20 >>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. >>>>>>>>>>> Loaded symbols for >>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols Reading >>>>>>>>>>> symbols from=20 >>>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_bridge.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/netgraph.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_eiface.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_ether.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/accf_data.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/accf_http.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ums.ko.symbols Reading >>>>>>>>>>> symbols from=20 >>>>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_socket.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/fdescfs.ko.symbols #0 >>>>>>>>>>> doadump (textdump=3D1) at pcpu.h:219 219 >>>>>>>>>>> __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0 >>>>>>>>>>> doadump (textdump=3D1) at pcpu.h:219 #1 >>>>>>>>>>> 0xffffffff8094b337 in kern_reboot (howto=3D260) at=20 >>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 >>>>>>>>>>> 0xffffffff8094b845 in vpanic (fmt=3D<value >>>>>>>>>>> optimized out>, ap=3D<value optimized out>) at >>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:758 #3=20 >>>>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D<value >>>>>>>>>>> optimized out>) at >>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:646 #4=20 >>>>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D<value >>>>>>>>>>> optimized out>) at >>>>>>>>>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5=20 >>>>>>>>>>> 0xffffffff809cd291 in sofree >>>>>>>>>>> (so=3D0xfffff801dd302000) at=20 >>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 >>>>>>>>>>> 0xffffffff809cdb00 in soclose (so=3D<value >>>>>>>>>>> optimized out>) at=20 >>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 >>>>>>>>>>> 0xffffffff808fe659 in _fdrop >>>>>>>>>>> (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8 >>>>>>>>>>> 0xffffffff80901092 in closef >>>>>>>>>>> (fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0) >>>>>>>>>>> at /usr/src/sys/kern/kern_descrip.c:2338 #9 >>>>>>>>>>> 0xffffffff808feb5d in closefp >>>>>>>>>>> (fdp=3D0xfffff80b20cce000, fd=3D<value optimized >>>>>>>>>>> out>, fp=3D0xfffff802a593db40, >>>>>>>>>>> td=3D0xfffff80eebc894a0, holdleaders=3D<value >>>>>>>>>>> optimized out>) at=20 >>>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 >>>>>>>>>>> 0xffffffff80d7bc3a in amd64_syscall >>>>>>>>>>> (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in >>>>>>>>>>> Xfast_syscall () at >>>>>>>>>>> /usr/src/sys/amd64/amd64/exception.S:396 #12=20 >>>>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner >>>>>>>>>>> to this frame (corrupt stack?) Current language: >>>>>>>>>>> auto; currently minimal >>>>>>>>>>=20 >>>>>>>>>> Thanks for the information. As I suspected the >>>>>>>>>> initial error was elsewhere than tcp_twclose(), >>>>>>>>>> never got this assertion before: >>>>>>>>>>=20 >>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D >>>>>>>>>> NULL >>>>>>>>>>=20 >>>>>>>>>> from here: >>>>>>>>>>=20 >>>>>>>>>> static void tcp_detach(struct socket *so, struct >>>>>>>>>> inpcb *inp) { struct tcpcb *tp; >>>>>>>>>>=20 >>>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >>>>>>>>>> INP_WLOCK_ASSERT(inp); >>>>>>>>>>=20 >>>>>>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D >>>>>>>>>> inp")); KASSERT(inp->inp_socket =3D=3D so, >>>>>>>>>> ("tcp_detach: inp_socket !=3D so")); >>>>>>>>>>=20 >>>>>>>>>> tp =3D intotcpcb(inp); >>>>>>>>>>=20 >>>>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if >>>>>>>>>> (inp->inp_flags & INP_DROPPED) { KASSERT(tp =3D=3D >>>>>>>>>> NULL, ("tcp_detach: INP_TIMEWAIT && " "INP_DROPPED >>>>>>>>>> && tp !=3D NULL")); >>>>>>>>>>=20 >>>>>>>>>> Let me check if I could find a path that could lead >>>>>>>>>> to this unexpected case. Unexpected because: >>>>>>>>>> INP_DROPPED and inp->inp_ppcb is set to NULL are >>>>>>>>>> set at same time here: >>>>>>>>>>=20 >>>>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { >>>>>>>>>> struct socket *so; struct inpcb *inp; >>>>>>>>>>=20 >>>>>>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & >>>>>>>>>> INP_TIMEWAIT), ("tcp_twclose: !timewait")); >>>>>>>>>> KASSERT(intotw(inp) =3D=3D tw, ("tcp_twclose: inp_ppcb >>>>>>>>>> !=3D tw")); INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >>>>>>>>>> /* in_pcbfree() */ INP_WLOCK_ASSERT(inp); >>>>>>>>>>=20 >>>>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL; >>>>>>>>>> in_pcbdrop(inp); ... >>>>>>>>>>=20 >>>>>>>>>> Interesting [...] >>>>>>>=20 >>>>>>> Just a quick update. Julien is pursuing this off list >>>>>>> with a core dump and we are now waiting for a new core >>>>>>> dump with the first KASSERT removed. this is on a >>>>>>> stable/10 kernel. >>>>>>=20 >>>>>> By the way Palle could you also run below Dtrace script to >>>>>> see where this tcp_close() in INP_TIMEWAIT comes from: >>>>>>=20 >>>>>> $ cat tcp-close-tw.d fbt::tcp_close:entry=20 >>>>>> /args[0]->t_inpcb->inp_flags & 0x01000000/ { @s1[stack()] =3D >>>>>> count() } >>>>>>=20 >>>>>> tick-1sec { printa(@s1); } $ sudo dtrace -s tcp-close-tw.d >>>>>>=20 >>>>>> And share any backtraces reported in this dtrace script >>>>>> output. >>>>>>=20 >>>>>> George, could you check if this dtrace script makes sense >>>>>> for you, and if you have any improvements to add, I am >>>>>> quite a rookie in Dtrace scripts. >>>>>=20 >>>>> Shall I let the dtrace script run continuously until the >>>>> machine crashes? Or just run it once? >>>>=20 >>>> Continuously until the machine crashes. You can report any >>>> backstrace outputs like: >>>>=20 >>>> kernel`tcp_usr_close+0x86 kernel`soclose+0xe4=20 >>>> kernel`_fdrop+0x29 kernel`closef+0x237 kernel`closefp+0x95=20 >>>> kernel`amd64_syscall+0x357 kernel`0xffffffff80c83c4b 1 >>>>=20 >>>> before the machine crashes. But I expect the problematic >>>> case detection with Dtrace to be quickly followed by the crash. >>>> Will see. >>>>=20 >>>> -- Julien >>>>=20 >>>=20 >>> Kernels and userland are updated to 10.2-p3 with the patch >>> removing the suspicous KASSERT. >>>=20 >>> dtrace running continously redirecting to a log file. >>>=20 >>> now we're just waiting... :) >>>=20 >>> Palle >>=20 >> Is the dtrace correct? >>=20 >> $ sort -u dtrace.out >>=20 >> 0 59779 :tick-1sec CPU ID >> FUNCTION:NAME $ wc -l dtrace.out 56233 dtrace.out >>=20 >>=20 >> All it does is write >>=20 >> 0 59779 :tick-1sec >>=20 >> once a second. It is right, it tries to display the guilty backstrace every second, but this is a rare event. 0 59779 :tick-1sec is printed when you have no backstrace yet. > Just had a crash. Unfortunately, the kernel was stuck at the db> > prompt, and the remote keyboard was unresponsive (HP ILO, not > impressed). So I had to reset the power and never got a core dump... >=20 > panic: tcp_tw_2msl_stop: inp should not be released here > cpuid =3D 0 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame > 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame 0xfffffe175acd1790 > kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 > tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 > tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame > 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame > 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame > 0xfffffe175acd18f0 softclock_call_cc() at > softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at > softclock+0x47/frame 0xfffffe175acd19f0 intr_event_execute_handlers() > at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30 > ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 > fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 > --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- > KDB: enter: panic > [ thread pid 12 tid 100043 ] > Stopped at kdb_enter+0x3e: movq $0,kdb_why > db> Thanks a log for this backstrace. This is what at expected, when tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be called one extra time that leads to: tcp_tw_2msl_stop: inp should not be released here Let me try to come with a tentative fix for this case. > Is there a way to configure the kernel to get all the gory debug > stuff without it dropping to the debug prompt on panic? I'd rather > see it just dump core and restart automatically. You can set: options KDB_UNATTENDED or debug.debugger_on_panic=3D0 in /etc/sysctl.conf both are equivalent. More details here: https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-opti= ons.html Thanks. -- Julien --B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWA6BBAAoJEKVlQ5Je6dhx5i4IALr7OhuK7NvVESFUK+9t7ygo lluCrPhEA+z/t1t2C4HutWnVEucLW9Kh0lBjg2qodcvVHVl0wZHN+T1IySaIfhbf dFAxUcWAS3xGQgsPwSuT4UU1uFmqrYtmhOmt0GJVmywLzfKN4AL9toK7uQKknHzW HZM0K+vbzT1pz8RrD+DBSrQrWTrDk1adF94oi5Z+01MssGZ0ljcyeStsxba3Wg3N 9pYKfSSXKdld6qpjdj2di6EmevS12LVyUGt755ke9Bd1r09DXhFbZ2hXkDf+NQAN QQswXp+/DGVBXtc8L+AiAJhuu8oPUqWc38jdNe0HGRa7naEsdgLWC4OkSchzwcE= =msZs -----END PGP SIGNATURE----- --B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5603A03B.4060002>