From: Palle Girgensohn
To: Julien Charbon
Cc: freebsd-net@freebsd.org
Subject: Re: Kernel panics in tcp_twclose
Date: Fri, 25 Sep 2015 16:19:26 +0200
Message-Id: <9529CF41-E4B9-4AC5-9703-945EC35924BC@FreeBSD.org>
In-Reply-To: <6BA42863-E584-4552-8D73-7471616ADC6D@FreeBSD.org>

> On 25 Sep 2015, at 16:14, Palle Girgensohn wrote:
>
>> On 24 Sep 2015, at 11:39, Palle Girgensohn wrote:
>>
>>> On 24 Sep 2015, at 09:57, Julien Charbon wrote:
>>>
>>> Hi -net,
>>>
>>> On 24/09/15 09:03, Julien Charbon wrote:
>>>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>>>> On 24 Sep 2015, at 07:51, Palle Girgensohn wrote:
>>>>>>> On 24 Sep 2015, at 00:05, Palle Girgensohn wrote:
>>>>>>>> On 23 Sep 2015, at 20:32, Julien Charbon wrote:
>>>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>>>> Kernels and userland are updated to 10.2-p3 with the patch
>>>>>>> removing the suspicious KASSERT.
>>>>>>> dtrace is running continuously, redirecting to a log file.
>>>>> Just had a crash. Unfortunately, the kernel was stuck at the db>
>>>>> prompt, and the remote keyboard was unresponsive (HP ILO, not
>>>>> impressed). So I had to reset the power and never got a core dump...
>>>>>
>>>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>>>> cpuid = 0
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>>>> KDB: enter: panic
>>>>> [ thread pid 12 tid 100043 ]
>>>>> Stopped at kdb_enter+0x3e: movq $0,kdb_why
>>>>> db>
>>>>
>>>> Thanks a lot for this backtrace. This is what I expected: when
>>>> tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree() can be
>>>> called one extra time, which leads to:
>>>>
>>>> tcp_tw_2msl_stop: inp should not be released here
>>>>
>>>> Let me try to come up with a tentative fix for this case.
>>>
>>> Please find attached my tentative patch for this case. It is only a first
>>> tentative patch, as I am still waiting for -net feedback on what the
>>> rule should be here.
>>>
>>> By the way:
>>>
>>> - I see nothing specific to VIMAGE here.
>>>
>>> - Is anyone aware of tcp_close() (or tcp_drop()) calls modified or
>>> introduced recently in 10.2 that could explain why this issue appears
>>> only now?
>>>
>>> --
>>> Julien
>>>
>>
>> Running a machine with the patch now (it just crashed and rebooted with the new kernel).
>>
>> Hoping it will have a "soothing" effect... ;-)
>>
>> dtrace running as previously. No output yet, though.
>>
>
> Hello -net & Julien!
>
> First off, loud cheers and a big *thank you* to Julien for helping us get our systems to stop crashing. This really means a lot to us! Thank you!
>
> We have been running for more than 24 hours with no crash, so I'm getting more and more confident that the change actually makes the system stable.
>
> Dtrace still shows nothing.
>
> Palle

Secondly, is this error related?

This is *not* VIMAGE, *not* jail. It is a binary-installed GENERIC kernel from freebsd-update, 10.1-RELEASE-p19. It just crashed today, and we did not get any core dump, but I found this core.txt from a crash in August that I was not aware of (I was on holiday then... :)

Since it is a binary install, I have no kernel.debug.

...
panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80963000 at kdb_backtrace+0x60
#1 0xffffffff80928125 at panic+0x155
#2 0xffffffff8099c180 at sbdroprecord_locked+0
#3 0xffffffff80ac8c9c at tcp_output+0xdbc
#4 0xffffffff80ac6a95 at tcp_do_segment+0x3045
#5 0xffffffff80ac2e04 at tcp_input+0xd04
#6 0xffffffff80a54fc7 at ip_input+0x97
#7 0xffffffff809f4f73 at swi_net+0x143
#8 0xffffffff808faf4b at intr_event_execute_handlers+0xab
#9 0xffffffff808fb396 at ithread_loop+0x96
#10 0xffffffff808f8b6a at fork_exit+0x9a
#11 0xffffffff80d0b67e at fork_trampoline+0xe
Uptime: 21d0h54m53s
Dumping 2005 out of 32709 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_data.ko.symbols
Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_http.ko.symbols
Reading symbols from /boot/kernel/oce.ko.symbols...done.
Loaded symbols for /boot/kernel/oce.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
Loaded symbols for /boot/kernel/linprocfs.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
#0  doadump (textdump=) at pcpu.h:219
219	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump (textdump=) at pcpu.h:219
#1  0xffffffff80927da2 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff80928164 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8099c180 in sbsndptr (sb=, off=, len=, moff=)
    at /usr/src/sys/kern/uipc_sockbuf.c:1011
#4  0xffffffff80ac8c9c in tcp_output (tp=0xfffff80312ef5800)
    at /usr/src/sys/netinet/tcp_output.c:870
#5  0xffffffff80ac6a95 in tcp_do_segment (m=, th=, so=, tp=,
    drop_hdrlen=, tlen=0, iptos=,
    ti_locked=Cannot access memory at address 0x1
) at /usr/src/sys/netinet/tcp_input.c:3018
#6  0xffffffff80ac2e04 in tcp_input (m=, off0=)
    at /usr/src/sys/netinet/tcp_input.c:1377
#7  0xffffffff80a54fc7 in ip_input (m=0xfffff800b4516600)
    at /usr/src/sys/netinet/ip_input.c:734
#8  0xffffffff809f4f73 in swi_net (arg=0xffffffff81988880)
    at /usr/src/sys/net/netisr.c:765
#9  0xffffffff808faf4b in intr_event_execute_handlers (p=,
    ie=0xfffff800093ac600) at /usr/src/sys/kern/kern_intr.c:1263
#10 0xffffffff808fb396 in ithread_loop (arg=0xfffff80009388e40)
    at /usr/src/sys/kern/kern_intr.c:1276
#11 0xffffffff808f8b6a in fork_exit (callout=0xffffffff808fb300 ,
    arg=0xfffff80009388e40, frame=0xfffffe083c3e3ac0)
    at /usr/src/sys/kern/kern_fork.c:996
#12 0xffffffff80d0b67e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606
#13 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)