Date:      Fri, 25 Sep 2015 16:19:26 +0200
From:      Palle Girgensohn <girgen@FreeBSD.org>
To:        Julien Charbon <jch@FreeBSD.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <9529CF41-E4B9-4AC5-9703-945EC35924BC@FreeBSD.org>
In-Reply-To: <6BA42863-E584-4552-8D73-7471616ADC6D@FreeBSD.org>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org> <5601CF2D.9030307@freebsd.org> <E09DF89D-AAC5-48FD-8B75-EEAB937A5C32@FreeBSD.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> <5603A03B.4060002@freebsd.org> <5603ACF7.7040403@freebsd.org> <97E97774-842B-440A-BBA4-808FF821EC98@FreeBSD.org> <6BA42863-E584-4552-8D73-7471616ADC6D@FreeBSD.org>


> On 25 Sep 2015, at 16:14, Palle Girgensohn <girgen@FreeBSD.org> wrote:
>
>>
>> On 24 Sep 2015, at 11:39, Palle Girgensohn <girgen@FreeBSD.org> wrote:
>>
>>
>>> On 24 Sep 2015, at 09:57, Julien Charbon <jch@FreeBSD.org> wrote:
>>>
>>>
>>> Hi -net,
>>>
>>> On 24/09/15 09:03, Julien Charbon wrote:
>>>> On 24/09/15 08:55, Palle Girgensohn wrote:
>>>>>> On 24 Sep 2015, at 07:51, Palle Girgensohn
>>>>>> <girgen@pingpong.net> wrote:
>>>>>>> On 24 Sep 2015, at 00:05, Palle Girgensohn
>>>>>>> <girgen@pingpong.net> wrote:
>>>>>>>> On 23 Sep 2015, at 20:32, Julien Charbon <jch@freebsd.org> wrote:
>>>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>>>> Kernels and userland are updated to 10.2-p3 with the patch
>>>>>>> removing the suspicious KASSERT.
>>>>>>> dtrace is running continuously, redirecting to a log file.
>>>>> Just had a crash. Unfortunately, the kernel was stuck at the db>
>>>>> prompt, and the remote keyboard was unresponsive (HP ILO, not
>>>>> impressed). So I had to reset the power and never got a core dump...
>>>>>
>>>>> panic: tcp_tw_2msl_stop: inp should not be released here
>>>>> cpuid = 0
>>>>> KDB: stack backtrace:
>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe175acd16a0
>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750
>>>>> vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890
>>>>> tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0
>>>>> pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0
>>>>> softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0
>>>>> softclock() at softclock+0x47/frame 0xfffffe175acd19f0
>>>>> intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame 0xfffffe175acd1a30
>>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>>>>> --- trap 0, rip = 0, rsp = 0xfffffe175acd1b70, rbp = 0 ---
>>>>> KDB: enter: panic
>>>>> [ thread pid 12 tid 100043 ]
>>>>> Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>>>>> db>
>>>>
>>>> Thanks a lot for this backtrace.  This is what I expected: when
>>>> tcp_close() is called in the INP_TIMEWAIT case, in_pcbfree() can be
>>>> called one extra time, which leads to:
>>>>
>>>> tcp_tw_2msl_stop: inp should not be released here
>>>>
>>>> Let me try to come up with a tentative fix for this case.
>>>
>>> See attached my tentative patch for this case.  It is only a first
>>> tentative patch, as I am still waiting on -net feedback on what should
>>> be the rule here.
>>>
>>> By the way:
>>>
>>> - I see nothing specific to VIMAGE here
>>>
>>> - Is anyone aware of tcp_close() (or tcp_drop()) calls modified/introduced
>>> recently in 10.2 that could explain why this issue appears only now?
>>>
>>> --
>>> Julien
>>> <tcp-close-fix-v1.patch>
>>
>>
>> Running a machine with the patch now (it just crashed and rebooted
>> with the new kernel).
>>
>> Hoping it will have a "soothing" effect... ;-)
>>
>>
>> dtrace running as previously. No output yet, though.
>>
>>
>
> Hello -net & Julien!
>
> First off, loud cheers and a big *thank you* to Julien for helping us
> get our systems to stop crashing. This really means a lot to us! Thank
> you!
>
> We have been running more than 24 hours with no crash, so I'm getting
> more and more confident that the change actually makes the system stable.
>
> Dtrace still shows nothing.
>=20
> Palle


Secondly, is this error related? This is *not* VIMAGE, *not* jail. It is
a binary-installed GENERIC from freebsd-update, 10.1-RELEASE-p19. It
just crashed today, and we did not get any core dump, but I found this
core.txt from a crash in August that I was not aware of (I was on
holiday then... :)

Since it was installed from binaries, I have no kernel.debug.

...

panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 clashing
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80963000 at kdb_backtrace+0x60
#1 0xffffffff80928125 at panic+0x155
#2 0xffffffff8099c180 at sbdroprecord_locked+0
#3 0xffffffff80ac8c9c at tcp_output+0xdbc
#4 0xffffffff80ac6a95 at tcp_do_segment+0x3045
#5 0xffffffff80ac2e04 at tcp_input+0xd04
#6 0xffffffff80a54fc7 at ip_input+0x97
#7 0xffffffff809f4f73 at swi_net+0x143
#8 0xffffffff808faf4b at intr_event_execute_handlers+0xab
#9 0xffffffff808fb396 at ithread_loop+0x96
#10 0xffffffff808f8b6a at fork_exit+0x9a
#11 0xffffffff80d0b67e at fork_trampoline+0xe
Uptime: 21d0h54m53s
Dumping 2005 out of 32709 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_data.ko.symbols
Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_http.ko.symbols
Reading symbols from /boot/kernel/oce.ko.symbols...done.
Loaded symbols for /boot/kernel/oce.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
Loaded symbols for /boot/kernel/linprocfs.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80927da2 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff80928164 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8099c180 in sbsndptr (sb=<value optimized out>,
    off=<value optimized out>, len=<value optimized out>,
    moff=<value optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1011
#4  0xffffffff80ac8c9c in tcp_output (tp=0xfffff80312ef5800)
    at /usr/src/sys/netinet/tcp_output.c:870
#5  0xffffffff80ac6a95 in tcp_do_segment (m=<value optimized out>,
    th=<value optimized out>, so=<value optimized out>,
    tp=<value optimized out>, drop_hdrlen=<value optimized out>, tlen=0,
    iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1
)
    at /usr/src/sys/netinet/tcp_input.c:3018
#6  0xffffffff80ac2e04 in tcp_input (m=<value optimized out>,
    off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1377
#7  0xffffffff80a54fc7 in ip_input (m=0xfffff800b4516600)
    at /usr/src/sys/netinet/ip_input.c:734
#8  0xffffffff809f4f73 in swi_net (arg=0xffffffff81988880)
    at /usr/src/sys/net/netisr.c:765
#9  0xffffffff808faf4b in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffff800093ac600)
    at /usr/src/sys/kern/kern_intr.c:1263
#10 0xffffffff808fb396 in ithread_loop (arg=0xfffff80009388e40)
    at /usr/src/sys/kern/kern_intr.c:1276
#11 0xffffffff808f8b6a in fork_exit (
    callout=0xffffffff808fb300 <ithread_loop>, arg=0xfffff80009388e40,
    frame=0xfffffe083c3e3ac0) at /usr/src/sys/kern/kern_fork.c:996
#12 0xffffffff80d0b67e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606
#13 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)