Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 22 Sep 2015 23:59:09 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Palle Girgensohn <girgen@FreeBSD.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-net@freebsd.org, Hans Petter Selasky <hps@selasky.org>
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <5601CF2D.9030307@freebsd.org>
In-Reply-To: <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--busGMoBpmdTluNL5AR0vCxaduPQvGv3At
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable


 Hi Palle,

On 22/09/15 22:58, Palle Girgensohn wrote:
>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <jch@freebsd.org>:
>> On 22/09/15 18:49, Palle Girgensohn wrote:
>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20
>>>> <girgen@FreeBSD.org>:
>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20
>>>>> <girgen@FreeBSD.org>:
>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20
>>>>>> <jch@FreeBSD.org>: On 18/09/15 18:06, Konstantin Belousov=20
>>>>>> wrote:
>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon=20
>>>>>>> wrote:
>>>>>>>> [...]
>>>>>> - Second, if issue is still in stable/10, compile 10.2
>>>>>> kernel with these options:
>>>>>>=20
>>>>>> options        DDB options        DEADLKRES options=20
>>>>>> INVARIANTS options        INVARIANT_SUPPORT options WITNESS
>>>>>> options        WITNESS_SKIPSPIN
>>>>>>=20
>>>>>> To see where the original fault is coming from.
>>>>> [...]
>>>>>=20
>>>>> I'll try stable/10 now. Would you suggest a "clean"
>>>>> stable/10, or could 287621 and 287780 help?
>>>>>=20
>>>>> I'll add the debugging suggested options right away.
>>>>>=20
>>>>> Palle
>>>>=20
>>>> I have a new core dump from ^/stable/10 with:
>>>>=20
>>>> options        DDB options        DEADLKRES options INVARIANTS
>>>> options        INVARIANT_SUPPORT options WITNESS options
>>>> WITNESS_SKIPSPIN
>>>=20
>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc. GDB is free
>>> software, covered by the GNU General Public License, and you are
>>> welcome to change it and/or distribute copies of it under certain
>>> conditions. Type "show copying" to see the conditions. There is
>>> absolutely no warranty for GDB.  Type "show warranty" for
>>> details. This GDB was configured as "amd64-marcel-freebsd"...
>>>=20
>>> Unread portion of the kernel message buffer: panic: tcp_detach:=20
>>> INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack=20
>>> backtrace: db_trace_self_wrapper() at=20
>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0
>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890
>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic()
>>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach()
>>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at=20
>>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at=20
>>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at=20
>>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at
>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at
>>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at
>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at
>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6,
>>> FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D
>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic
>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20
>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>=20
>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols
>>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/zfs.ko.symbols Reading symbols from=20
>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/opensolaris.ko.symbols Reading symbols from=20
>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20
>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20
>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20
>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20
>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20
>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20
>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/ums.ko.symbols Reading symbols from=20
>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20
>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20
>>> /boot/kernel/fdescfs.ko.symbols #0  doadump (textdump=3D1) at=20
>>> pcpu.h:219 219		__asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0=20
>>> doadump (textdump=3D1) at pcpu.h:219 #1  0xffffffff8094b337 in=20
>>> kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451
>>> #2 0xffffffff8094b845 in vpanic (fmt=3D<value optimized out>,
>>> ap=3D<value optimized out>) at
>>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in
>>> kassert_panic (fmt=3D<value optimized out>) at=20
>>> /usr/src/sys/kern/kern_shutdown.c:646 #4  0xffffffff80b1ee59 in=20
>>> tcp_usr_detach (so=3D<value optimized out>) at=20
>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5  0xffffffff809cd291 in=20
>>> sofree (so=3D0xfffff801dd302000) at=20
>>> /usr/src/sys/kern/uipc_socket.c:747 #6  0xffffffff809cdb00 in=20
>>> soclose (so=3D<value optimized out>) at=20
>>> /usr/src/sys/kern/uipc_socket.c:849 #7  0xffffffff808fe659 in=20
>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20
>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20
>>> td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338
>>> #9 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000,
>>> fd=3D<value optimized out>, fp=3D0xfffff802a593db40,
>>> td=3D0xfffff80eebc894a0, holdleaders=3D<value optimized out>) at=20
>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in=20
>>> amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20
>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at=20
>>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a
>>> in ?? () Previous frame inner to this frame (corrupt stack?)
>>> Current language:  auto; currently minimal
>>=20
>> Thanks for the information.  As I suspected the initial error was=20
>> elsewhere than tcp_twclose(), never got this assertion before:
>>=20
>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL
>>=20
>> from here:
>>=20
>> static void tcp_detach(struct socket *so, struct inpcb *inp) {=20
>> struct tcpcb *tp;
>>=20
>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp);
>>=20
>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20
>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D so"))=
;
>>=20
>> tp =3D intotcpcb(inp);
>>=20
>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &
>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT && "=
=20
>> "INP_DROPPED && tp !=3D NULL"));
>>=20
>> Let me check if I could find a path that could lead to this=20
>> unexpected case.  Unexpected because:  INP_DROPPED and
>> inp->inp_ppcb is set to NULL are set at same time here:
>>=20
>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so;=20
>> struct inpcb *inp;
>>=20
>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),
>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,
>> ("tcp_twclose: inp_ppcb !=3D tw"));=20
>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);      /* in_pcbfree() */=20
>> INP_WLOCK_ASSERT(inp);
>>=20
>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20
>> in_pcbdrop(inp); ...
>>=20
>> Interesting and by the way could you try:
>>=20
>> # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads
>>=20
>> To see if other thread are also in TCP stack at the same time, and
>> if one of this thread is referencing the same inp.
>>=20
>=20
> BTW, this backtrace looks quite different from the previous ones? Is
> it a different problem, or just a different way to reveal the same
> problem?

 Having a different backstrace is expected, the first backstrace was
quite deep in the stack, and kernel panic-ed quite late after the
original issue.  With the kernel debug options, the kernel stopped at
the first suspicious fact.

 And I would say that I still does not understand how it is possible to
reach that state, it is like kernel exclusive lock stopped to work and
allow several thread to work on the same locked inp at same time.
(Stack overflow, VIMAGE memory corruption, UMA issue? ... weird).

 I will ask you the kernel and the core dump but off list to avoid too
much spam in -net.

--
Julien


--busGMoBpmdTluNL5AR0vCxaduPQvGv3At
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJWAc82AAoJEKVlQ5Je6dhxYfgIANdO8mB3Uz3Bi3L/qQCiAbzb
g/hYJDr7bYeIT3kpAYRCAsBP+qt0M58ArE0npx0Emj419rCqfhcg6Ld2FEu19B1+
0L3N/511c+kR9aJnZ9rJnm4qkHQIvboZpu4bV+8QZ9QNhzYJ8vjiwjnb4Z92xbbq
Qo5jvLxnyhmvHdgBlUUZup/eq2nHL8zD0GI1WX5pT9Q7ekgGu7jG3E3qPWEVzX6H
j0cLrxnAIZEnbEnrgNZtsjDVvM1S5IzwVGpkwkKtvUbkEr42KRI8xHpU1cDL0if1
NCnxm4kJ+YKUoKTBYqa+7mPJb+GoS5x2oSqh5C5/rfoNYRDKoGj9RAnw9YauVz4=
=2++u
-----END PGP SIGNATURE-----

--busGMoBpmdTluNL5AR0vCxaduPQvGv3At--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5601CF2D.9030307>