Date:      Mon, 21 Sep 2015 10:55:37 +0200
From:      Palle Girgensohn <girgen@pingpong.net>
To:        Julien Charbon <jch@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, Adrian Chadd <adrian@FreeBSD.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <F9751ED1-8521-455B-8613-A98367820FBB@pingpong.net>
In-Reply-To: <55FFBFBC.30905@freebsd.org>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <9A234106-62EC-49C9-954A-2DA8315E9B4A@pingpong.net> <55FFBFBC.30905@freebsd.org>

> On 21 Sep 2015, at 10:28, Julien Charbon <jch@freebsd.org> wrote:
>
> Hi Palle,
>
> On 18/09/15 22:42, Palle Girgensohn wrote:
>>> On 18 Sep 2015, at 18:06, Konstantin Belousov
>>> <kostikbel@gmail.com> wrote:
>>>
>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:
>>>> Hi Palle,
>>>>
>>>>> On 18/09/15 11:12, Palle Girgensohn wrote:
>>>>> We see daily panics on our production systems (web server, Apache
>>>>> running MPM event, openjdk8. Kernel with VIMAGE. Jails using
>>>>> netgraph interfaces [not epair]).
>>>>>
>>>>> The problem started after the summer. Normal port upgrades
>>>>> seem to be the only difference. The problem occurs with the
>>>>> 10.2-p2 kernel as well as 10.1-p4 and 10.1-p15.
>>>>>
>>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175
>>>>>
>>>>> Any ideas?
>>>>
>>>> Thanks for your detailed report.  I am not aware of any
>>>> tcp_twclose() related issues (without VIMAGE) since FreeBSD 10.0
>>>> (which does not mean there are none).  A few interesting facts (at
>>>> least for me):
>>>>
>>>> - Your crash happens when unlocking an inp exclusive lock with
>>>> INP_WUNLOCK()
>>>>
>>>> - Something is already wrong before calling turnstile_broadcast()
>>>> as it is called with ts = NULL:
>>> In a kernel without WITNESS, this is a 99%-sure indication of
>>> an attempt to unlock a lock that is not owned.
>>>
>>>>
>>>> turnstile_broadcast (ts=0x0, queue=1) at /usr/src/sys/kern/subr_turnstile.c:838
>>>> __rw_wunlock_hard () at /usr/src/sys/kern/kern_rwlock.c:988
>>>> tcp_twclose () at /usr/src/sys/netinet/tcp_timewait.c:540
>>>> tcp_tw_2msl_scan () at /usr/src/sys/netinet/tcp_timewait.c:748
>>>> tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198
>>>>
>>>> I won't go too far here as I am not expert enough in VIMAGE, but
>>>> one question anyway:
>>>>
>>>> - Can you correlate this kernel panic to a particular event?
>>>> For example, a VIMAGE/VNET jail destruction.
>>>>
>>>> I will test that on my side on a 10.2 machine.
>>
>> I just got a response from adrian@ where he seems to remember that it
>> has all been fixed in head.
>>
>> I would really prefer not to run a head kernel in production unless I
>> have to, so the question is whether it is possible to pin down the
>> specific fixes for this problem. Any suggestions?
>>
>> Thanks for all the help so far!
>
> On my side, all issues we have found in the TCP stack are currently
> fixed in both 10.2 and HEAD.  The remaining differences are only
> performance improvements that are solely in HEAD.  adrian@ might have
> more details on the fixes he has in mind.

Hi, 10.2 gives us the same sort of crash as 10.1.

We are now testing releng/10.1 with these two patches merged:

https://svnweb.freebsd.org/changeset/base/287261

https://svnweb.freebsd.org/changeset/base/287780


We have yet to see a crash, so it is looking vaguely promising, but we
have to wait and see.

Palle

PS. I failed to mention that, besides VIMAGE + jails, the jail host is
also an NFS client. The NFS shares are mounted from the jail host, not
from the jails (since that is not possible anyway).






