Date:      Fri, 18 Sep 2015 22:42:30 +0200
From:      Palle Girgensohn <girgen@pingpong.net>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Julien Charbon <jch@freebsd.org>, Palle Girgensohn <girgen@FreeBSD.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <9A234106-62EC-49C9-954A-2DA8315E9B4A@pingpong.net>
In-Reply-To: <20150918160605.GN67105@kib.kiev.ua>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua>





> On 18 Sep 2015, at 18:06, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:
>> Hi Palle,
>>
>>> On 18/09/15 11:12, Palle Girgensohn wrote:
>>> We see daily panics on our production systems (web server, apache
>>> running MPM event, openjdk8. Kernel with VIMAGE. Jails using netgraph
>>> interfaces [not epair]).
>>>
>>> The problem started after the summer. Normal port upgrades seem to
>>> be the only difference. The problem occurs with 10.2-p2 kernel as
>>> well as 10.1-p4 and 10.1-p15.
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175
>>>
>>> Any ideas?
>>
>> Thanks for your detailed report.  I am not aware of any tcp_twclose()
>> related issues (without VIMAGE) since FreeBSD 10.0 (does not mean there
>> are none).  A few interesting facts (at least for me):
>>
>> - Your crash happens when unlocking an inp exclusive lock with
>> INP_WUNLOCK()
>>
>> - Something is already wrong before calling turnstile_broadcast() as it
>> is called with ts = NULL:
> In a kernel without WITNESS, this is a 99%-sure indication of an attempt
> to unlock a lock that is not owned.
>
>>
>> turnstile_broadcast (ts=0x0, queue=1) at
>> /usr/src/sys/kern/subr_turnstile.c:838
>> __rw_wunlock_hard () at /usr/src/sys/kern/kern_rwlock.c:988
>> tcp_twclose () at /usr/src/sys/netinet/tcp_timewait.c:540
>> tcp_tw_2msl_scan () at /usr/src/sys/netinet/tcp_timewait.c:748
>> tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198
>>
>> I won't go too far here as I am not expert enough in VIMAGE, but one
>> question anyway:
>>
>> - Can you correlate this kernel panic to a particular event?  Like for
>> example a VIMAGE/VNET jail destruction.
>>
>> I will test that on my side on a 10.2 machine.
>>
>> --
>> Julien
>>
>
>


Hi,

I just got a response from adrian@ where he seems to remember that it
has all been fixed in head.

I would really prefer not to run a head kernel in production unless I
have to, so the question is whether it is possible to pin down the
specific fixes for this problem. Any suggestions?

Thanks for all the help so far!

Palle


