From: Palle Girgensohn
To: Konstantin Belousov
Cc: Julien Charbon, Palle Girgensohn, freebsd-net@freebsd.org
Date: Fri, 18 Sep 2015 22:42:30 +0200
Subject: Re: Kernel panics in tcp_twclose

> On 18 Sep 2015, at 18:06, Konstantin Belousov wrote:
>
>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:
>> Hi Palle,
>>
>>> On 18/09/15 11:12, Palle Girgensohn wrote:
>>> We see daily panics on our production systems (web server, Apache
>>> running MPM event, openjdk8. Kernel with VIMAGE.
>>> Jails using netgraph interfaces [not epair]).
>>>
>>> The problem started after the summer. Normal port upgrades seem to
>>> be the only difference. The problem occurs with the 10.2-p2 kernel
>>> as well as with 10.1-p4 and 10.1-p15.
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175
>>>
>>> Any ideas?
>>
>> Thanks for your detailed report. I am not aware of any tcp_twclose()
>> related issues (without VIMAGE) since FreeBSD 10.0 (which does not mean
>> there are none). A few interesting facts (at least for me):
>>
>> - Your crash happens when unlocking an inp exclusive lock with
>>   INP_WUNLOCK().
>>
>> - Something is already wrong before turnstile_broadcast() is called,
>>   as it is called with ts = NULL:
> In a kernel without WITNESS, this is a 99%-sure indication of an attempt
> to unlock a lock that is not owned.
>
>>
>> turnstile_broadcast (ts=0x0, queue=1) at
>>     /usr/src/sys/kern/subr_turnstile.c:838
>> __rw_wunlock_hard () at /usr/src/sys/kern/kern_rwlock.c:988
>> tcp_twclose () at /usr/src/sys/netinet/tcp_timewait.c:540
>> tcp_tw_2msl_scan () at /usr/src/sys/netinet/tcp_timewait.c:748
>> tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198
>>
>> I won't go too far here, as I am not expert enough in VIMAGE, but one
>> question anyway:
>>
>> - Can you correlate this kernel panic with a particular event, for
>>   example a VIMAGE/VNET jail destruction?
>>
>> I will test that on my side on a 10.2 machine.
>>
>> --
>> Julien

Hi,

I just got a response from adrian@, who seems to remember that this has all been fixed in head.

I would really prefer not to run a head kernel in production unless I have to, so the question is whether it is possible to pin down the specific fixes for this problem. Any suggestions?

Thanks for all the help so far!

Palle