From owner-freebsd-net@freebsd.org Mon Sep 21 08:55:46 2015
Subject: Re: Kernel panics in tcp_twclose
From: Palle Girgensohn
In-Reply-To: <55FFBFBC.30905@freebsd.org>
Date: Mon, 21 Sep 2015 10:55:37 +0200
Cc: Konstantin Belousov, Adrian Chadd, "freebsd-net@freebsd.org"
References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org>
 <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua>
 <9A234106-62EC-49C9-954A-2DA8315E9B4A@pingpong.net>
 <55FFBFBC.30905@freebsd.org>
To: Julien Charbon

> On 21 Sep 2015, at 10:28, Julien Charbon wrote:
>
>
> Hi Palle,
>
> On 18/09/15 22:42, Palle Girgensohn wrote:
>>> On 18 Sep 2015, at
>>> 18:06, Konstantin Belousov wrote:
>>>
>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:
>>>> Hi Palle,
>>>>
>>>>> On 18/09/15 11:12, Palle Girgensohn wrote:
>>>>> We see daily panics on our production systems (web server,
>>>>> apache running MPM event, openjdk8. Kernel with VIMAGE. Jails
>>>>> using netgraph interfaces [not epair]).
>>>>>
>>>>> The problem started after the summer. Normal port upgrades
>>>>> seem to be the only difference. The problem occurs with
>>>>> 10.2-p2 kernel as well as 10.1-p4 and 10.1-p15.
>>>>>
>>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175
>>>>>
>>>>> Any ideas?
>>>>
>>>> Thanks for your detailed report. I am not aware of any
>>>> tcp_twclose() related issues (without VIMAGE) since FreeBSD 10.0
>>>> (does not mean there are none). A few interesting facts (at least
>>>> for me):
>>>>
>>>> - Your crash happens when unlocking an inp exclusive lock with
>>>> INP_WUNLOCK()
>>>>
>>>> - Something is already wrong before calling turnstile_broadcast()
>>>> as it is called with ts = NULL:
>>> In a kernel without witness this is a 99%-sure indication of
>>> an attempt to unlock a lock that is not owned.
>>>
>>>>
>>>> turnstile_broadcast (ts=0x0, queue=1) at /usr/src/sys/kern/subr_turnstile.c:838
>>>> __rw_wunlock_hard () at /usr/src/sys/kern/kern_rwlock.c:988
>>>> tcp_twclose () at /usr/src/sys/netinet/tcp_timewait.c:540
>>>> tcp_tw_2msl_scan () at /usr/src/sys/netinet/tcp_timewait.c:748
>>>> tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198
>>>>
>>>> I won't go too far here as I am not expert enough in VIMAGE, but
>>>> one question anyway:
>>>>
>>>> - Can you correlate this kernel panic to a particular event?
>>>> Like for example a VIMAGE/VNET jail destruction.
>>>>
>>>> I will test that on my side on a 10.2 machine.
>>
>> I just got a response from adrian@ where he seems to remember that it
>> has all been fixed in head.
>>
>> I would really prefer not to run a head kernel in production unless I
>> have to, so the question is if it is possible to pin down the
>> specific fixes for this problem? Any suggestions?
>>
>> Thanks for all the help so far!
>
> On my side, all issues we have found in the TCP stack are currently
> fixed in both 10.2 and HEAD. The remaining differences are only performance
> improvements that are solely in HEAD. adrian@ might have more details
> on fixes he has in mind.

Hi,

10.2 gives us the same sort of crash as 10.1. We are now testing releng/10.1 with these two patches merged:

https://svnweb.freebsd.org/changeset/base/287261
https://svnweb.freebsd.org/changeset/base/287780

We have yet to see a crash, so it is looking vaguely promising, but we have to wait and see.

Palle

PS. I've failed to mention that besides VIMAGE + jails, the jail host is an NFS client as well. The NFS shares are mounted from the jail host, not the jails (since that is not possible anyway). DS.