From: "Robert N. M. Watson"
Date: Fri, 21 Nov 2014 15:32:53 +0000
To: Marko Zec
Cc: Craig Rodrigues, FreeBSD Net, "Bjoern A. Zeeb"
Subject: Re: VIMAGE UDP memory leak fix
Message-Id: <072B7B0F-4DE3-4D37-BC94-1DEA38CF3B12@FreeBSD.org>
In-Reply-To: <20141121162042.449b22dc@x23>

On 21 Nov 2014, at 15:20, Marko Zec wrote:

>> Bjoern and I chatted for the last twenty or so minutes about the
>> code, and believe that as things stand, it is *not* safe to turn off
>> UMA_ZONE_NOFREE for TCP due to a teardown race in TCP that has been
>> known about and discussed for several years, but is some work to
>> resolve and that we've not yet found time to do so. The XXXRW's in
>> tcp_timer.c are related to this. We're pondering ways to fix it but
>> think this is not something that can be rushed.
>
> OK fair enough - thanks a lot for looking into this!
>
> Skimming through a bunch of hosts with moderately loaded hosts with
> reasonably high uptime I couldn't find one where net.inet.tcp.timer_race
> was not zero. Any suggestions how to best reproduce the race(s) in
> tcp_timer.c?

They would likely occur only on very highly loaded hosts, as they
require race conditions to arise between TCP timers and TCP close. I
think I did manage to reproduce it at one stage, and left the counter in
to see if we could spot it in production, and I have had (multiple)
reports of it in deployed systems. I'm not sure it's worth trying to
reproduce them, given that knowledge -- we should simply fix them.

Robert