From: Marko Zec <zec@fer.hr>
To: "Bjoern A. Zeeb"
CC: John Baldwin
Subject: Re: svn commit: r297742 - head/sys/netinet
Date: Mon, 11 Apr 2016 13:10:05 +0200
Message-ID: <20160411131005.0f9feac3@x23>
References: <201604091205.u39C5Oks041429@repo.freebsd.org> <5630207.6cr5Ycyqbh@ralph.baldwin.cx>

On Sat, 9 Apr 2016 18:31:24 +0000 "Bjoern A. Zeeb" wrote:

> On Sat, 9 Apr 2016, John Baldwin wrote:
>
> > On Saturday, April 09, 2016 12:05:24 PM Bjoern A. Zeeb wrote:
> >> Author: bz
> >> Date: Sat Apr  9 12:05:23 2016
> >> New Revision: 297742
> >> URL: https://svnweb.freebsd.org/changeset/base/297742
> >>
> >> Log:
> >>   MFP: r296310,r296343
> >>
> >>   It looks like, with the safety belt of DELAY() fastened (*), we
> >>   can completely tear down and free all memory for TCP (after
> >>   r281599).
> >>
> >>   (*) In theory a few ticks should be good enough to make sure the
> >>   timers are all really gone.  Could we use a better metric here
> >>   and check a tcpcb count as an optimization?
> >
> > In theory, no amount of DELAY() is ever enough to close a
> > theoretical race window.  In practice you might get lucky, but you
> > might also panic and
>
> Yes, I do understand; that is why I said a better metric should do
> the right thing.  I am aware of the consequences of removing type
> stability.  Should there be reports, I'll make sure that DELAY()
> becomes something more proper sooner rather than later.
>
> > trash user data.  In the rest of the tree we tend to prefer marking
> > items as NOFREE instead of this approach, putting a priority on
> > stability and reliability over memory efficiency.
> >
> > For all of the zones that you removed NOFREE from, do you know why
> > it was added in the first place (e.g., which stale pointers to pcbs
> > could be referenced after free)?  Did you verify that those
> > conditions have been fixed?
>
> I did check.  I did check a few years ago (and I think you had
> reviewed that; maybe it was trouble).  The TCP bits here were the
> last ones that were problematic back then.  With the changes from
> r281599 this should no longer be a problem.
>
> As for the others, a few years ago Andre already removed the NOFREE,
> and we unconditionally made him back the change out, which was a
> mistake, as otherwise some of these zones would have been "clean"
> for years.  Others have had KASSERTs ensuring that on a VNET stack
> they were actually empty.
Just for the record: another proposal, which could have solved this cleanly years ago but was swiftly rejected short of a proper discussion, was to de-virtualize the memory zones used throughout the network stack.  The original thrust behind my decision to virtualize the zones was to catch any inter-vnet leaks during the initial VNETization efforts.  Today, roughly 10 years after the vnet / vimage changes hit the tree, during which period no inter-vnet zone leaks have been identified, I see no reason to keep (most of) the zones separated into multiple instances.

I've heard arguments that maintaining separate zones may be good for parallelization due to the separation of pcbinfo locks, but such claims have never been supported by benchmark runs or real-life anecdotal evidence.  Another argument brought up to reject my proposal, that separate zones provide (or could provide) inherent per-vnet resource limiting, was and still is extremely thin, since the number of items in most networking zones is limited by maxsockets, which is a shared / global OS limit.  OTOH, it's obvious that maintaining separate zones is not only wasteful of memory, but can also contribute to increased D-cache thrashing when traffic is spread among multiple vnets.

And finally, as John pointed out, the DELAY(tick) committed here and deemed a "safety belt" is highly dubious, since sleeping for a single tick provides no real guarantees against potential races, but rather works as an umbrella that masks them in some (un)lucky cases.

Nevertheless, I'm glad Bjoern took the time to push the vnet framework further, to the point where it could finally be enabled by default, which I hope was the main objective behind this sweep.  I find it absurd that Linux has had a vnet-like capability available by default for years, while we not only pioneered the concept but had it merged into the main tree years before the Linux folks even started their efforts.

Marko