From owner-freebsd-current@FreeBSD.ORG Mon Dec  6 03:45:31 2004
From: Stephan Uphoff <ups@tree.com>
To: Robert Watson
Cc: Barney Wolff, "current@freebsd.org"
Subject: Re: mbuf count negative
Date: Sun, 05 Dec 2004 22:45:23 -0500
Message-Id: <1102304723.56978.113.camel@palm.tree.com>
List-Id: Discussions about the use of FreeBSD-current

On Sun, 2004-12-05 at 14:08, Robert Watson wrote:
> On Sun, 5 Dec 2004, Sean McNeil wrote:
>
> > > It replaces non-atomic maintenance of the counters with atomic
> > > maintenance.  However, this adds measurably to the cost of
> > > allocation, so I've been reluctant to commit it.
> > > The counters maintained by UMA are likely sufficient to generate
> > > the desired mbuf output now that we have mbuma, but I haven't had
> > > an opportunity to walk through the details of it.  I hope to do so
> > > once I get closer to merging patches to use critical sections to
> > > protect UMA per-cpu caches, since I need to redo parts of the
> > > sysctl code then anyway.  You might want to give this patch, or
> > > one much like it, a spin to confirm that the race is the one I
> > > think it is.  The race in updating mbuf allocator statistics is
> > > one I hope to get fixed prior to 5.4.
> >
> > Since they appear to not be required for actual system use (by the
> > fact that a negative count doesn't cause problems), could the
> > counts be computed for display instead?
>
> This is pretty much what UMA does with its per-CPU caches.  It pulls
> and pushes statistics from the caches in a couple of situations:
>
> - When pulling a new bucket into or out of the cache, it has to
>   acquire the zone mutex, so it also pushes statistics.
> - When a timer fires every few seconds, all the caches are checked
>   to update the global zone statistics.
> - When the sysctl runs, it replicates the logic in the timer code to
>   also update the zone statistics for display.
>
> And you can already extract pretty much all of the interesting
> allocation information for mbufs from vmstat -z, as the mbufs are
> now stored using UMA.  In the critical-section-protected version of
> the code, I haven't yet decided whether the timers should run
> per-CPU, and/or how the sysctl should coalesce the information for
> display.  I hope to have much of this resolved shortly.
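[ For illustration only: a minimal userland sketch of the "sweep the
per-CPU slots at presentation time" idea being discussed.  All names
(MAXCPU, mbuf_allocs, mbuf_count_snapshot) are made up, not from the
actual UMA or sysctl code. ]

```c
#include <assert.h>
#include <stdint.h>

#define MAXCPU 4

/*
 * Hypothetical per-CPU allocation counters.  In the kernel, each CPU
 * would touch only its own slot on the fast path, so no atomic
 * operations (and no shared-lock contention) are needed there.
 */
static uint64_t mbuf_allocs[MAXCPU];
static uint64_t mbuf_frees[MAXCPU];

/*
 * Reader side: a lockless sweep across the per-CPU slots, as a sysctl
 * handler might do for display.  The result can be slightly stale or
 * momentarily inconsistent across slots -- the "localized and
 * temporary inconsistency" that is considered acceptable here.
 */
static int64_t
mbuf_count_snapshot(void)
{
	int64_t total = 0;

	for (int cpu = 0; cpu < MAXCPU; cpu++)
		total += (int64_t)(mbuf_allocs[cpu] - mbuf_frees[cpu]);
	return (total);
}
```

Note that a per-slot difference can legitimately be negative (an mbuf
allocated on one CPU and freed on another), which is also why the
displayed global count can transiently look wrong without indicating
a real leak.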
> My current leaning is that a small amount of localized and temporary
> inconsistency in the stats isn't a problem, so simply doing a set of
> lockless reads across the per-cpu caches to update stats for
> presentation should be fine, and we can probably drop the timer
> updates of statistics, since the cache bucket balancing keeps things
> pretty well in sync.
>
> I haven't committed the move to critical sections yet, as it's
> currently a performance pessimization for the UP case: entering a
> critical section on UP is more expensive than acquiring a mutex.
> John Baldwin has patches that remedy this, but he hasn't yet merged
> them (there's also an instability with them that I've seen).  I know
> that Stephan Uphoff has also been investigating this issue.

I am wondering if critical sections are overkill for this application,
since the interrupt-blocking capability is not needed.

Perhaps mutex_lock_local and mutex_unlock_local functions that operate
on per-CPU mutexes, and require the current thread to be pinned to a
CPU, would be more appropriate.  They would have the same functionality
as mutex_lock/mutex_unlock, but without using atomic operations.  The
"local" mutex would be a clone of the UP sleep mutex, and the SMP sleep
mutex and the local mutex could even be used together in any order
(with working priority inheritance).

I think I will try this in my p4 mutex branch in the next few days.

A "do not preempt" section (interrupts enabled) would be another
possibility.

Stephan
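P.S.  A very rough userland sketch of what the local mutex might look
like.  The names (mtx_local, mtx_lock_local, mtx_unlock_local) and the
struct layout are hypothetical, and a real implementation would block
the caller (with priority inheritance) rather than spin; the only
point illustrated is that plain loads and stores suffice.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct thread. */
struct thread {
	int td_tid;
};

/*
 * Hypothetical per-CPU mutex.  Callers must be pinned to the mutex's
 * CPU, so no other CPU ever touches it concurrently and the lock word
 * can be read and written with ordinary (non-atomic) instructions --
 * no bus-locked compare-and-swap as in a regular SMP mutex.
 */
struct mtx_local {
	struct thread *ml_owner;	/* NULL when unowned */
};

static void
mtx_lock_local(struct mtx_local *m, struct thread *td)
{
	/*
	 * Contention here can only come from another thread on the
	 * same CPU that preempted us; a real version would block and
	 * lend priority, exactly like the UP sleep mutex, instead of
	 * spinning as this sketch does.
	 */
	while (m->ml_owner != NULL)
		;
	m->ml_owner = td;
}

static void
mtx_unlock_local(struct mtx_local *m, struct thread *td)
{
	assert(m->ml_owner == td);
	m->ml_owner = NULL;	/* real version: wake waiters here */
}
```

Since lock and unlock never execute a locked instruction, the fast
path costs about the same as the UP mutex even on an SMP kernel, which
is the property the statistics code would want.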