Date: Sun, 05 Dec 2004 22:45:23 -0500 From: Stephan Uphoff <ups@tree.com> To: Robert Watson <rwatson@FreeBSD.org> Cc: "current@freebsd.org" <current@FreeBSD.org> Subject: Re: mbuf count negative Message-ID: <1102304723.56978.113.camel@palm.tree.com> In-Reply-To: <Pine.NEB.3.96L.1041205190411.33974G-100000@fledge.watson.org> References: <Pine.NEB.3.96L.1041205190411.33974G-100000@fledge.watson.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 2004-12-05 at 14:08, Robert Watson wrote: > On Sun, 5 Dec 2004, Sean McNeil wrote: > > > > It replaces non-atomic maintenance of the counters with atomic > > > maintenance. However, this adds measurably to the cost of allocation, so > > > I've been reluctant to commit it. The counters maintained by UMA are > > > likely sufficient to generate the desired mbuf output now that we have > > > mbuma, but I haven't had an opportunity to walk through the details of it. > > > I hope to do so once I get closer to merging patches to use critical > > > sections to protect UMA per-cpu caches, since I need to redo parts of the > > > sysctl code then anyway. You might want to give this patch, or one much > > > like it, a spin to confirm that the race is the one I think it is. The > > > race in updating mbuf allocator statistics is one I hope to get fixed > > > prior to 5.4. > > > > Since they appear to not be required for actual system use (by the fact > > that it being negative doesn't cause problems), could the counts be > > computed for display instead? > > This is pretty much what UMA does with its per-CPU caches. It pulls and > pushes statistics from the caches in a couple of situations: > > - When pulling a new bucket into or out of the cache, it has to acquire > the zone mutex, so also pushes statistics. > - When a timer fires every few seconds, all the caches are checked to > update the global zone statistics. > - When the sysctl runs, it replicates the logic in the timer code to also > update the zone statistics for display. > > And you can already extract pretty much all of the interesting allocation > information for mbufs from vmstat -z as the mbufs are now stored using > UMA. In the critical section protected version of the code, I haven't yet > decided if the timers should run per-cpu, and/or how the sysctl should > coalesce the information for display. I hope to have much of this > resolved shortly. My current leaning is that a small amount of localized > and temporary inconsistency in the stats isn't a problem, so simply doing > a set of lockless reads across the per-cpu caches to update stats for > presentation should be fine, and that we can probably drop the timer > updates of statistics since the cache bucket balancing keeps things pretty > in sync. > > I haven't committed the move to critical sections yet as it's currently a > performance pessimization for the UP case, as entering a critical section > on UP is more expensive than acquiring a mutex. John Baldwin has patches > that remedy this, but hasn't yet merged them (there's also an instability > with them I've seen). I know that Stephen Uphoff has also been > investigating this issue. I am wondering if critical sections are overkill for this application since the interrupt blocking capability is not needed. Perhaps a mutex_lock_local, mutex_unlock_local that work on per CPU mutexes and require the current thread to be pinned to a CPU would be more appropriate. The functions would have the same functionality as mutex_lock/mutex_unlock but without using atomic operations. The "local" mutex would be a clone of the UP sleep mutex and the SMP sleep mutex and the local mutex could even be used together in any order. (With working priority inheritance) I think I will try this in my p4 mutex branch in the next days. A "do not preempt" section (Interrupt enabled) would be another possibility. Stephan
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1102304723.56978.113.camel>