Date:      Thu, 18 Nov 1999 02:41:23 -0800 (PST)
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Charles Randall <crandall@matchlogic.com>, freebsd-smp@FreeBSD.ORG
Subject:   Re: RE: Big Giant Lock progress?
Message-ID:  <Pine.BSF.4.05.9911180217540.12797-100000@fw.wintelcom.net>
In-Reply-To: <199911180828.AAA79093@apollo.backplane.com>

On Thu, 18 Nov 1999, Matthew Dillon wrote:

> :From: Matthew Dillon [mailto:dillon@apollo.backplane.com]
> :> Alfred got caught up in real life work so it's been on hold for a while.
> :
> :Has anyone profiled an SMP kernel in a standard role (Web server, NFS
> :server, development machine, etc) and compared the points of BGL contention
> :with the ease (or difficulty) of more fine-grained locking in those areas?
> :
> :In other words, have the "bang for the buck" areas been identified?
> :
> :Charles
> 
>     There are three major areas of interest:
> 
> 	* parallelizing within the network stack
> 
> 	* parallelizing network interrupts and the 
> 	  network stack
> 
> 	* parallelizing the cached read/write data
> 	  path, so the supervisor can copy data
> 	  to user processes on several cpu's at
> 	  once.

A fourth area, and the one I'm most interested in as a base for
the other work, is the low-level routines that require locking
that's not visible; the big example is malloc, which splhigh()s
while in use.

Although my coding hands have been busy, that doesn't stop me
from thinking about this stuff in my sleep. :)

I've been thinking of something along the lines of per-processor
pools of resources, with high and low watermarks to determine when
to borrow from or return to a global pool.  This ought to reduce
malloc contention quite a bit: since a CPU never has to worry about
any other CPU touching its memory pools, it doesn't need to lock
anything unless the private pool is exhausted.  This is discussed
in Vahalia's book with regard to the Dynix allocator.
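To make the idea concrete, here's a minimal user-space sketch of the
watermark scheme.  All of the names (pcpu_pool, pool_refill, the
watermark constants) are made up for illustration; only the refill
and drain paths would need the global pool's lock in a real kernel.

```c
/* Sketch: per-CPU free-object counters with high/low watermarks.
 * The fast path (pool_alloc/pool_free) is lock-free because only
 * the owning CPU ever touches its own pool; the global pool is
 * touched only on refill/drain, which are the (rare) locked paths. */

#define LOW_WATER   4           /* working reserve kept per CPU   */
#define HIGH_WATER 12           /* drain surplus above this point */
#define BATCH       8           /* objects moved per refill       */

static int global_free = 64;    /* objects in the global pool     */

struct pcpu_pool {
	int nfree;              /* objects on this CPU's free list */
};

/* Refill from the global pool; in a real kernel this path takes
 * the global pool's lock.  Returns the number of objects moved. */
static int
pool_refill(struct pcpu_pool *p)
{
	int n = BATCH;

	if (n > global_free)
		n = global_free;
	global_free -= n;
	p->nfree += n;
	return (n);
}

/* Return surplus to the global pool (the other locked path). */
static void
pool_drain(struct pcpu_pool *p)
{
	int n = p->nfree - LOW_WATER;   /* keep a working reserve */

	global_free += n;
	p->nfree -= n;
}

/* Lock-free fast path: the owning CPU is the only toucher. */
static int
pool_alloc(struct pcpu_pool *p)
{
	if (p->nfree == 0 && pool_refill(p) == 0)
		return (-1);            /* global pool exhausted too */
	p->nfree--;
	return (0);
}

static void
pool_free(struct pcpu_pool *p)
{
	p->nfree++;
	if (p->nfree > HIGH_WATER)
		pool_drain(p);          /* give the surplus back */
}
```

The tunables are exactly the knobs mentioned above: raising BATCH and
the watermarks trades memory held idle per CPU for fewer trips to the
locked global pool.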

I've also spoken to Alan Cox (from Linux), and he explained that
the way Linux deals with malloc from device drivers (*) is to keep
an 'atomic' memory pool from which drivers grab memory.

(*) The problem with interrupts, for those just joining the
discussion, is that they may cause a recursive attempt on a lock.
With the BGL this is OK (it's an exclusive counting semaphore),
but with plain spinlocks it leads to deadlock if the CPU holding
the lock (on, let's say, the malloc pool) is interrupted and the
interrupt then tries to spin on the lock it already holds.
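A tiny sketch of why the counting-semaphore style survives interrupt
recursion while a plain spinlock doesn't: if the lock records its
owner and a hold count, a re-acquire by the same CPU just bumps the
count instead of spinning on itself.  This is an illustration of the
mechanism, not the actual BGL code; the names are made up.

```c
/* Sketch: a recursion-tolerant lock in the style of the BGL.
 * A plain spinlock has no owner field, so an interrupt handler
 * on the owning CPU would spin forever on its own lock; here the
 * same-CPU re-acquire falls through and only bumps the depth. */

struct rec_lock {
	int owner;      /* CPU id holding the lock, -1 if free */
	int count;      /* recursion depth */
};

static void
rec_lock_acquire(struct rec_lock *l, int cpu)
{
	if (l->owner == cpu) {  /* e.g. an interrupt re-entered */
		l->count++;
		return;
	}
	while (l->owner != -1)
		;               /* spin: holder is another CPU */
	l->owner = cpu;
	l->count = 1;
}

static void
rec_lock_release(struct rec_lock *l)
{
	if (--l->count == 0)
		l->owner = -1;
}
```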

I like this idea quite a bit (back to atomic memory pools);
combined with a per-CPU pool of memory that can be grabbed in an
atomic fashion, it would reduce a major contention problem as well
as handle allocations from interrupts.

The problem is possible premature out-of-memory situations, but
I'm confident that tuning the high/low watermarks for the
allocations and atomic pools can make that a rare occurrence.

Fifth:
It's also very important that the scheduler becomes MP safe.

>     A whole lot of groundwork needs to happen before
>     we can do any of this stuff, though.  A previous
>     attempt at optimizing just #3 in uiomove did not
>     produce very good results, mainly owing to the
>     BGL being held too long in other places.

I wasn't around when this was attempted; did the code only
release the BGL when the amount to copy was greater than, let's
say, 2k?  Or was the BGL toggled on every uiomove?

>     There are also some neat optimizations that 
>     can be done, especially with the simplelocks.
>     For example, when unlocking a simplelock you do
>     not need to use a locked instruction or even
>     a cmpxchg instruction, because you already own
>     the lock and nobody else can mess with it.
>     Nor do you need to use a locked assembly instruction
>     when bumping the ref count on a simplelock you
>     already hold.  I think I am going to commit those even 
>     without Alfred's work, once I separate them out
>     and have a little time, because they at least double 
>     the speed of the simplelocks.

It'd be great to get that code committed ASAP; it's a really keen
observation, and the benefits are immediate and unobtrusive.
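For the archives, the observation can be sketched in a few lines.
This is not the FreeBSD simplelock code, just an illustration using
GCC's __sync compare-and-swap builtin: the acquire must be atomic
because other CPUs race for the word, but the release can be a plain
(unlocked, and therefore much cheaper) store, since only the owner
can be writing there.

```c
/* Sketch: asymmetric lock cost.  Acquisition races with other
 * CPUs and needs an atomic compare-and-swap (a lock-prefixed
 * instruction on x86); release is exclusive to the current owner
 * and needs only an ordinary store. */

struct simplelock {
	volatile int lock;      /* 0 = free, 1 = held */
};

static void
s_lock(struct simplelock *l)
{
	/* Atomic: other CPUs may be attempting the same 0 -> 1
	 * transition at this instant. */
	while (!__sync_bool_compare_and_swap(&l->lock, 0, 1))
		;               /* spin until acquired */
}

static void
s_unlock(struct simplelock *l)
{
	/* Plain store: we hold the lock, so no other CPU can
	 * modify it; no bus-locking instruction is required. */
	l->lock = 0;
}
```

The same reasoning covers the ref-count case quoted above: bumping a
count protected by a lock you already hold can't race with anyone.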

-Alfred

> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon@backplane.com>






