From owner-freebsd-smp  Thu Dec  7 19:50:18 2000
From owner-freebsd-smp@FreeBSD.ORG  Thu Dec  7 19:50:16 2000
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP
	id 994E537B400; Thu,  7 Dec 2000 19:50:15 -0800 (PST)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.9.3/8.9.3) id UAA22273;
	Thu, 7 Dec 2000 20:45:38 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp02.primenet.com, id smtpdAAApOaWDR; Thu Dec  7 20:45:30 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id UAA03298;
	Thu, 7 Dec 2000 20:50:04 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200012080350.UAA03298@usr08.primenet.com>
Subject: Re: Netgraph and SMP
To: msmith@FreeBSD.ORG (Mike Smith)
Date: Fri, 8 Dec 2000 03:50:04 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), smp@FreeBSD.ORG
In-Reply-To: <200012080332.eB83WtF00456@mass.osd.bsdi.com> from "Mike Smith" at Dec 07, 2000 07:32:55 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr08.primenet.com
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > In Solaris, the entry into the driver would hold a reference,
> > which would result in the reference count being incremented.
> > Only modules with a 0 reference count can be unloaded.  This
> > same mechanism is used for vnodes, and for modules on which
> > other modules depend.  It works well, and is very light weight.
> 
> The whole problem is that it *isn't* very light weight.
> 
> The reference count has to be atomic, which means that it ping-pongs 
> around from CPU to CPU, causing a lot of extra cache traffic.
> 
> OTOH, there's not much we can do about this short of going looking for 
> better multi-CPU reference count implementations once we have time to 
> worry about performance.

Actually, you can just put it in non-cacheable memory, and the
penalty will only be paid by the CPU(s) doing the referencing.

This means a clock multiplier worth of cycles, though, to get
it in and out of main memory from the CPU.  Back when all this
started, clock multipliers weren't 1/5th the problem they pose
today...

Still, for a very large number of CPUs, this would work fine
for all but frequently contended objects.

I think that it is making more and more sense to lock interrupts
to a single CPU.

What happens if you write to a page that's marked non-cacheable
on the CPU on which you are running, but cacheable on another
CPU?  Does it do the right thing, and update the cache on the
caching CPU?  If so, locking the interrupt processing for each
card to a particular CPU could be very worthwhile, since you
would never take the hit, unless you were doing something
extraordinary.

BTW, you would want to grab a ref, check for the heavy lock,
and back off if it were held.  The unload would want to grab
the heavy lock, grab a ref, and do its work when everyone is
backed off (ref = 1).  This ordering would ensure the least
overhead for the normal case (no heavy lock, ref > 0).
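
A rough sketch of that ordering, for illustration only.  It uses
C11 atomics rather than any real kernel primitive, and every name
in it (module_gate, gate_enter, and so on) is hypothetical, not a
FreeBSD or Solaris interface.  It relies on the default
sequentially consistent ordering, so that an entrant's
"ref, then check lock" and the unloader's "lock, then check ref"
cannot both miss each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; not a real kernel API. */
struct module_gate {
    atomic_int  refs;        /* active entries, plus one for an unloader */
    atomic_bool heavy_lock;  /* set while an unload is in progress */
};

/* Normal path: grab a ref first, then check for the heavy lock. */
static bool gate_enter(struct module_gate *g)
{
    atomic_fetch_add(&g->refs, 1);
    if (atomic_load(&g->heavy_lock)) {
        atomic_fetch_sub(&g->refs, 1);   /* back off */
        return false;
    }
    return true;
}

/* Drop a ref taken by a successful gate_enter(). */
static void gate_exit(struct module_gate *g)
{
    int prev = atomic_fetch_sub(&g->refs, 1);
    assert(prev > 0);                    /* catch unbalanced exits */
}

/*
 * Unload path: grab the heavy lock, then grab a ref.
 * Returns false if another unload already holds the lock.
 */
static bool gate_unload_begin(struct module_gate *g)
{
    bool expected = false;
    if (!atomic_compare_exchange_strong(&g->heavy_lock, &expected, true))
        return false;
    atomic_fetch_add(&g->refs, 1);
    return true;
}

/* True once everyone else has backed off (ref == 1). */
static bool gate_unload_ready(struct module_gate *g)
{
    return atomic_load(&g->refs) == 1;
}

/* Finish (or abandon) the unload: drop our ref, release the lock. */
static void gate_unload_end(struct module_gate *g)
{
    atomic_fetch_sub(&g->refs, 1);
    atomic_store(&g->heavy_lock, false);
}
```

The unloader would poll or sleep until gate_unload_ready() is
true before doing its work.  In the normal case the cost is the
one atomic increment plus a read of the heavy lock; whether that
read stays cheap depends on the cache behaviour discussed above.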


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

