Date: Fri, 8 Dec 2000 20:52:20 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: msmith@FreeBSD.ORG (Mike Smith)
Cc: tlambert@primenet.com (Terry Lambert), smp@FreeBSD.ORG
Subject: Re: Netgraph and SMP
Message-ID: <200012082052.NAA22447@usr01.primenet.com>
In-Reply-To: <200012080533.eB85XRN00458@mass.osd.bsdi.com> from "Mike Smith" at Dec 07, 2000 09:33:27 PM
> > Actually, you can just put it in non-cacheable memory, and the
> > penalty will only be paid by the CPU(s) doing the referencing.
>
> Yes.  And you'll pay the penalty *all* the time.  At least when the
> ping-pong is going on, there will be times when you'll hit the counter
> valid in your own cache.  Marking it uncacheable (or even write-back
> cacheable) is worse.

The absolute worst thing you can do on a multiprocessor system is
contend shared resources, either stalling another CPU or causing a
cache invalidation.

> > Still, for a very large number of CPUs, this would work fine
> > for all but frequently contended objects.
>
> Er.  We're talking about an object which is susceptible to being *very*
> frequently contended.

Right.  Which is why you break the contention domain, so that it is
_not_ contending between CPUs.  That way, only one CPU will pay the
penalty.  In the UP case, you can decide not to mark the page
non-cacheable.

> > I think that it is making more and more sense to lock interrupts
> > to a single CPU.
>
> No, it's not.  Stop this nonsense.  It's not even practical on some of
> the platforms we're looking at.

NT does it on every platform on which it runs.  It significantly
beat both Linux and FreeBSD in the Ziff-Davis benchmarks you and
Jordan attended, using this configuration.

For the platforms where it's not possible, I agree: you eat the
synchronization overhead.

BTW: aren't some of these platforms MEI, and not MESI?

> > What happens if you write to a page that's marked non-cacheable
> > on the CPU on which you are running, but cacheable on another
> > CPU?  Does it do the right thing, and update the cache on the
> > caching CPU?
>
> Er, what are you smoking, Terry?  You never 'update' the cache on
> another processor; the other processor snoops your cache/memory
> activity and invalidates its own cache based on your broadcasts.

Let me explain the model:

You mark the page cacheable on the processor that will contend the
resource at interrupt time, and you mark it uncacheable on the other
processors.

The question is whether the write-through is immediate or delayed
(if delayed, the main memory value could be incorrect when examined
by another CPU), and whether a write to main memory by a CPU that
does not have it cached will result in a proper invalidation in the
CPU that does have it cached (if so, then the approach will work).

What this gives you is no inter-CPU contention, unless the main
memory location is written by a processor other than the one that
services interrupts for the driver whose lock is being held.  In
that case, the only invalidation is against a single CPU, not
multiple CPUs.

This isn't necessarily a strategy limited to locked interrupts.  By
allocating lock regions in cache line lengths, you can practically
guarantee that, on a heavily loaded system, the invalidation
triggered by a CPU will, at most, invalidate the cache line stored
in _one_ other CPU (the previous CPU to take the interrupt, assuming
that it is moving around).

On a less heavily loaded system, the cache line for the lock region
may be valid in multiple CPUs (not having been recycled), in which
case you will take additional invalidation overhead.  But that is
less problematic, since you can afford the overhead when the system
is less heavily loaded.

This is really a "virtually non-cacheable" approach.
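To make the contention-domain and cache-line points concrete, here
is a minimal C sketch, assuming a 64-byte line; every name in it is
illustrative, not something in the tree:

    /*
     * A minimal sketch of "breaking the contention domain": give
     * each CPU its own copy of the counter, padded out to its own
     * cache line, so no two CPUs ever write the same line.  The
     * names, the line size, and the CPU count are assumptions.
     */
    #define CACHE_LINE_SIZE 64      /* assumed; really CPU specific */
    #define MAXCPU          8       /* example CPU count */

    struct percpu_counter {
            volatile long   pc_count;
            char            pc_pad[CACHE_LINE_SIZE - sizeof(long)];
    };

    static struct percpu_counter counters[MAXCPU]
            __attribute__((aligned(CACHE_LINE_SIZE)));

    /* each CPU touches only its own line; no ping-pong, no lock */
    static void
    counter_bump(int curcpu)
    {
            counters[curcpu].pc_count++;
    }

    /* a reader pays the cross-CPU cost, but only when asked to */
    static long
    counter_read(void)
    {
            long total = 0;
            int i;

            for (i = 0; i < MAXCPU; i++)
                    total += counters[i].pc_count;
            return (total);
    }

The update path never leaves the local CPU's cache line; only an
explicit read walks the other lines, and you choose when to pay
that cost.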
If you can lock specific interrupts to a particular CPU, as NT
locked one network card per CPU in the Ziff-Davis tests, then you
achieve the same thing: the contended region is never referenced by
the other CPUs, except under exceptional conditions, like driver
unload, and so it is effectively not cached on them, even if the
pages are marked cacheable.

Does that make more sense?

> > If so, locking the interrupt processing for each
> > card to a particular CPU could be very worthwhile, since you
> > would never take the hit, unless you were doing something
> > extraordinary.
>
> With the way our I/O structure is currently laid out, this blows because
> you end up serialising everything.

Not everything; just the interrupts for a single card: the locking
would occur at interrupt granularity; I'm not talking about ASMP
here.  You also only end up serializing them to a single CPU; as
long as your load is reasonably distributed, the CPU won't be doing
other work at the time.

Also, the locking need not be literal: it could be nothing more than
a "strong affinity", requiring extra effort to implement any
migration.  This is, I believe, how NT does it.

PS: Considering all this, it makes sense to me to perhaps consider
ensuring that a disk interrupt for a DMA completion be handled on
the CPU responsible for the network card that will be sending the
data whose availability the completion signals.  NetWare does
something similar, in that its threads are based on voluntary, not
involuntary, preemption.  Admittedly, this means true ASMP, but
it's hard to argue with NetWare's file server performance...
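As for what the "strong affinity" shape might look like, here is a
hypothetical C sketch; none of these names are NT internals or
existing FreeBSD interfaces, and ipi_forward_intr() in particular
is an assumed helper:

    /*
     * Hypothetical sketch of "strong affinity" interrupt dispatch:
     * each interrupt source remembers a preferred CPU, and moving
     * it is an explicit act, never an automatic one.  These names
     * do not exist in FreeBSD (or NT); ipi_forward_intr() is an
     * assumed IPI helper.
     */
    struct intr_source {
            int     is_irq;                 /* interrupt line */
            int     is_cpu;                 /* preferred ("affine") CPU */
            void    (*is_handler)(void *);  /* driver handler */
            void    *is_arg;                /* handler argument */
    };

    extern void ipi_forward_intr(int cpu, struct intr_source *is);

    /* run on the affine CPU when possible; punt to it when not */
    static void
    intr_dispatch(struct intr_source *is, int curcpu)
    {
            if (curcpu == is->is_cpu)
                    is->is_handler(is->is_arg);       /* hot path */
            else
                    ipi_forward_intr(is->is_cpu, is); /* rare path */
    }

    /* migration takes extra, deliberate effort; it never just happens */
    static void
    intr_rebind(struct intr_source *is, int newcpu)
    {
            is->is_cpu = newcpu;
    }

The point of the asymmetry is that the common path touches only
state that stays hot in one CPU's cache; the cross-CPU forwarding
path exists, but migration requires a deliberate rebind, so it
stays rare.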
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.