From owner-freebsd-smp Tue Nov 23 9:52:25 1999
Date: Tue, 23 Nov 1999 09:51:29 -0800 (PST)
From: Matthew Dillon
Message-Id: <199911231751.JAA10135@apollo.backplane.com>
To: Peter Wemm , Tommy Hallgren , freebsd-smp@FreeBSD.ORG
Subject: more on... Re: Matt's new unlock optimiazation
References: <19991123140128.3A7D41C6D@overcee.netplex.com.au> <199911231703.JAA09896@apollo.backplane.com>

::>
::> The subject is: spin_unlock optimization (i386)
::
::A bit worrying, to say the least, especially coming from Linus (even more so
::in light of his work at Transmeta and what they're doing with/to Intel CPUs).
::
::Cheers,
::-Peter
:
:    Hmm.  I was under the impression that the Pentium serialized writes
:    by reserving locations through their caches.  But knowing Intel, Linus
:    is probably right.
:
:    Sometimes I wish I could just take a gun to the Pentium.
:
:    But this isn't a big deal, we should simply be able to do a locked
:    write into the per-cpu area to synchronize just before we release
:    the lock.  This is still going to be a whole lot more efficient than
:    trying to lock a write to the shared lock, because we will almost
:    certainly already own that memory location.
:
:    I'll run some tests and commit a solution.  Nobody commit anything.
:    No matter what, we still get the benefit of the recursion lock
:    optimization, which is actually the more important one.

    Ok, there's a problem, but I don't believe you have to use a locked
    instruction to get around it.
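The fallback mentioned in the quoted text, a dummy locked write into the per-cpu area just before releasing the lock, can be sketched with portable C11 atomics instead of the i386 inline assembly a kernel would actually use. Everything here is hypothetical: `sketch_lock_t`, `percpu`, `SKETCH_NCPU`, and the 64-byte line size are illustrative assumptions. On x86, `atomic_exchange_explicit` is expected to compile to an implicitly locked `xchg`, while the release store is an ordinary `mov`.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NCPU 4   /* hypothetical number of CPUs */

/* Hypothetical per-CPU scratch word, aligned so each CPU gets its own
   64-byte cache line (line size is an assumption). */
struct percpu_scratch {
    _Alignas(64) atomic_int dummy;
};
static struct percpu_scratch percpu[SKETCH_NCPU];

/* Hypothetical spinlock: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} sketch_lock_t;

static void sketch_lock(sketch_lock_t *l)
{
    /* Spin until we swap out a 0; on x86 this is an implicitly locked xchg. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) != 0)
        ;
}

/* Plain release: on x86 an ordinary mov, i.e. the unlocked store
   the thread is worried about. */
static void sketch_unlock_plain(sketch_lock_t *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Proposed fallback: a dummy locked write into this CPU's own scratch
   line (which it almost certainly already owns), then the plain release
   store to the shared lock word. */
static void sketch_unlock_fenced(sketch_lock_t *l, int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NCPU);
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_exchange_explicit(&percpu[cpu].dummy, 0, memory_order_seq_cst);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

Under the C11 memory model the release store alone already orders the critical section's accesses before the unlock; the extra exchange only models the belt-and-braces serialization proposed for i386. It should be comparatively cheap because it targets a cache line the CPU keeps to itself rather than the contended lock word.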
    All you should need to do is synchronize the instruction stream.  I
    remember from somewhere that 'NOP' (which is really just an xchg
    instruction) does this, but I am not sure; I am going to have to do
    some more research.

    I did test, and I am correct about the cache line ownership change
    overhead.  On an SMP box with two competing processors, a locked
    instruction on the *same* physical memory location costs about 3x
    as much as the same locked instruction on different memory
    locations.  So if I can't find a definitive way to do instruction
    synchronization, we will simply do a dummy locked instruction into
    the per-cpu area.

    With cmpxchgl:

        test3:/home/dillon# ./lock shared
        165 nS/loop
        test3:/home/dillon# ./lock private
        53 nS/loop

    With just xchgl:

        test3:/home/dillon# ./lock shared
        160 nS/loop
        test3:/home/dillon# ./lock private
        47 nS/loop

                                        -Matt
                                        Matthew Dillon
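A hypothetical reconstruction of a `./lock shared` / `./lock private` style measurement, to show the cache line ownership effect described above. It is not the `lock` program used for those numbers; the thread count, the iteration count, the 64-byte padding, and the names `lock_bench` and `worker` are assumptions. Two threads repeatedly perform a locked `xchg` (via `atomic_exchange_explicit`) either on one shared word or on separate, cache-line-padded words, and the function reports wall-clock nanoseconds per loop.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define LOCK_NTHREADS 2   /* two competing processors, as in the post */
#define LINE_SIZE 64      /* assumed cache-line size */

/* Each word sits on its own cache line so "private" runs never share one. */
struct padded_word {
    _Alignas(LINE_SIZE) atomic_int word;
};

static struct padded_word shared_word;
static struct padded_word private_words[LOCK_NTHREADS];

struct worker_arg {
    atomic_int *target;
    long iters;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (long i = 0; i < a->iters; i++)
        /* On x86 this is an implicitly locked xchg. */
        atomic_exchange_explicit(a->target, 1, memory_order_seq_cst);
    return NULL;
}

/* shared != 0: both threads hit one word; shared == 0: one word each.
   Returns average ns per loop, or -1.0 if the threads could not start. */
static double lock_bench(int shared, long iters)
{
    pthread_t tid[LOCK_NTHREADS];
    struct worker_arg args[LOCK_NTHREADS];
    struct timespec t0, t1;
    int started = 0;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOCK_NTHREADS; i++) {
        args[i].target = shared ? &shared_word.word : &private_words[i].word;
        args[i].iters = iters;
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started != LOCK_NTHREADS)
        return -1.0;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling `lock_bench(1, n)` and `lock_bench(0, n)` mirrors the shared and private runs. Absolute numbers depend entirely on the hardware (and on having at least two CPUs), and swapping in `atomic_compare_exchange_strong_explicit` would give a `lock cmpxchg` variant closer to the first set of results.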