Date: Tue, 23 Nov 1999 09:51:29 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Peter Wemm <peter@netplex.com.au>, Tommy Hallgren <thallgren@yahoo.com>, freebsd-smp@FreeBSD.ORG
Subject: more on... Re: Matt's new unlock optimiazation
Message-ID: <199911231751.JAA10135@apollo.backplane.com>
References: <19991123140128.3A7D41C6D@overcee.netplex.com.au> <199911231703.JAA09896@apollo.backplane.com>
::>
::> The subject is: spin_unlock optimization (i386)
::
::A bit worrying, to say the least, especially coming from Linus (even more so
::in light of his work at Transmeta and what they're doing with/to Intel CPUs).
::
::Cheers,
::-Peter
:
: Hmm. I was under the impression that the Pentium serialized writes
: by reserving locations through its caches. But knowing Intel, Linus
: is probably right.
:
: Sometimes I wish I could just take a gun to the Pentium.
:
: But this isn't a big deal; we should simply be able to do a locked
: write into the per-cpu area to synchronize just before we release
: the lock. This is still going to be a whole lot more efficient than
: trying to lock a write to the shared lock, because we will almost certainly
: already own that memory location.
:
: I'll run some tests and commit a solution. Nobody commit anything. No
: matter what, we still get the benefit of the recursion lock optimization,
: which is actually the more important one.

Ok, there's a problem, but I don't believe you have to use a locked
instruction to get around it. All you should need to do is synchronize
the instruction stream. I remember from somewhere that 'NOP' (which
is really just an xchg instruction) does this, but I am not sure; I am
going to have to do some more research.

I did test, and I am correct about the cache line ownership change
overhead. On an SMP box with two competing processors, a locked
instruction on the *same* physical memory location costs roughly 3x as
much as the same locked instruction on different memory locations.

So if I can't find a definitive way to do instruction synchronization,
we will simply do a dummy locked instruction into the per-cpu area.
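A minimal userland sketch of the fallback described above, assuming GCC/Clang `__atomic` builtins rather than the actual kernel code (which isn't in this thread): the unlock path first does a dummy locked read-modify-write on a per-CPU word, approximated here by a hypothetical `_Thread_local` variable `percpu_dummy`, and then releases the shared lock word with an ordinary store. The names `demo_spinlock`, `demo_lock`, `demo_unlock` and `run_counter` are illustrative only. Under the C11 memory model the release store alone already orders the critical section; the dummy RMW adds a full barrier, mirroring the extra serialization proposed here.

```c
#include <pthread.h>

/* Hypothetical spinlock: 0 = free, 1 = held. */
struct demo_spinlock { int locked; };

/* Stand-in for the per-CPU area (assumption): one private word per thread,
   so the locked RMW below hits a cache line this CPU already owns. */
static _Thread_local int percpu_dummy;

static void demo_lock(struct demo_spinlock *l) {
    /* On x86, xchg with a memory operand is implicitly locked. */
    while (__atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE))
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ; /* read-only spin until the holder releases the lock */
}

static void demo_unlock(struct demo_spinlock *l) {
    /* Dummy locked RMW on the private word; on x86 this typically compiles
       to a lock-prefixed instruction, which is a full memory barrier. */
    __atomic_fetch_add(&percpu_dummy, 0, __ATOMIC_SEQ_CST);
    /* Plain store releases the shared lock word without a locked cycle. */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}

static struct demo_spinlock lk;
static long counter; /* protected by lk */

static void *bump(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        demo_lock(&lk);
        counter++;
        demo_unlock(&lk);
    }
    return NULL;
}

/* Runs nthreads threads (nthreads must be <= 8), each doing n locked
   increments, and returns the final counter value. */
long run_counter(int nthreads, long n) {
    pthread_t t[8];
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, bump, &n);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Build with `-pthread`. If the lock provides mutual exclusion, `run_counter(4, 100000)` returns 400000.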
With cmpxchgl:

test3:/home/dillon# ./lock shared
165 nS/loop
test3:/home/dillon# ./lock private
53 nS/loop

With just xchgl:

test3:/home/dillon# ./lock shared
160 nS/loop
test3:/home/dillon# ./lock private
47 nS/loop

-Matt
Matthew Dillon
<dillon@backplane.com>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message
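The source of the `lock` program isn't included in the message. As a rough, hypothetical reconstruction of the kind of test that could produce figures like these: two threads each perform locked `cmpxchg` increments either on one shared word or on their own word aligned to a separate cache line (64 bytes is an assumed line size). The names `run_bench`, `shared_word` and `private_words` are invented for illustration, and the timing includes thread start-up, so absolute numbers will not match the ones above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define NTHREADS 2

/* Each word sits alone on an (assumed) 64-byte cache line. */
struct padded { _Alignas(64) long v; };

static struct padded shared_word;             /* contended by all threads */
static struct padded private_words[NTHREADS]; /* one per thread */

struct arg { long *target; long iters; };

static void *worker(void *p) {
    struct arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        long expected = __atomic_load_n(a->target, __ATOMIC_RELAXED);
        /* Typically a lock cmpxchg on x86; on failure `expected` is
           refreshed with the current value and the CAS is retried. */
        while (!__atomic_compare_exchange_n(a->target, &expected, expected + 1,
                                            0, __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
            ;
    }
    return NULL;
}

static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Returns wall-clock nanoseconds per loop iteration (threads run in
   parallel); use_shared selects the shared word or the private words. */
double run_bench(int use_shared, long iters) {
    pthread_t t[NTHREADS];
    struct arg a[NTHREADS];
    double start = now_ns();
    for (int i = 0; i < NTHREADS; i++) {
        a[i].target = use_shared ? &shared_word.v : &private_words[i].v;
        a[i].iters = iters;
        pthread_create(&t[i], NULL, worker, &a[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return (now_ns() - start) / (double)iters;
}
```

Calling `run_bench(1, n)` and then `run_bench(0, n)` gives the shared and private per-loop times. On a multi-processor x86 machine the shared case is typically noticeably slower, though the ratio depends on the hardware.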