Date: Fri, 13 Dec 1996 01:53:31 -0500 From: Bakul Shah <bakul@plexuscom.com> To: Michael Hancock <michaelh@cet.co.jp> Cc: FreeBSD Hackers <Hackers@freebsd.org> Subject: Re: Implementing atomic DCAS on Intel for NBS Message-ID: <199612130653.BAA17932@chai.plexuscom.com> In-Reply-To: Your message of "Fri, 13 Dec 1996 11:44:25 %2B0900." <Pine.SV4.3.95.961213112940.12235A-100000@parkplace.cet.co.jp>
next in thread | previous in thread | raw e-mail | index | archive | help
> It doesn't look like the P5 has DCAS which is important for NBS because it > wants to atomically manipulate a structure and update it's version > number. P5/P6 have CMPXCHG. cmpxchg src, dst is defined to be if (accumulator==dst) { zeroflag = 1; dst = src; } else { zeroflag = 0; accumulator = dst; } When src is a register & dst is a memory addr, this instruction is the same as compare-and-swap (CAS). The processor implements using locked bus cycles so this is guaranteed to be atomic. Here is some crude, untested C+asm goop for a cas() function: /* * \begin{critical section} * if (*addr == old) { *addr = new; return 1; } else return 0; * \end{critical section} */ int cas(int* addr, register int old, register int new) { int r = 0; asm volatile("movl %0,%%eax": "=r"(old)); asm volatile("cmpxchg %0,%1": "=r"(new): "r"(addr)); asm volatile("je L1"); r = 1; asm("L1:"); return r; } Undoubtely this can be optmized and turned into an inline function if we can get rid of that L1 label. Anyway, you can implement DCAS in terms of CAS. Doing so for an MP kernel is easy because all the kernel threads can use the same global shared variable to guard the DCAS code. volatile int global_dcas_lock = 0; ... dcas(int *addr1, int* addr2, int old1, int old2, int new1, int new2) { int r; while (cas(&global_dcas_lock, 0, p->p_pid) == 0) continue; r = *addr1 == old1 && *addr2 == old2; if (r) { *addr1 == new1; *addr2 == new2; } global_dcas_lock = 0; return r; } Note that effectively we have reduced the number of locks to 1 and this lock is held for a very short duration. This is why the priority inversion problem does not even arise. If you want a user level DCAS, you can do something similar. But the user process can get preempted in the middle. So at the very least you have to provide enough info to the kernel so that it can roll you back out of a partial dcas(). If you don't want to use such a global variable, it gets quite a bit hairy. Proof left as an exercise. Structures like singly linked list, trees and so forth can be updated with just DCAS using the version numer idea. But updating a doubly linked list will require CAS3 (deletion will require updating version number, which may be stored in a list head, this->prev->next, this->next->prev). In general you may need vcas(), to atomically update N words. The above implementation easily extends to vcas(). Comments about NBS: In general, when the number of words read exceeds the number of written words (for shared data structures), NBS will be more efficient than lockIng. Deadlock simply can not happen because atmost there is just one shared lock. So in an MP kernel instead of a few (or one!) very coarse grained lock)s( to guard shared data structures you can use NBS and increase concurrency. You still have to identify shared data structures and figure out the boundaries of your transactions but intuitively this seems easier than increasing kernel concurrency by replacing a few coarse grain locks with many fine grained locks. Unfortunately I don't have time to test this theory :-( -- bakul
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199612130653.BAA17932>