Message-ID: <51CEDE2B.60204@freebsd.org>
Date: Sat, 29 Jun 2013 21:16:27 +0800
From: Julian Elischer
To: John Baldwin
Subject: Re: svn commit: r252346 - head/share/man/man9
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org

thanks!

On 6/29/13 12:33 AM, John Baldwin wrote:
> Author: jhb
> Date: Fri Jun 28 16:33:45 2013
> New Revision: 252346
> URL: http://svnweb.freebsd.org/changeset/base/252346
>
> Log:
>   Make a pass over this page to correct and clarify a few things as well as
>   some general word-smithing.
>   - Don't claim that adaptive mutexes have a timeout (they don't).
>   - Don't treat pool mutexes as a separate primitive in a few places.
>   - Describe sleepable read-mostly locks as a separate lock type and add
>     them to the various tables.
>   - Don't claim that sx locks are less efficient. That hasn't been true in
>     a few years now.
>   - Describe lockmanager locks next to sx locks since they are very similar
>     in terms of rules, etc., and so that all the lock primitives are
>     grouped together before the non-lock primitives.
>   - Similarly, move the section on Giant after the description of all the
>     non-lock primitives to preserve grouping.
>   - Condition variables work on several types of locks, not just mutexes.
>   - Add a bit of language to compare/contrast condition variables with
>     sleep/wakeup.
>   - Add a note about why pause(9) is unique.
>   - Add some language to define bounded vs unbounded sleeps and explain
>     why they are treated separately (bounded sleeps only need CPU time
>     to make forward progress).
>   - Don't state that using mtx_sleep() is a bad idea. It is in fact rather
>     necessary.
>   - Rework the interaction table a bit. First, it did not include really
>     include sleepable rmlocks and it left out lockmgr entirely. To get
>     things to fit, combine similar lock types into the same column / row,
>     and explicitly state what "sleep" means. The notes about recursion
>     and lock order were also a bit banal (lock order is always important,
>     not just in the few places annotated here), so remove them. In
>     particular, the lock order note would need to be on just about every
>     cell. If we want to document recursion I think a better approach
>     would be a separate table summarizing the recursion rules for each
>     lock as having too many notes clutters the table.
>   - Tweak the tables to use less indentation so everything still fits with
>     the added columns.
>   - Correct a few cells in the context mode table.
>   - Use mdoc markup instead of explicit markup in a few places.
>
>   Requested by: julian
>   MFC after: 2 weeks
>
> Modified:
>   head/share/man/man9/locking.9
>
> Modified: head/share/man/man9/locking.9
> ==============================================================================
> --- head/share/man/man9/locking.9 Fri Jun 28 16:24:14 2013 (r252345)
> +++ head/share/man/man9/locking.9 Fri Jun 28 16:33:45 2013 (r252346)
> @@ -33,53 +33,52 @@
>  .Sh DESCRIPTION
>  The
>  .Em FreeBSD
> -kernel is written to run across multiple CPUs and as such requires
> -several different synchronization primitives to allow the developers
> -to safely access and manipulate the many data types required.
> +kernel is written to run across multiple CPUs and as such provides
> +several different synchronization primitives to allow developers
> +to safely access and manipulate many data types.
>  .Ss Mutexes
> -Mutexes (also erroneously called "sleep mutexes") are the most commonly used
> +Mutexes (also called "blocking mutexes") are the most commonly used
>  synchronization primitive in the kernel.
>  A thread acquires (locks) a mutex before accessing data shared with other
>  threads (including interrupt threads), and releases (unlocks) it afterwards.
>  If the mutex cannot be acquired, the thread requesting it will wait.
> -Mutexes are by default adaptive, meaning that
> +Mutexes are adaptive by default, meaning that
>  if the owner of a contended mutex is currently running on another CPU,
> -then a thread attempting to acquire the mutex will briefly spin
> -in the hope that the owner is only briefly holding it,
> -and might release it shortly.
> -If the owner does not do so, the waiting thread proceeds to yield the processor,
> -allowing other threads to run.
> -If the owner is not currently actually running then the spin step is skipped.
> +then a thread attempting to acquire the mutex will spin rather than yielding
> +the processor.
>  Mutexes fully support priority propagation.
>  .Pp
>  See
>  .Xr mutex 9
>  for details.
> -.Ss Spin mutexes
> -Spin mutexes are variation of basic mutexes; the main difference between
> -the two is that spin mutexes never yield the processor - instead, they spin,
> -waiting for the thread holding the lock,
> -(which must be running on another CPU), to release it.
> -Spin mutexes disable interrupts while the held so as to not get pre-empted.
> -Since disabling interrupts is expensive, they are also generally slower.
> -Spin mutexes should be used only when necessary, e.g. to protect data shared
> +.Ss Spin Mutexes
> +Spin mutexes are a variation of basic mutexes; the main difference between
> +the two is that spin mutexes never block.
> +Instead, they spin while waiting for the lock to be released.
> +Note that a thread that holds a spin mutex must never yield its CPU to
> +avoid deadlock.
> +Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
> +Since disabling interrupts can be expensive, they are generally slower to
> +acquire and release.
> +Spin mutexes should be used only when absolutely necessary,
> +e.g. to protect data shared
>  with interrupt filter code (see
>  .Xr bus_setup_intr 9
> -for details).
> -.Ss Pool mutexes
> -With most synchronization primitives, such as mutexes, programmer must
> -provide a piece of allocated memory to hold the primitive.
> +for details),
> +or for scheduler internals.
> +.Ss Mutex Pools
> +With most synchronization primitives, such as mutexes, the programmer must
> +provide memory to hold the primitive.
>  For example, a mutex may be embedded inside the structure it protects.
> -Pool mutex is a variant of mutex without this requirement - to lock or unlock
> -a pool mutex, one uses address of the structure being protected with it,
> -not the mutex itself.
> -Pool mutexes are seldom used.
> +Mutex pools provide a preallocated set of mutexes to avoid this
> +requirement.
> +Note that mutexes from a pool may only be used as leaf locks.
>  .Pp
>  See
>  .Xr mtx_pool 9
>  for details.
> -.Ss Reader/writer locks
> -Reader/writer locks allow shared access to protected data by multiple threads,
> +.Ss Reader/Writer Locks
> +Reader/writer locks allow shared access to protected data by multiple threads
>  or exclusive access by a single thread.
>  The threads with shared access are known as
>  .Em readers
> @@ -91,26 +90,16 @@ since it may modify protected data.
>  Reader/writer locks can be treated as mutexes (see above and
>  .Xr mutex 9 )
>  with shared/exclusive semantics.
> -More specifically, regular mutexes can be
> -considered to be equivalent to a write-lock on an
> -.Em rw_lock.
> -The
> -.Em rw_lock
> -locks have priority propagation like mutexes, but priority
> -can be propagated only to an exclusive holder.
> +Reader/writer locks support priority propagation like mutexes,
> +but priority is propagated only to an exclusive holder.
>  This limitation comes from the fact that shared owners
>  are anonymous.
> -Another important property is that shared holders of
> -.Em rw_lock
> -can recurse, but exclusive locks are not allowed to recurse.
> -This ability should not be used lightly and
> -.Em may go away.
>  .Pp
>  See
>  .Xr rwlock 9
>  for details.
> -.Ss Read-mostly locks
> -Mostly reader locks are similar to
> +.Ss Read-Mostly Locks
> +Read-mostly locks are similar to
>  .Em reader/writer
>  locks but optimized for very infrequent write locking.
>  .Em Read-mostly
> @@ -122,21 +111,41 @@ data structure.
>  See
>  .Xr rmlock 9
>  for details.
> +.Ss Sleepable Read-Mostly Locks
> +Sleepable read-mostly locks are a variation on read-mostly locks.
> +Threads holding an exclusive lock may sleep,
> +but threads holding a shared lock may not.
> +Priority is propagated to shared owners but not to exclusive owners.
>  .Ss Shared/exclusive locks
>  Shared/exclusive locks are similar to reader/writer locks; the main difference
> -between them is that shared/exclusive locks may be held during unbounded sleep
> -(and may thus perform an unbounded sleep).
> -They are inherently less efficient than mutexes, reader/writer locks
> -and read-mostly locks.
> -They do not support priority propagation.
> -They should be considered to be closely related to
> -.Xr sleep 9 .
> -They could in some cases be
> -considered a conditional sleep.
> +between them is that shared/exclusive locks may be held during unbounded sleep.
> +Acquiring a contested shared/exclusive lock can perform an unbounded sleep.
> +These locks do not support priority propagation.
>  .Pp
>  See
>  .Xr sx 9
>  for details.
> +.Ss Lockmanager locks
> +Lockmanager locks are sleepable shared/exclusive locks used mostly in
> +.Xr VFS 9
> +.Po
> +as a
> +.Xr vnode 9
> +lock
> +.Pc
> +and in the buffer cache
> +.Po
> +.Xr BUF_LOCK 9
> +.Pc .
> +They have features other lock types do not have such as sleep
> +timeouts, blocking upgrades,
> +writer starvation avoidance, draining, and an interlock mutex,
> +but this makes them complicated to both use and implement;
> +for this reason, they should be avoided.
> +.Pp
> +See
> +.Xr lock 9
> +for details.
>  .Ss Counting semaphores
>  Counting semaphores provide a mechanism for synchronizing access
>  to a pool of resources.
> @@ -149,43 +158,21 @@ See
>  .Xr sema 9
>  for details.
>  .Ss Condition variables
> -Condition variables are used in conjunction with mutexes to wait for
> -conditions to occur.
> -A thread must hold the mutex before calling the
> -.Fn cv_wait* ,
> +Condition variables are used in conjunction with locks to wait for
> +a condition to become true.
> +A thread must hold the associated lock before calling one of the
> +.Fn cv_wait ,
>  functions.
> -When a thread waits on a condition, the mutex
> -is atomically released before the thread yields the processor,
> -then reacquired before the function call returns.
> +When a thread waits on a condition, the lock
> +is atomically released before the thread yields the processor
> +and reacquired before the function call returns.
> +Condition variables may be used with blocking mutexes,
> +reader/writer locks, read-mostly locks, and shared/exclusive locks.
>  .Pp
>  See
>  .Xr condvar 9
>  for details.
> -.Ss Giant
> -Giant is an instance of a mutex, with some special characteristics:
> -.Bl -enum
> -.It
> -It is recursive.
> -.It
> -Drivers can request that Giant be locked around them
> -by not marking themselves MPSAFE.
> -Note that infrastructure to do this is slowly going away as non-MPSAFE
> -drivers either became properly locked or disappear.
> -.It
> -Giant must be locked first before other locks.
> -.It
> -It is OK to hold Giant while performing unbounded sleep; in such case,
> -Giant will be dropped before sleeping and picked up after wakeup.
> -.It
> -There are places in the kernel that drop Giant and pick it back up
> -again.
> -Sleep locks will do this before sleeping.
> -Parts of the network or VM code may do this as well, depending on the
> -setting of a sysctl.
> -This means that you cannot count on Giant keeping other code from
> -running if your code sleeps, even if you want it to.
> -.El
> -.Ss Sleep/wakeup
> +.Ss Sleep/Wakeup
>  The functions
>  .Fn tsleep ,
>  .Fn msleep ,
> @@ -194,7 +181,12 @@ The functions
>  .Fn wakeup ,
>  and
>  .Fn wakeup_one
> -handle event-based thread blocking.
> +also handle event-based thread blocking.
> +Unlike condition variables,
> +arbitrary addresses may be used as wait channels and an dedicated
> +structure does not need to be allocated.
> +However, care must be taken to ensure that wait channel addresses are
> +unique to an event.
>  If a thread must wait for an external event, it is put to sleep by
>  .Fn tsleep ,
>  .Fn msleep ,
> @@ -214,9 +206,10 @@ the thread is being put to sleep.
>  All threads sleeping on a single
>  .Fa chan
>  are woken up later by
> -.Fn wakeup ,
> -often called from inside an interrupt routine, to indicate that the
> -resource the thread was blocking on is available now.
> +.Fn wakeup
> +.Pq often called from inside an interrupt routine
> +to indicate that the
> +event the thread was blocking on has occurred.
>  .Pp
>  Several of the sleep functions including
>  .Fn msleep ,
> @@ -232,122 +225,168 @@ includes the
>  flag, then the lock will not be reacquired before returning.
>  The lock is used to ensure that a condition can be checked atomically,
>  and that the current thread can be suspended without missing a
> -change to the condition, or an associated wakeup.
> +change to the condition or an associated wakeup.
>  In addition, all of the sleep routines will fully drop the
>  .Va Giant
>  mutex
> -(even if recursed)
> +.Pq even if recursed
>  while the thread is suspended and will reacquire the
>  .Va Giant
> -mutex before the function returns.
> +mutex
> +.Pq restoring any recursion
> +before the function returns.
>  .Pp
> -See
> -.Xr sleep 9
> -for details.
> -.Ss Lockmanager locks
> -Shared/exclusive locks, used mostly in
> -.Xr VFS 9 ,
> -in particular as a
> -.Xr vnode 9
> -lock.
> -They have features other lock types do not have, such as sleep timeout,
> -writer starvation avoidance, draining, and interlock mutex, but this makes them
> -complicated to implement; for this reason, they are deprecated.
> +The
> +.Fn pause
> +function is a special sleep function that waits for a specified
> +amount of time to pass before the thread resumes execution.
> +This sleep cannot be terminated early by either an explicit
> +.Fn wakeup
> +or a signal.
>  .Pp
>  See
> -.Xr lock 9
> +.Xr sleep 9
>  for details.
> +.Ss Giant
> +Giant is a special mutex used to protect data structures that do not
> +yet have their own locks.
> +Since it provides semantics akin to the old
> +.Xr spl 9
> +interface,
> +Giant has special characteristics:
> +.Bl -enum
> +.It
> +It is recursive.
> +.It
> +Drivers can request that Giant be locked around them
> +by not marking themselves MPSAFE.
> +Note that infrastructure to do this is slowly going away as non-MPSAFE
> +drivers either became properly locked or disappear.
> +.It
> +Giant must be locked before other non-sleepable locks.
> +.It
> +Giant is dropped during unbounded sleeps and reacquired after wakeup.
> +.It
> +There are places in the kernel that drop Giant and pick it back up
> +again.
> +Sleep locks will do this before sleeping.
> +Parts of the network or VM code may do this as well.
> +This means that you cannot count on Giant keeping other code from
> +running if your code sleeps, even if you want it to.
> +.El
>  .Sh INTERACTIONS
> -The primitives interact and have a number of rules regarding how
> +The primitives can interact and have a number of rules regarding how
>  they can and can not be combined.
> -Many of these rules are checked using the
> -.Xr witness 4
> -code.
> -.Ss Bounded vs. unbounded sleep
> -The following primitives perform bounded sleep:
> - mutexes, pool mutexes, reader/writer locks and read-mostly locks.
> -.Pp
> -The following primitives may perform an unbounded sleep:
> -shared/exclusive locks, counting semaphores, condition variables, sleep/wakeup and lockmanager locks.
> -.Pp
> +Many of these rules are checked by
> +.Xr witness 4 .
> +.Ss Bounded vs. Unbounded Sleep
> +A bounded sleep
> +.Pq or blocking
> +is a sleep where the only resource needed to resume execution of a thread
> +is CPU time for the owner of a lock that the thread is waiting to acquire.
> +An unbounded sleep
> +.Po
> +often referred to as simply
> +.Dq sleeping
> +.Pc
> +is a sleep where a thread is waiting for an external event or for a condition
> +to become true.
> +In particular,
> +since there is always CPU time available,
> +a dependency chain of threads in bounded sleeps should always make forward
> +progress.
> +This requires that no thread in a bounded sleep is waiting for a lock held
> +by a thread in an unbounded sleep.
> +To avoid priority inversions,
> +a thread in a bounded sleep lends its priority to the owner of the lock
> +that it is waiting for.
> +.Pp
> +The following primitives perform bounded sleeps:
> +mutexes, reader/writer locks and read-mostly locks.
> +.Pp
> +The following primitives perform unbounded sleeps:
> +sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
> +counting semaphores, condition variables, and sleep/wakeup.
> +.Ss General Principles
> +.Bl -bullet
> +.It
>  It is an error to do any operation that could result in yielding the processor
>  while holding a spin mutex.
> +.It
> +It is an error to do any operation that could result in unbounded sleep
> +while holding any primitive from the 'bounded sleep' group.
> +For example, it is an error to try to acquire a shared/exclusive lock while
> +holding a mutex, or to try to allocate memory with M_WAITOK while holding a
> +reader/writer lock.
>  .Pp
> -As a general rule, it is an error to do any operation that could result
> -in unbounded sleep while holding any primitive from the 'bounded sleep' group.
> -For example, it is an error to try to acquire shared/exclusive lock while
> -holding mutex, or to try to allocate memory with M_WAITOK while holding
> -read-write lock.
> -.Pp
> -As a special case, it is possible to call
> +Note that the lock passed to one of the
>  .Fn sleep
>  or
> -.Fn mtx_sleep
> -while holding a single mutex.
> -It will atomically drop that mutex and reacquire it as part of waking up.
> -This is often a bad idea because it generally relies on the programmer having
> -good knowledge of all of the call graph above the place where
> -.Fn mtx_sleep
> -is being called and assumptions the calling code has made.
> -Because the lock gets dropped during sleep, one must re-test all
> -the assumptions that were made before, all the way up the call graph to the
> -place where the lock was acquired.
> -.Pp
> +.Fn cv_wait
> +functions is dropped before the thread enters the unbounded sleep and does
> +not violate this rule.
> +.It
>  It is an error to do any operation that could result in yielding of
>  the processor when running inside an interrupt filter.
> -.Pp
> +.It
>  It is an error to do any operation that could result in unbounded sleep when
>  running inside an interrupt thread.
> +.El
>  .Ss Interaction table
>  The following table shows what you can and can not do while holding
> -one of the synchronization primitives discussed:
> -.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
> -.It Em " You want:" Ta spin-mtx Ta mutex Ta rwlock Ta rmlock Ta sx Ta sleep
> -.It Em "You have: " Ta ------ Ta ------ Ta ------ Ta ------ Ta ------ Ta ------
> -.It spin mtx Ta \&ok-1 Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-3
> -.It mutex Ta \&ok Ta \&ok-1 Ta \&ok Ta \&ok Ta \&no Ta \&no-3
> -.It rwlock Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok Ta \&no Ta \&no-3
> -.It rmlock Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&no-5 Ta \&no-5
> -.It sx Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&no-2 Ta \&ok-4
> +one of the locking primitives discussed. Note that
> +.Dq sleep
> +includes
> +.Fn sema_wait ,
> +.Fn sema_timedwait ,
> +any of the
> +.Fn cv_wait
> +functions,
> +and any of the
> +.Fn sleep
> +functions.
> +.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
> +.It Em " You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
> +.It Em "You have: " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
> +.It spin mtx Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
> +.It mutex/rw Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
> +.It rmlock Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
> +.It sleep rm Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
> +.It sx Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
> +.It lockmgr Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
>  .El
>  .Pp
>  .Em *1
> -Recursion is defined per lock.
> -Lock order is important.
> +There are calls that atomically release this primitive when going to sleep
> +and reacquire it on wakeup
> +.Po
> +.Fn mtx_sleep ,
> +.Fn rw_sleep ,
> +.Fn msleep_spin ,
> +etc.
> +.Pc .
>  .Pp
>  .Em *2
> -Readers can recurse though writers can not.
> -Lock order is important.
> +These cases are only allowed while holding a write lock on a sleepable
> +read-mostly lock.
>  .Pp
>  .Em *3
> -There are calls that atomically release this primitive when going to sleep
> -and reacquire it on wakeup (e.g.
> -.Fn mtx_sleep ,
> -.Fn rw_sleep
> -and
> -.Fn msleep_spin ) .
> -.Pp
> -.Em *4
> -Though one can sleep holding an sx lock, one can also use
> -.Fn sx_sleep
> -which will atomically release this primitive when going to sleep and
> +Though one can sleep while holding this lock,
> +one can also use a
> +.Fn sleep
> +function to atomically release this primitive when going to sleep and
>  reacquire it on wakeup.
>  .Pp
> -.Em *5
> -.Em Read-mostly
> -locks can be initialized to support sleeping while holding a write lock.
> -See
> -.Xr rmlock 9
> -for details.
> +Note that non-blocking try operations on locks are always permitted.
>  .Ss Context mode table
>  The next table shows what can be used in different contexts.
>  At this time this is a rather easy to remember table.
> -.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
> -.It Em "Context:" Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
> +.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
> +.It Em "Context:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
>  .It interrupt filter: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
> -.It interrupt thread: Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&ok Ta \&no
> -.It callout: Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&no Ta \&no
> -.It syscall: Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
> +.It interrupt thread: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
> +.It callout: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
> +.It system call: Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
>  .El
>  .Sh SEE ALSO
>  .Xr witness 4 ,
>