From: Alfred Perlstein <bright@elvis.mu.org>
Date: Fri, 31 Aug 2007 00:10:48 -0700
To: smp@freebsd.org
Cc: attilio@freebsd.org
Message-ID: <20070831071048.GF87451@elvis.mu.org>
Subject: request for review: backport of sx and rwlocks from 7.0 to 6-stable

Hi guys,

Some work here at work was approved for sharing with the community, so I'm posting it here in the hope of a review.

We run some pretty good stress testing on our code, so I think it's pretty solid.

My only concern is that I've tried my best to preserve the kernel source API through a few simple #defines, but not binary compatibility.  I can make it binary compatible, albeit in a somewhat confusing manner, but that will require some rototilling and weird renaming of calls to the sleepq and turnstile code.  In short, I'd rather not, but I will if you think it's something that should be done.

There are also a few placeholders for lock profiling, which I will very likely be backporting shortly as well.

Patch is attached.  Comments/questions?

-- 
- Alfred Perlstein

[Attachment: netsmp_rwlock_freebsd6_08312007.diff (text/x-diff)]

Index: conf/NOTES =================================================================== RCS file: /cvs/ncvs/src/sys/conf/NOTES,v retrieving revision 1.1325.2.36 diff -u -r1.1325.2.36 NOTES --- conf/NOTES 8 Jul 2007 15:30:28 -0000 1.1325.2.36 +++ conf/NOTES 31 Aug 2007 00:39:59 -0000 @@ -189,12 +189,26 @@ # to disable it. options NO_ADAPTIVE_MUTEXES +# ADAPTIVE_RWLOCKS changes the behavior of reader/writer locks to spin +# if the thread that currently owns the rwlock is executing on another +# CPU. This behaviour is enabled by default, so this option can be used +# to disable it. +options NO_ADAPTIVE_RWLOCKS + + # ADAPTIVE_GIANT causes the Giant lock to also be made adaptive when # running without NO_ADAPTIVE_MUTEXES. Normally, because Giant is assumed # to be held for extended periods, contention on Giant will cause a thread # to sleep rather than spinning. options ADAPTIVE_GIANT + +# ADAPTIVE_SX changes the behavior of sx locks to spin if the thread +# that currently owns the lock is executing on another CPU. Note that +# in addition to enabling this option, individual sx locks must be +# initialized with the SX_ADAPTIVESPIN flag. +options ADAPTIVE_SX + # MUTEX_NOINLINE forces mutex operations to call functions to perform each # operation rather than inlining the simple cases. 
This can be used to # shrink the size of the kernel text segment. Note that this behavior is @@ -207,6 +221,20 @@ # priority waiter. options MUTEX_WAKE_ALL +# RWLOCK_NOINLINE forces rwlock operations to call functions to perform each +# operation rather than inlining the simple cases. This can be used to +# shrink the size of the kernel text segment. Note that this behavior is +# already implied by the INVARIANT_SUPPORT, INVARIANTS, KTR, LOCK_PROFILING, +# and WITNESS options. +options RWLOCK_NOINLINE + +# SX_NOINLINE forces sx lock operations to call functions to perform each +# operation rather than inlining the simple cases. This can be used to +# shrink the size of the kernel text segment. Note that this behavior is +# already implied by the INVARIANT_SUPPORT, INVARIANTS, KTR, LOCK_PROFILING, +# and WITNESS options. +options SX_NOINLINE + # SMP Debugging Options: # # PREEMPTION allows the threads that are in the kernel to be preempted Index: conf/files =================================================================== RCS file: /cvs/ncvs/src/sys/conf/files,v retrieving revision 1.1031.2.67 diff -u -r1.1031.2.67 files --- conf/files 23 Aug 2007 22:30:14 -0000 1.1031.2.67 +++ conf/files 31 Aug 2007 00:39:59 -0000 @@ -1312,6 +1312,7 @@ kern/kern_proc.c standard kern/kern_prot.c standard kern/kern_resource.c standard +kern/kern_rwlock.c standard kern/kern_sema.c standard kern/kern_shutdown.c standard kern/kern_sig.c standard Index: conf/options =================================================================== RCS file: /cvs/ncvs/src/sys/conf/options,v retrieving revision 1.510.2.21 diff -u -r1.510.2.21 options --- conf/options 8 Jul 2007 15:30:28 -0000 1.510.2.21 +++ conf/options 31 Aug 2007 00:39:59 -0000 @@ -60,7 +60,9 @@ # Miscellaneous options. ADAPTIVE_GIANT opt_adaptive_mutexes.h +ADAPTIVE_SX NO_ADAPTIVE_MUTEXES opt_adaptive_mutexes.h +NO_ADAPTIVE_RWLOCKS ALQ AUDIT opt_global.h CODA_COMPAT_5 opt_coda.h @@ -517,6 +519,8 @@ MSIZE opt_global.h REGRESSION opt_global.h RESTARTABLE_PANICS opt_global.h +RWLOCK_NOINLINE opt_global.h +SX_NOINLINE opt_global.h VFS_BIO_DEBUG opt_global.h # These are VM related options Index: dev/acpica/acpi_ec.c =================================================================== RCS file: /cvs/ncvs/src/sys/dev/acpica/acpi_ec.c,v retrieving revision 1.65.2.2 diff -u -r1.65.2.2 acpi_ec.c --- dev/acpica/acpi_ec.c 11 May 2006 17:41:00 -0000 1.65.2.2 +++ dev/acpica/acpi_ec.c 31 Aug 2007 01:20:08 -0000 @@ -144,6 +144,7 @@ #include #include #include +#include #include #include Index: kern/kern_ktrace.c =================================================================== RCS file: /cvs/ncvs/src/sys/kern/kern_ktrace.c,v retrieving revision 1.101.2.5 diff -u -r1.101.2.5 kern_ktrace.c --- kern/kern_ktrace.c 6 Sep 2006 21:43:59 -0000 1.101.2.5 +++ kern/kern_ktrace.c 31 Aug 2007 00:39:59 -0000 @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include cvs diff: kern/kern_rwlock.c is a new entry, no comparison available Index: kern/kern_sx.c =================================================================== RCS file: /cvs/ncvs/src/sys/kern/kern_sx.c,v retrieving revision 1.25.2.4 diff -u -r1.25.2.4 kern_sx.c --- kern/kern_sx.c 17 Aug 2006 19:53:06 -0000 1.25.2.4 +++ kern/kern_sx.c 31 Aug 2007 01:48:11 -0000 @@ -1,12 +1,14 @@ /*- - * Copyright (C) 2001 Jason Evans . All rights reserved. + * Copyright (c) 2007 Attilio Rao + * Copyright (c) 2001 Jason Evans + * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice(s), this list of conditions and the following disclaimer as - * the first lines of this file unmodified other than the possible + * the first lines of this file unmodified other than the possible * addition of one or more copyright notices. * 2. Redistributions in binary form must reproduce the above copyright * notice(s), this list of conditions and the following disclaimer in the @@ -26,39 +28,90 @@ */ /* - * Shared/exclusive locks. This implementation assures deterministic lock - * granting behavior, so that slocks and xlocks are interleaved. + * Shared/exclusive locks. This implementation attempts to ensure + * deterministic lock granting behavior, so that slocks and xlocks are + * interleaved. * * Priority propagation will not generally raise the priority of lock holders, * so should not be relied upon in combination with sx locks. */ -#include -__FBSDID("$FreeBSD: src/sys/kern/kern_sx.c,v 1.25.2.4 2006/08/17 19:53:06 jhb Exp $"); - +#include "opt_adaptive_sx.h" #include "opt_ddb.h" +#include +__FBSDID("$FreeBSD: src/sys/kern/kern_sx.c,v 1.54 2007/07/06 13:20:44 attilio Exp $"); + #include -#include #include -#include -#include #include #include #include +#include #include +#include + +#ifdef ADAPTIVE_SX +#include +#endif #ifdef DDB #include +#endif + +#if !defined(SMP) && defined(ADAPTIVE_SX) +#error "You must have SMP to enable the ADAPTIVE_SX option" +#endif + +CTASSERT(((SX_ADAPTIVESPIN | SX_RECURSE) & LO_CLASSFLAGS) == + (SX_ADAPTIVESPIN | SX_RECURSE)); + +/* Handy macros for sleep queues. */ +#define SQ_EXCLUSIVE_QUEUE 0 +#define SQ_SHARED_QUEUE 1 +/* + * Variations on DROP_GIANT()/PICKUP_GIANT() for use in this file. We + * drop Giant anytime we have to sleep or if we adaptively spin. + */ +#define GIANT_DECLARE \ + int _giantcnt = 0; \ + WITNESS_SAVE_DECL(Giant) \ + +#define GIANT_SAVE() do { \ + if (mtx_owned(&Giant)) { \ + WITNESS_SAVE(&Giant.mtx_object, Giant); \ + while (mtx_owned(&Giant)) { \ + _giantcnt++; \ + mtx_unlock(&Giant); \ + } \ + } \ +} while (0) + +#define GIANT_RESTORE() do { \ + if (_giantcnt > 0) { \ + mtx_assert(&Giant, MA_NOTOWNED); \ + while (_giantcnt--) \ + mtx_lock(&Giant); \ + WITNESS_RESTORE(&Giant.mtx_object, Giant); \ + } \ +} while (0) + +/* + * Returns true if an exclusive lock is recursed. It assumes + * curthread currently has an exclusive lock. 
+ */ +#define sx_recursed(sx) ((sx)->sx_recurse != 0) + +#ifdef DDB static void db_show_sx(struct lock_object *lock); #endif struct lock_class lock_class_sx = { - "sx", - LC_SLEEPLOCK | LC_SLEEPABLE | LC_RECURSABLE | LC_UPGRADABLE, + .lc_name = "sx", + .lc_flags = LC_SLEEPLOCK | LC_SLEEPABLE | LC_RECURSABLE | LC_UPGRADABLE, #ifdef DDB - db_show_sx + .lc_ddb_show = db_show_sx, #endif }; @@ -75,243 +128,724 @@ } void -sx_init(struct sx *sx, const char *description) +sx_init_flags(struct sx *sx, const char *description, int opts) { + struct lock_object *lock; + int flags; - sx->sx_lock = mtx_pool_find(mtxpool_lockbuilder, sx); - sx->sx_cnt = 0; - cv_init(&sx->sx_shrd_cv, description); - sx->sx_shrd_wcnt = 0; - cv_init(&sx->sx_excl_cv, description); - sx->sx_excl_wcnt = 0; - sx->sx_xholder = NULL; - lock_init(&sx->sx_object, &lock_class_sx, description, NULL, - LO_WITNESS | LO_RECURSABLE | LO_SLEEPABLE | LO_UPGRADABLE); + MPASS((opts & ~(SX_QUIET | SX_RECURSE | SX_NOWITNESS | SX_DUPOK | + SX_NOPROFILE | SX_ADAPTIVESPIN)) == 0); + + bzero(sx, sizeof(*sx)); + + flags = LO_RECURSABLE | LO_SLEEPABLE | LO_UPGRADABLE; + if (opts & SX_DUPOK) + flags |= LO_DUPOK; + if (!(opts & SX_NOWITNESS)) + flags |= LO_WITNESS; + if (opts & SX_QUIET) + flags |= LO_QUIET; + + flags |= opts & (SX_ADAPTIVESPIN | SX_RECURSE); + sx->sx_lock = SX_LOCK_UNLOCKED; + sx->sx_recurse = 0; + lock = &sx->lock_object; + lock->lo_class = &lock_class_sx; + lock->lo_flags = flags; + lock->lo_name = lock->lo_type = description; + LOCK_LOG_INIT(lock, opts); + WITNESS_INIT(lock); } void sx_destroy(struct sx *sx) { + LOCK_LOG_DESTROY(&sx->lock_object, 0); - KASSERT((sx->sx_cnt == 0 && sx->sx_shrd_wcnt == 0 && sx->sx_excl_wcnt == - 0), ("%s (%s): holders or waiters\n", __func__, - sx->sx_object.lo_name)); - - sx->sx_lock = NULL; - cv_destroy(&sx->sx_shrd_cv); - cv_destroy(&sx->sx_excl_cv); - - lock_destroy(&sx->sx_object); + KASSERT(sx->sx_lock == SX_LOCK_UNLOCKED, ("sx lock still held")); + KASSERT(sx->sx_recurse == 0, ("sx lock still recursed")); + sx->sx_lock = SX_LOCK_DESTROYED; + WITNESS_DESTROY(&sx->lock_object); } -void -_sx_slock(struct sx *sx, const char *file, int line) +int +_sx_slock(struct sx *sx, int opts, const char *file, int line) { + int error = 0; - mtx_lock(sx->sx_lock); - KASSERT(sx->sx_xholder != curthread, - ("%s (%s): slock while xlock is held @ %s:%d\n", __func__, - sx->sx_object.lo_name, file, line)); - WITNESS_CHECKORDER(&sx->sx_object, LOP_NEWORDER, file, line); - - /* - * Loop in case we lose the race for lock acquisition. - */ - while (sx->sx_cnt < 0) { - sx->sx_shrd_wcnt++; - cv_wait(&sx->sx_shrd_cv, sx->sx_lock); - sx->sx_shrd_wcnt--; + MPASS(curthread != NULL); + KASSERT(sx->sx_lock != SX_LOCK_DESTROYED, + ("sx_slock() of destroyed sx @ %s:%d", file, line)); + WITNESS_CHECKORDER(&sx->lock_object, LOP_NEWORDER, file, line); + error = __sx_slock(sx, opts, file, line); + if (!error) { + LOCK_LOG_LOCK("SLOCK", &sx->lock_object, 0, 0, file, line); + WITNESS_LOCK(&sx->lock_object, 0, file, line); + curthread->td_locks++; } - /* Acquire a shared lock. 
*/ - sx->sx_cnt++; - - LOCK_LOG_LOCK("SLOCK", &sx->sx_object, 0, 0, file, line); - WITNESS_LOCK(&sx->sx_object, 0, file, line); - curthread->td_locks++; - - mtx_unlock(sx->sx_lock); + return (error); } int _sx_try_slock(struct sx *sx, const char *file, int line) { + uintptr_t x; - mtx_lock(sx->sx_lock); - if (sx->sx_cnt >= 0) { - sx->sx_cnt++; - LOCK_LOG_TRY("SLOCK", &sx->sx_object, 0, 1, file, line); - WITNESS_LOCK(&sx->sx_object, LOP_TRYLOCK, file, line); + x = sx->sx_lock; + KASSERT(x != SX_LOCK_DESTROYED, + ("sx_try_slock() of destroyed sx @ %s:%d", file, line)); + if ((x & SX_LOCK_SHARED) && atomic_cmpset_acq_ptr(&sx->sx_lock, x, + x + SX_ONE_SHARER)) { + LOCK_LOG_TRY("SLOCK", &sx->lock_object, 0, 1, file, line); + WITNESS_LOCK(&sx->lock_object, LOP_TRYLOCK, file, line); curthread->td_locks++; - mtx_unlock(sx->sx_lock); return (1); - } else { - LOCK_LOG_TRY("SLOCK", &sx->sx_object, 0, 0, file, line); - mtx_unlock(sx->sx_lock); - return (0); } + + LOCK_LOG_TRY("SLOCK", &sx->lock_object, 0, 0, file, line); + return (0); } -void -_sx_xlock(struct sx *sx, const char *file, int line) +int +_sx_xlock(struct sx *sx, int opts, const char *file, int line) { + int error = 0; - mtx_lock(sx->sx_lock); - - /* - * With sx locks, we're absolutely not permitted to recurse on - * xlocks, as it is fatal (deadlock). Normally, recursion is handled - * by WITNESS, but as it is not semantically correct to hold the - * xlock while in here, we consider it API abuse and put it under - * INVARIANTS. - */ - KASSERT(sx->sx_xholder != curthread, - ("%s (%s): xlock already held @ %s:%d", __func__, - sx->sx_object.lo_name, file, line)); - WITNESS_CHECKORDER(&sx->sx_object, LOP_NEWORDER | LOP_EXCLUSIVE, file, + MPASS(curthread != NULL); + KASSERT(sx->sx_lock != SX_LOCK_DESTROYED, + ("sx_xlock() of destroyed sx @ %s:%d", file, line)); + WITNESS_CHECKORDER(&sx->lock_object, LOP_NEWORDER | LOP_EXCLUSIVE, file, line); - - /* Loop in case we lose the race for lock acquisition. */ - while (sx->sx_cnt != 0) { - sx->sx_excl_wcnt++; - cv_wait(&sx->sx_excl_cv, sx->sx_lock); - sx->sx_excl_wcnt--; + error = __sx_xlock(sx, curthread, opts, file, line); + if (!error) { + LOCK_LOG_LOCK("XLOCK", &sx->lock_object, 0, sx->sx_recurse, + file, line); + WITNESS_LOCK(&sx->lock_object, LOP_EXCLUSIVE, file, line); + curthread->td_locks++; } - MPASS(sx->sx_cnt == 0); - - /* Acquire an exclusive lock. 
*/ - sx->sx_cnt--; - sx->sx_xholder = curthread; - - LOCK_LOG_LOCK("XLOCK", &sx->sx_object, 0, 0, file, line); - WITNESS_LOCK(&sx->sx_object, LOP_EXCLUSIVE, file, line); - curthread->td_locks++; - - mtx_unlock(sx->sx_lock); + return (error); } int _sx_try_xlock(struct sx *sx, const char *file, int line) { + int rval; - mtx_lock(sx->sx_lock); - if (sx->sx_cnt == 0) { - sx->sx_cnt--; - sx->sx_xholder = curthread; - LOCK_LOG_TRY("XLOCK", &sx->sx_object, 0, 1, file, line); - WITNESS_LOCK(&sx->sx_object, LOP_EXCLUSIVE | LOP_TRYLOCK, file, - line); + MPASS(curthread != NULL); + KASSERT(sx->sx_lock != SX_LOCK_DESTROYED, + ("sx_try_xlock() of destroyed sx @ %s:%d", file, line)); + + if (sx_xlocked(sx) && (sx->lock_object.lo_flags & SX_RECURSE) != 0) { + sx->sx_recurse++; + atomic_set_ptr(&sx->sx_lock, SX_LOCK_RECURSED); + rval = 1; + } else + rval = atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, + (uintptr_t)curthread); + LOCK_LOG_TRY("XLOCK", &sx->lock_object, 0, rval, file, line); + if (rval) { + WITNESS_LOCK(&sx->lock_object, LOP_EXCLUSIVE | LOP_TRYLOCK, + file, line); curthread->td_locks++; - mtx_unlock(sx->sx_lock); - return (1); - } else { - LOCK_LOG_TRY("XLOCK", &sx->sx_object, 0, 0, file, line); - mtx_unlock(sx->sx_lock); - return (0); } + + return (rval); } void _sx_sunlock(struct sx *sx, const char *file, int line) { - _sx_assert(sx, SX_SLOCKED, file, line); - mtx_lock(sx->sx_lock); + MPASS(curthread != NULL); + KASSERT(sx->sx_lock != SX_LOCK_DESTROYED, + ("sx_sunlock() of destroyed sx @ %s:%d", file, line)); + _sx_assert(sx, SA_SLOCKED, file, line); + curthread->td_locks--; + WITNESS_UNLOCK(&sx->lock_object, 0, file, line); + LOCK_LOG_LOCK("SUNLOCK", &sx->lock_object, 0, 0, file, line); +#ifdef LOCK_PROFILING_SHARED + if (SX_SHARERS(sx->sx_lock) == 1) + lock_profile_release_lock(&sx->lock_object); +#endif + __sx_sunlock(sx, file, line); +} + +void +_sx_xunlock(struct sx *sx, const char *file, int line) +{ + MPASS(curthread != NULL); + KASSERT(sx->sx_lock != SX_LOCK_DESTROYED, + ("sx_xunlock() of destroyed sx @ %s:%d", file, line)); + _sx_assert(sx, SA_XLOCKED, file, line); curthread->td_locks--; - WITNESS_UNLOCK(&sx->sx_object, 0, file, line); + WITNESS_UNLOCK(&sx->lock_object, LOP_EXCLUSIVE, file, line); + LOCK_LOG_LOCK("XUNLOCK", &sx->lock_object, 0, sx->sx_recurse, file, + line); + if (!sx_recursed(sx)) + lock_profile_release_lock(&sx->lock_object); + __sx_xunlock(sx, curthread, file, line); +} - /* Release. */ - sx->sx_cnt--; +/* + * Try to do a non-blocking upgrade from a shared lock to an exclusive lock. + * This will only succeed if this thread holds a single shared lock. + * Return 1 if if the upgrade succeed, 0 otherwise. + */ +int +_sx_try_upgrade(struct sx *sx, const char *file, int line) +{ + uintptr_t x; + int success; + + KASSERT(sx->sx_lock != SX_LOCK_DESTROYED, + ("sx_try_upgrade() of destroyed sx @ %s:%d", file, line)); + _sx_assert(sx, SA_SLOCKED, file, line); /* - * If we just released the last shared lock, wake any waiters up, giving - * exclusive lockers precedence. In order to make sure that exclusive - * lockers won't be blocked forever, don't wake shared lock waiters if - * there are exclusive lock waiters. + * Try to switch from one shared lock to an exclusive lock. We need + * to maintain the SX_LOCK_EXCLUSIVE_WAITERS flag if set so that + * we will wake up the exclusive waiters when we drop the lock. 
*/ - if (sx->sx_excl_wcnt > 0) { - if (sx->sx_cnt == 0) - cv_signal(&sx->sx_excl_cv); - } else if (sx->sx_shrd_wcnt > 0) - cv_broadcast(&sx->sx_shrd_cv); - - LOCK_LOG_LOCK("SUNLOCK", &sx->sx_object, 0, 0, file, line); - - mtx_unlock(sx->sx_lock); + x = sx->sx_lock & SX_LOCK_EXCLUSIVE_WAITERS; + success = atomic_cmpset_ptr(&sx->sx_lock, SX_SHARERS_LOCK(1) | x, + (uintptr_t)curthread | x); + LOCK_LOG_TRY("XUPGRADE", &sx->lock_object, 0, success, file, line); + if (success) + WITNESS_UPGRADE(&sx->lock_object, LOP_EXCLUSIVE | LOP_TRYLOCK, + file, line); + return (success); } +/* + * Downgrade an unrecursed exclusive lock into a single shared lock. + */ void -_sx_xunlock(struct sx *sx, const char *file, int line) +_sx_downgrade(struct sx *sx, const char *file, int line) { + uintptr_t x; - _sx_assert(sx, SX_XLOCKED, file, line); - mtx_lock(sx->sx_lock); - MPASS(sx->sx_cnt == -1); + KASSERT(sx->sx_lock != SX_LOCK_DESTROYED, + ("sx_downgrade() of destroyed sx @ %s:%d", file, line)); + _sx_assert(sx, SA_XLOCKED | SA_NOTRECURSED, file, line); +#ifndef INVARIANTS + if (sx_recursed(sx)) + panic("downgrade of a recursed lock"); +#endif - curthread->td_locks--; - WITNESS_UNLOCK(&sx->sx_object, LOP_EXCLUSIVE, file, line); + WITNESS_DOWNGRADE(&sx->lock_object, 0, file, line); - /* Release. */ - sx->sx_cnt++; - sx->sx_xholder = NULL; + /* + * Try to switch from an exclusive lock with no shared waiters + * to one sharer with no shared waiters. If there are + * exclusive waiters, we don't need to lock the sleep queue so + * long as we preserve the flag. We do one quick try and if + * that fails we grab the sleepq lock to keep the flags from + * changing and do it the slow way. + * + * We have to lock the sleep queue if there are shared waiters + * so we can wake them up. + */ + x = sx->sx_lock; + if (!(x & SX_LOCK_SHARED_WAITERS) && + atomic_cmpset_rel_ptr(&sx->sx_lock, x, SX_SHARERS_LOCK(1) | + (x & SX_LOCK_EXCLUSIVE_WAITERS))) { + LOCK_LOG_LOCK("XDOWNGRADE", &sx->lock_object, 0, 0, file, line); + return; + } /* - * Wake up waiters if there are any. Give precedence to slock waiters. + * Lock the sleep queue so we can read the waiters bits + * without any races and wakeup any shared waiters. */ - if (sx->sx_shrd_wcnt > 0) - cv_broadcast(&sx->sx_shrd_cv); - else if (sx->sx_excl_wcnt > 0) - cv_signal(&sx->sx_excl_cv); + sleepq_lock(&sx->lock_object); - LOCK_LOG_LOCK("XUNLOCK", &sx->sx_object, 0, 0, file, line); + /* + * Preserve SX_LOCK_EXCLUSIVE_WAITERS while downgraded to a single + * shared lock. If there are any shared waiters, wake them up. + */ + x = sx->sx_lock; + atomic_store_rel_ptr(&sx->sx_lock, SX_SHARERS_LOCK(1) | + (x & SX_LOCK_EXCLUSIVE_WAITERS)); + if (x & SX_LOCK_SHARED_WAITERS) + sleepq_broadcast_queue(&sx->lock_object, SLEEPQ_SX, -1, + SQ_SHARED_QUEUE); + else + sleepq_release(&sx->lock_object); - mtx_unlock(sx->sx_lock); + LOCK_LOG_LOCK("XDOWNGRADE", &sx->lock_object, 0, 0, file, line); } +/* + * This function represents the so-called 'hard case' for sx_xlock + * operation. All 'easy case' failures are redirected to this. Note + * that ideally this would be a static function, but it needs to be + * accessible from at least sx.h. 
+ */ int -_sx_try_upgrade(struct sx *sx, const char *file, int line) +_sx_xlock_hard(struct sx *sx, uintptr_t tid, int opts, const char *file, + int line) { + GIANT_DECLARE; +#ifdef ADAPTIVE_SX + volatile struct thread *owner; +#endif + /* uint64_t waittime = 0; */ + uintptr_t x; + int /* contested = 0, */error = 0; + + /* If we already hold an exclusive lock, then recurse. */ + if (sx_xlocked(sx)) { + KASSERT((sx->lock_object.lo_flags & SX_RECURSE) != 0, + ("_sx_xlock_hard: recursed on non-recursive sx %s @ %s:%d\n", + sx->lock_object.lo_name, file, line)); + sx->sx_recurse++; + atomic_set_ptr(&sx->sx_lock, SX_LOCK_RECURSED); + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p recursing", __func__, sx); + return (0); + } - _sx_assert(sx, SX_SLOCKED, file, line); - mtx_lock(sx->sx_lock); + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR5(KTR_LOCK, "%s: %s contested (lock=%p) at %s:%d", __func__, + sx->lock_object.lo_name, (void *)sx->sx_lock, file, line); - if (sx->sx_cnt == 1) { - sx->sx_cnt = -1; - sx->sx_xholder = curthread; + while (!atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid)) { +#ifdef ADAPTIVE_SX + /* + * If the lock is write locked and the owner is + * running on another CPU, spin until the owner stops + * running or the state of the lock changes. + */ + x = sx->sx_lock; + if (!(x & SX_LOCK_SHARED) && + (sx->lock_object.lo_flags & SX_ADAPTIVESPIN)) { + x = SX_OWNER(x); + owner = (struct thread *)x; + if (TD_IS_RUNNING(owner)) { + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR3(KTR_LOCK, + "%s: spinning on %p held by %p", + __func__, sx, owner); + GIANT_SAVE(); + lock_profile_obtain_lock_failed( + &sx->lock_object, &contested, &waittime); + while (SX_OWNER(sx->sx_lock) == x && + TD_IS_RUNNING(owner)) + cpu_spinwait(); + continue; + } + } +#endif - LOCK_LOG_TRY("XUPGRADE", &sx->sx_object, 0, 1, file, line); - WITNESS_UPGRADE(&sx->sx_object, LOP_EXCLUSIVE | LOP_TRYLOCK, - file, line); + sleepq_lock(&sx->lock_object); + x = sx->sx_lock; - mtx_unlock(sx->sx_lock); - return (1); - } else { - LOCK_LOG_TRY("XUPGRADE", &sx->sx_object, 0, 0, file, line); - mtx_unlock(sx->sx_lock); - return (0); + /* + * If the lock was released while spinning on the + * sleep queue chain lock, try again. + */ + if (x == SX_LOCK_UNLOCKED) { + sleepq_release(&sx->lock_object); + continue; + } + +#ifdef ADAPTIVE_SX + /* + * The current lock owner might have started executing + * on another CPU (or the lock could have changed + * owners) while we were waiting on the sleep queue + * chain lock. If so, drop the sleep queue lock and try + * again. + */ + if (!(x & SX_LOCK_SHARED) && + (sx->lock_object.lo_flags & SX_ADAPTIVESPIN)) { + owner = (struct thread *)SX_OWNER(x); + if (TD_IS_RUNNING(owner)) { + sleepq_release(&sx->lock_object); + continue; + } + } +#endif + + /* + * If an exclusive lock was released with both shared + * and exclusive waiters and a shared waiter hasn't + * woken up and acquired the lock yet, sx_lock will be + * set to SX_LOCK_UNLOCKED | SX_LOCK_EXCLUSIVE_WAITERS. + * If we see that value, try to acquire it once. Note + * that we have to preserve SX_LOCK_EXCLUSIVE_WAITERS + * as there are other exclusive waiters still. If we + * fail, restart the loop. 
+ */ + if (x == (SX_LOCK_UNLOCKED | SX_LOCK_EXCLUSIVE_WAITERS)) { + if (atomic_cmpset_acq_ptr(&sx->sx_lock, + SX_LOCK_UNLOCKED | SX_LOCK_EXCLUSIVE_WAITERS, + tid | SX_LOCK_EXCLUSIVE_WAITERS)) { + sleepq_release(&sx->lock_object); + CTR2(KTR_LOCK, "%s: %p claimed by new writer", + __func__, sx); + break; + } + sleepq_release(&sx->lock_object); + continue; + } + + /* + * Try to set the SX_LOCK_EXCLUSIVE_WAITERS. If we fail, + * than loop back and retry. + */ + if (!(x & SX_LOCK_EXCLUSIVE_WAITERS)) { + if (!atomic_cmpset_ptr(&sx->sx_lock, x, + x | SX_LOCK_EXCLUSIVE_WAITERS)) { + sleepq_release(&sx->lock_object); + continue; + } + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p set excl waiters flag", + __func__, sx); + } + + /* + * Since we have been unable to acquire the exclusive + * lock and the exclusive waiters flag is set, we have + * to sleep. + */ +#if 0 + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p blocking on sleep queue", + __func__, sx); +#endif + + GIANT_SAVE(); + lock_profile_obtain_lock_failed(&sx->lock_object, &contested, + &waittime); + sleepq_add_queue(&sx->lock_object, NULL, sx->lock_object.lo_name, + SLEEPQ_SX | ((opts & SX_INTERRUPTIBLE) ? + SLEEPQ_INTERRUPTIBLE : 0), SQ_EXCLUSIVE_QUEUE); + if (!(opts & SX_INTERRUPTIBLE)) + sleepq_wait(&sx->lock_object); + else + error = sleepq_wait_sig(&sx->lock_object); + + if (error) { + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, + "%s: interruptible sleep by %p suspended by signal", + __func__, sx); + break; + } + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p resuming from sleep queue", + __func__, sx); } + + GIANT_RESTORE(); + if (!error) + lock_profile_obtain_lock_success(&sx->lock_object, contested, + waittime, file, line); + return (error); } +/* + * This function represents the so-called 'hard case' for sx_xunlock + * operation. All 'easy case' failures are redirected to this. Note + * that ideally this would be a static function, but it needs to be + * accessible from at least sx.h. + */ void -_sx_downgrade(struct sx *sx, const char *file, int line) +_sx_xunlock_hard(struct sx *sx, uintptr_t tid, const char *file, int line) +{ + uintptr_t x; + int queue; + + MPASS(!(sx->sx_lock & SX_LOCK_SHARED)); + + /* If the lock is recursed, then unrecurse one level. */ + if (sx_xlocked(sx) && sx_recursed(sx)) { + if ((--sx->sx_recurse) == 0) + atomic_clear_ptr(&sx->sx_lock, SX_LOCK_RECURSED); + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p unrecursing", __func__, sx); + return; + } + MPASS(sx->sx_lock & (SX_LOCK_SHARED_WAITERS | + SX_LOCK_EXCLUSIVE_WAITERS)); + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p contested", __func__, sx); + + sleepq_lock(&sx->lock_object); + x = SX_LOCK_UNLOCKED; + + /* + * The wake up algorithm here is quite simple and probably not + * ideal. It gives precedence to shared waiters if they are + * present. For this condition, we have to preserve the + * state of the exclusive waiters flag. + */ + if (sx->sx_lock & SX_LOCK_SHARED_WAITERS) { + queue = SQ_SHARED_QUEUE; + x |= (sx->sx_lock & SX_LOCK_EXCLUSIVE_WAITERS); + } else + queue = SQ_EXCLUSIVE_QUEUE; + + /* Wake up all the waiters for the specific queue. */ + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR3(KTR_LOCK, "%s: %p waking up all threads on %s queue", + __func__, sx, queue == SQ_SHARED_QUEUE ? 
"shared" : + "exclusive"); + atomic_store_rel_ptr(&sx->sx_lock, x); + sleepq_broadcast_queue(&sx->lock_object, SLEEPQ_SX, -1, queue); +} + +/* + * This function represents the so-called 'hard case' for sx_slock + * operation. All 'easy case' failures are redirected to this. Note + * that ideally this would be a static function, but it needs to be + * accessible from at least sx.h. + */ +int +_sx_slock_hard(struct sx *sx, int opts, const char *file, int line) +{ + GIANT_DECLARE; +#ifdef ADAPTIVE_SX + volatile struct thread *owner; +#endif +#ifdef LOCK_PROFILING_SHARED + uint64_t waittime = 0; + int contested = 0; +#endif + uintptr_t x; + int error = 0; + + /* + * As with rwlocks, we don't make any attempt to try to block + * shared locks once there is an exclusive waiter. + */ + for (;;) { + x = sx->sx_lock; + + /* + * If no other thread has an exclusive lock then try to bump up + * the count of sharers. Since we have to preserve the state + * of SX_LOCK_EXCLUSIVE_WAITERS, if we fail to acquire the + * shared lock loop back and retry. + */ + if (x & SX_LOCK_SHARED) { + MPASS(!(x & SX_LOCK_SHARED_WAITERS)); + if (atomic_cmpset_acq_ptr(&sx->sx_lock, x, + x + SX_ONE_SHARER)) { +#ifdef LOCK_PROFILING_SHARED + if (SX_SHARERS(x) == 0) + lock_profile_obtain_lock_success( + &sx->lock_object, contested, + waittime, file, line); +#endif + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR4(KTR_LOCK, + "%s: %p succeed %p -> %p", __func__, + sx, (void *)x, + (void *)(x + SX_ONE_SHARER)); + break; + } + continue; + } + +#ifdef ADAPTIVE_SX + /* + * If the owner is running on another CPU, spin until + * the owner stops running or the state of the lock + * changes. + */ + else if (sx->lock_object.lo_flags & SX_ADAPTIVESPIN) { + x = SX_OWNER(x); + owner = (struct thread *)x; + if (TD_IS_RUNNING(owner)) { + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR3(KTR_LOCK, + "%s: spinning on %p held by %p", + __func__, sx, owner); + GIANT_SAVE(); +#ifdef LOCK_PROFILING_SHARED + lock_profile_obtain_lock_failed( + &sx->lock_object, &contested, &waittime); +#endif + while (SX_OWNER(sx->sx_lock) == x && + TD_IS_RUNNING(owner)) + cpu_spinwait(); + continue; + } + } +#endif + + /* + * Some other thread already has an exclusive lock, so + * start the process of blocking. + */ + sleepq_lock(&sx->lock_object); + x = sx->sx_lock; + + /* + * The lock could have been released while we spun. + * In this case loop back and retry. + */ + if (x & SX_LOCK_SHARED) { + sleepq_release(&sx->lock_object); + continue; + } + +#ifdef ADAPTIVE_SX + /* + * If the owner is running on another CPU, spin until + * the owner stops running or the state of the lock + * changes. + */ + if (!(x & SX_LOCK_SHARED) && + (sx->lock_object.lo_flags & SX_ADAPTIVESPIN)) { + owner = (struct thread *)SX_OWNER(x); + if (TD_IS_RUNNING(owner)) { + sleepq_release(&sx->lock_object); + continue; + } + } +#endif + + /* + * Try to set the SX_LOCK_SHARED_WAITERS flag. If we + * fail to set it drop the sleep queue lock and loop + * back. + */ + if (!(x & SX_LOCK_SHARED_WAITERS)) { + if (!atomic_cmpset_ptr(&sx->sx_lock, x, + x | SX_LOCK_SHARED_WAITERS)) { + sleepq_release(&sx->lock_object); + continue; + } + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p set shared waiters flag", + __func__, sx); + } + + /* + * Since we have been unable to acquire the shared lock, + * we have to sleep. 
+ */ + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p blocking on sleep queue", + __func__, sx); + + GIANT_SAVE(); +#ifdef LOCK_PROFILING_SHARED + lock_profile_obtain_lock_failed(&sx->lock_object, &contested, + &waittime); +#endif + sleepq_add_queue(&sx->lock_object, NULL, sx->lock_object.lo_name, + SLEEPQ_SX | ((opts & SX_INTERRUPTIBLE) ? + SLEEPQ_INTERRUPTIBLE : 0), SQ_SHARED_QUEUE); + if (!(opts & SX_INTERRUPTIBLE)) + sleepq_wait(&sx->lock_object); + else + error = sleepq_wait_sig(&sx->lock_object); + + if (error) { + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, + "%s: interruptible sleep by %p suspended by signal", + __func__, sx); + break; + } + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p resuming from sleep queue", + __func__, sx); + } + + GIANT_RESTORE(); + return (error); +} + +/* + * This function represents the so-called 'hard case' for sx_sunlock + * operation. All 'easy case' failures are redirected to this. Note + * that ideally this would be a static function, but it needs to be + * accessible from at least sx.h. + */ +void +_sx_sunlock_hard(struct sx *sx, const char *file, int line) { + uintptr_t x; + + for (;;) { + x = sx->sx_lock; + + /* + * We should never have sharers while at least one thread + * holds a shared lock. + */ + KASSERT(!(x & SX_LOCK_SHARED_WAITERS), + ("%s: waiting sharers", __func__)); - _sx_assert(sx, SX_XLOCKED, file, line); - mtx_lock(sx->sx_lock); - MPASS(sx->sx_cnt == -1); + /* + * See if there is more than one shared lock held. If + * so, just drop one and return. + */ + if (SX_SHARERS(x) > 1) { + if (atomic_cmpset_ptr(&sx->sx_lock, x, + x - SX_ONE_SHARER)) { + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR4(KTR_LOCK, + "%s: %p succeeded %p -> %p", + __func__, sx, (void *)x, + (void *)(x - SX_ONE_SHARER)); + break; + } + continue; + } - WITNESS_DOWNGRADE(&sx->sx_object, 0, file, line); + /* + * If there aren't any waiters for an exclusive lock, + * then try to drop it quickly. + */ + if (!(x & SX_LOCK_EXCLUSIVE_WAITERS)) { + MPASS(x == SX_SHARERS_LOCK(1)); + if (atomic_cmpset_ptr(&sx->sx_lock, SX_SHARERS_LOCK(1), + SX_LOCK_UNLOCKED)) { + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p last succeeded", + __func__, sx); + break; + } + continue; + } - sx->sx_cnt = 1; - sx->sx_xholder = NULL; - if (sx->sx_shrd_wcnt > 0) - cv_broadcast(&sx->sx_shrd_cv); + /* + * At this point, there should just be one sharer with + * exclusive waiters. + */ + MPASS(x == (SX_SHARERS_LOCK(1) | SX_LOCK_EXCLUSIVE_WAITERS)); - LOCK_LOG_LOCK("XDOWNGRADE", &sx->sx_object, 0, 0, file, line); + sleepq_lock(&sx->lock_object); - mtx_unlock(sx->sx_lock); + /* + * Wake up semantic here is quite simple: + * Just wake up all the exclusive waiters. + * Note that the state of the lock could have changed, + * so if it fails loop back and retry. 
+ */ + if (!atomic_cmpset_ptr(&sx->sx_lock, + SX_SHARERS_LOCK(1) | SX_LOCK_EXCLUSIVE_WAITERS, + SX_LOCK_UNLOCKED)) { + sleepq_release(&sx->lock_object); + continue; + } + if (LOCK_LOG_TEST(&sx->lock_object, 0)) + CTR2(KTR_LOCK, "%s: %p waking up all thread on" + "exclusive queue", __func__, sx); + sleepq_broadcast_queue(&sx->lock_object, SLEEPQ_SX, -1, + SQ_EXCLUSIVE_QUEUE); + break; + } } #ifdef INVARIANT_SUPPORT @@ -327,44 +861,76 @@ void _sx_assert(struct sx *sx, int what, const char *file, int line) { +#ifndef WITNESS + int slocked = 0; +#endif if (panicstr != NULL) return; switch (what) { - case SX_LOCKED: - case SX_SLOCKED: + case SA_SLOCKED: + case SA_SLOCKED | SA_NOTRECURSED: + case SA_SLOCKED | SA_RECURSED: +#ifndef WITNESS + slocked = 1; + /* FALLTHROUGH */ +#endif + case SA_LOCKED: + case SA_LOCKED | SA_NOTRECURSED: + case SA_LOCKED | SA_RECURSED: #ifdef WITNESS - witness_assert(&sx->sx_object, what, file, line); + witness_assert(&sx->lock_object, what, file, line); #else - mtx_lock(sx->sx_lock); - if (sx->sx_cnt <= 0 && - (what == SX_SLOCKED || sx->sx_xholder != curthread)) + /* + * If some other thread has an exclusive lock or we + * have one and are asserting a shared lock, fail. + * Also, if no one has a lock at all, fail. + */ + if (sx->sx_lock == SX_LOCK_UNLOCKED || + (!(sx->sx_lock & SX_LOCK_SHARED) && (slocked || + sx_xholder(sx) != curthread))) panic("Lock %s not %slocked @ %s:%d\n", - sx->sx_object.lo_name, (what == SX_SLOCKED) ? - "share " : "", file, line); - mtx_unlock(sx->sx_lock); + sx->lock_object.lo_name, slocked ? "share " : "", + file, line); + + if (!(sx->sx_lock & SX_LOCK_SHARED)) { + if (sx_recursed(sx)) { + if (what & SA_NOTRECURSED) + panic("Lock %s recursed @ %s:%d\n", + sx->lock_object.lo_name, file, + line); + } else if (what & SA_RECURSED) + panic("Lock %s not recursed @ %s:%d\n", + sx->lock_object.lo_name, file, line); + } #endif break; - case SX_XLOCKED: - mtx_lock(sx->sx_lock); - if (sx->sx_xholder != curthread) + case SA_XLOCKED: + case SA_XLOCKED | SA_NOTRECURSED: + case SA_XLOCKED | SA_RECURSED: + if (sx_xholder(sx) != curthread) panic("Lock %s not exclusively locked @ %s:%d\n", - sx->sx_object.lo_name, file, line); - mtx_unlock(sx->sx_lock); + sx->lock_object.lo_name, file, line); + if (sx_recursed(sx)) { + if (what & SA_NOTRECURSED) + panic("Lock %s recursed @ %s:%d\n", + sx->lock_object.lo_name, file, line); + } else if (what & SA_RECURSED) + panic("Lock %s not recursed @ %s:%d\n", + sx->lock_object.lo_name, file, line); break; - case SX_UNLOCKED: + case SA_UNLOCKED: #ifdef WITNESS - witness_assert(&sx->sx_object, what, file, line); + witness_assert(&sx->lock_object, what, file, line); #else /* - * We are able to check only exclusive lock here, - * we cannot assert that *this* thread owns slock. + * If we hold an exclusve lock fail. We can't + * reliably check to see if we hold a shared lock or + * not. 
*/ - mtx_lock(sx->sx_lock); - if (sx->sx_xholder == curthread) + if (sx_xholder(sx) == curthread) panic("Lock %s exclusively locked @ %s:%d\n", - sx->sx_object.lo_name, file, line); - mtx_unlock(sx->sx_lock); + sx->lock_object.lo_name, file, line); #endif break; default: @@ -375,7 +941,7 @@ #endif /* INVARIANT_SUPPORT */ #ifdef DDB -void +static void db_show_sx(struct lock_object *lock) { struct thread *td; @@ -384,16 +950,36 @@ sx = (struct sx *)lock; db_printf(" state: "); - if (sx->sx_cnt < 0) { - td = sx->sx_xholder; + if (sx->sx_lock == SX_LOCK_UNLOCKED) + db_printf("UNLOCKED\n"); + else if (sx->sx_lock == SX_LOCK_DESTROYED) { + db_printf("DESTROYED\n"); + return; + } else if (sx->sx_lock & SX_LOCK_SHARED) + db_printf("SLOCK: %ju\n", (uintmax_t)SX_SHARERS(sx->sx_lock)); + else { + td = sx_xholder(sx); db_printf("XLOCK: %p (tid %d, pid %d, \"%s\")\n", td, td->td_tid, td->td_proc->p_pid, td->td_proc->p_comm); - } else if (sx->sx_cnt > 0) - db_printf("SLOCK: %d locks\n", sx->sx_cnt); - else - db_printf("UNLOCKED\n"); - db_printf(" waiters: %d shared, %d exclusive\n", sx->sx_shrd_wcnt, - sx->sx_excl_wcnt); + if (sx_recursed(sx)) + db_printf(" recursed: %d\n", sx->sx_recurse); + } + + db_printf(" waiters: "); + switch(sx->sx_lock & + (SX_LOCK_SHARED_WAITERS | SX_LOCK_EXCLUSIVE_WAITERS)) { + case SX_LOCK_SHARED_WAITERS: + db_printf("shared\n"); + break; + case SX_LOCK_EXCLUSIVE_WAITERS: + db_printf("exclusive\n"); + break; + case SX_LOCK_SHARED_WAITERS | SX_LOCK_EXCLUSIVE_WAITERS: + db_printf("exclusive and shared\n"); + break; + default: + db_printf("none\n"); + } } /* @@ -405,47 +991,26 @@ sx_chain(struct thread *td, struct thread **ownerp) { struct sx *sx; - struct cv *cv; /* - * First, see if it looks like td is blocked on a condition - * variable. + * Check to see if this thread is blocked on an sx lock. + * First, we check the lock class. If that is ok, then we + * compare the lock name against the wait message. */ - cv = td->td_wchan; - if (cv->cv_description != td->td_wmesg) +#define LOCK_CLASS(lo) (lo)->lo_class + sx = td->td_wchan; + if (LOCK_CLASS(&sx->lock_object) != &lock_class_sx || + sx->lock_object.lo_name != td->td_wmesg) return (0); - /* - * Ok, see if it looks like td is blocked on the exclusive - * condition variable. - */ - sx = (struct sx *)((char *)cv - offsetof(struct sx, sx_excl_cv)); - if (LOCK_CLASS(&sx->sx_object) == &lock_class_sx && - sx->sx_excl_wcnt > 0) - goto ok; - - /* - * Second, see if it looks like td is blocked on the shared - * condition variable. - */ - sx = (struct sx *)((char *)cv - offsetof(struct sx, sx_shrd_cv)); - if (LOCK_CLASS(&sx->sx_object) == &lock_class_sx && - sx->sx_shrd_wcnt > 0) - goto ok; - - /* Doesn't seem to be an sx lock. */ - return (0); - -ok: /* We think we have an sx lock, so output some details. 
*/ db_printf("blocked on sx \"%s\" ", td->td_wmesg); - if (sx->sx_cnt >= 0) { - db_printf("SLOCK (count %d)\n", sx->sx_cnt); - *ownerp = NULL; - } else { + *ownerp = sx_xholder(sx); + if (sx->sx_lock & SX_LOCK_SHARED) + db_printf("SLOCK (count %ju)\n", + (uintmax_t)SX_SHARERS(sx->sx_lock)); + else db_printf("XLOCK\n"); - *ownerp = sx->sx_xholder; - } return (1); } #endif Index: kern/subr_sleepqueue.c =================================================================== RCS file: /cvs/ncvs/src/sys/kern/subr_sleepqueue.c,v retrieving revision 1.18.2.4 diff -u -r1.18.2.4 subr_sleepqueue.c --- kern/subr_sleepqueue.c 17 Aug 2006 19:53:06 -0000 1.18.2.4 +++ kern/subr_sleepqueue.c 31 Aug 2007 00:39:59 -0000 @@ -82,6 +82,12 @@ #include #endif +#include + +#ifdef DDB +#include +#endif + /* * Constants for the hash table of sleep queue chains. These constants are * the same ones that 4BSD (and possibly earlier versions of BSD) used. @@ -94,7 +100,7 @@ #define SC_SHIFT 8 #define SC_HASH(wc) (((uintptr_t)(wc) >> SC_SHIFT) & SC_MASK) #define SC_LOOKUP(wc) &sleepq_chains[SC_HASH(wc)] - +#define NR_SLEEPQS 2 /* * There two different lists of sleep queues. Both lists are connected * via the sq_hash entries. The first list is the sleep queue chain list @@ -114,13 +120,13 @@ * c - sleep queue chain lock */ struct sleepqueue { - TAILQ_HEAD(, thread) sq_blocked; /* (c) Blocked threads. */ + TAILQ_HEAD(, thread) sq_blocked[NR_SLEEPQS]; /* (c) Blocked threads. */ LIST_ENTRY(sleepqueue) sq_hash; /* (c) Chain and free list. */ LIST_HEAD(, sleepqueue) sq_free; /* (c) Free queues. */ void *sq_wchan; /* (c) Wait channel. */ #ifdef INVARIANTS int sq_type; /* (c) Queue type. */ - struct mtx *sq_lock; /* (c) Associated lock. */ + struct mtx *sq_lock; /* (c) Associated lock. */ #endif }; @@ -142,16 +148,22 @@ 0, "maxmimum depth achieved of a single chain"); #endif static struct sleepqueue_chain sleepq_chains[SC_TABLESIZE]; - -static MALLOC_DEFINE(M_SLEEPQUEUE, "sleep queues", "sleep queues"); +static uma_zone_t sleepq_zone; /* * Prototypes for non-exported routines. */ +static int sleepq_catch_signals(void *wchan); +static int sleepq_check_signals(void); static int sleepq_check_timeout(void); +#ifdef INVARIANTS +static void sleepq_dtor(void *mem, int size, void *arg); +#endif +static int sleepq_init(void *mem, int size, int flags); +static void sleepq_resume_thread(struct sleepqueue *sq, struct thread *td, + int pri); static void sleepq_switch(void *wchan); static void sleepq_timeout(void *arg); -static void sleepq_resume_thread(struct sleepqueue *sq, struct thread *td, int pri); /* * Early initialization of sleep queues that is called from the sleepinit() @@ -182,21 +194,24 @@ NULL); #endif } + sleepq_zone = uma_zcreate("SLEEPQUEUE", sizeof(struct sleepqueue), +#ifdef INVARIANTS + NULL, sleepq_dtor, sleepq_init, NULL, UMA_ALIGN_CACHE, 0); +#else + NULL, NULL, sleepq_init, NULL, UMA_ALIGN_CACHE, 0); +#endif + thread0.td_sleepqueue = sleepq_alloc(); } /* - * Malloc and initialize a new sleep queue for a new thread. + * Get a sleep queue for a new thread. */ struct sleepqueue * sleepq_alloc(void) { - struct sleepqueue *sq; - sq = malloc(sizeof(struct sleepqueue), M_SLEEPQUEUE, M_WAITOK | M_ZERO); - TAILQ_INIT(&sq->sq_blocked); - LIST_INIT(&sq->sq_free); - return (sq); + return (uma_zalloc(sleepq_zone, M_WAITOK)); } /* @@ -206,9 +221,7 @@ sleepq_free(struct sleepqueue *sq) { - MPASS(sq != NULL); - MPASS(TAILQ_EMPTY(&sq->sq_blocked)); - free(sq, M_SLEEPQUEUE); + uma_zfree(sleepq_zone, sq); } /* @@ -262,7 +275,8 @@ * woken up. 
*/ void -sleepq_add(void *wchan, struct mtx *lock, const char *wmesg, int flags) +sleepq_add_queue(void *wchan, struct mtx *lock, const char *wmesg, int flags, + int queue) { struct sleepqueue_chain *sc; struct sleepqueue *sq; @@ -273,10 +287,11 @@ mtx_assert(&sc->sc_lock, MA_OWNED); MPASS(td->td_sleepqueue != NULL); MPASS(wchan != NULL); + MPASS((queue >= 0) && (queue < NR_SLEEPQS)); /* If this thread is not allowed to sleep, die a horrible death. */ KASSERT(!(td->td_pflags & TDP_NOSLEEPING), - ("trying to sleep while sleeping is prohibited")); + ("Trying sleep, but thread marked as sleeping prohibited")); /* Look up the sleep queue associated with the wait channel 'wchan'. */ sq = sleepq_lookup(wchan); @@ -287,6 +302,19 @@ * into the sleep queue already in use by this wait channel. */ if (sq == NULL) { +#ifdef INVARIANTS + int i; + + sq = td->td_sleepqueue; + for (i = 0; i < NR_SLEEPQS; i++) + KASSERT(TAILQ_EMPTY(&sq->sq_blocked[i]), + ("thread's sleep queue %d is not empty", i)); + KASSERT(LIST_EMPTY(&sq->sq_free), + ("thread's sleep queue has a non-empty free list")); + KASSERT(sq->sq_wchan == NULL, ("stale sq_wchan pointer")); + sq->sq_lock = lock; + sq->sq_type = flags & SLEEPQ_TYPE; +#endif #ifdef SLEEPQUEUE_PROFILING sc->sc_depth++; if (sc->sc_depth > sc->sc_max_depth) { @@ -297,25 +325,17 @@ #endif sq = td->td_sleepqueue; LIST_INSERT_HEAD(&sc->sc_queues, sq, sq_hash); - KASSERT(TAILQ_EMPTY(&sq->sq_blocked), - ("thread's sleep queue has a non-empty queue")); - KASSERT(LIST_EMPTY(&sq->sq_free), - ("thread's sleep queue has a non-empty free list")); - KASSERT(sq->sq_wchan == NULL, ("stale sq_wchan pointer")); sq->sq_wchan = wchan; -#ifdef INVARIANTS - sq->sq_lock = lock; - sq->sq_type = flags & SLEEPQ_TYPE; -#endif } else { MPASS(wchan == sq->sq_wchan); MPASS(lock == sq->sq_lock); MPASS((flags & SLEEPQ_TYPE) == sq->sq_type); LIST_INSERT_HEAD(&sq->sq_free, td->td_sleepqueue, sq_hash); } - TAILQ_INSERT_TAIL(&sq->sq_blocked, td, td_slpq); + TAILQ_INSERT_TAIL(&sq->sq_blocked[queue], td, td_slpq); td->td_sleepqueue = NULL; mtx_lock_spin(&sched_lock); + td->td_sqqueue = queue; td->td_wchan = wchan; td->td_wmesg = wmesg; if (flags & SLEEPQ_INTERRUPTIBLE) { @@ -606,12 +626,13 @@ MPASS(td != NULL); MPASS(sq->sq_wchan != NULL); MPASS(td->td_wchan == sq->sq_wchan); + MPASS(td->td_sqqueue < NR_SLEEPQS && td->td_sqqueue >= 0); sc = SC_LOOKUP(sq->sq_wchan); mtx_assert(&sc->sc_lock, MA_OWNED); mtx_assert(&sched_lock, MA_OWNED); /* Remove the thread from the queue. */ - TAILQ_REMOVE(&sq->sq_blocked, td, td_slpq); + TAILQ_REMOVE(&sq->sq_blocked[td->td_sqqueue], td, td_slpq); /* * Get a sleep queue for this thread. If this is the last waiter, @@ -652,17 +673,51 @@ setrunnable(td); } +#ifdef INVARIANTS +/* + * UMA zone item deallocator. + */ +static void +sleepq_dtor(void *mem, int size, void *arg) +{ + struct sleepqueue *sq; + int i; + + sq = mem; + for (i = 0; i < NR_SLEEPQS; i++) + MPASS(TAILQ_EMPTY(&sq->sq_blocked[i])); +} +#endif + +/* + * UMA zone item initializer. + */ +static int +sleepq_init(void *mem, int size, int flags) +{ + struct sleepqueue *sq; + int i; + + bzero(mem, size); + sq = mem; + for (i = 0; i < NR_SLEEPQS; i++) + TAILQ_INIT(&sq->sq_blocked[i]); + LIST_INIT(&sq->sq_free); + return (0); +} + /* * Find the highest priority thread sleeping on a wait channel and resume it. 
*/ void -sleepq_signal(void *wchan, int flags, int pri) +sleepq_signal_queue(void *wchan, int flags, int pri, int queue) { struct sleepqueue *sq; struct thread *td, *besttd; CTR2(KTR_PROC, "sleepq_signal(%p, %d)", wchan, flags); KASSERT(wchan != NULL, ("%s: invalid NULL wait channel", __func__)); + MPASS((queue >= 0) && (queue < NR_SLEEPQS)); sq = sleepq_lookup(wchan); if (sq == NULL) { sleepq_release(wchan); @@ -678,7 +733,7 @@ * the tail of sleep queues. */ besttd = NULL; - TAILQ_FOREACH(td, &sq->sq_blocked, td_slpq) { + TAILQ_FOREACH(td, &sq->sq_blocked[queue], td_slpq) { if (besttd == NULL || td->td_priority < besttd->td_priority) besttd = td; } @@ -693,12 +748,13 @@ * Resume all threads sleeping on a specified wait channel. */ void -sleepq_broadcast(void *wchan, int flags, int pri) +sleepq_broadcast_queue(void *wchan, int flags, int pri, int queue) { struct sleepqueue *sq; CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags); KASSERT(wchan != NULL, ("%s: invalid NULL wait channel", __func__)); + MPASS((queue >= 0) && (queue < NR_SLEEPQS)); sq = sleepq_lookup(wchan); if (sq == NULL) { sleepq_release(wchan); @@ -709,8 +765,9 @@ /* Resume all blocked threads on the sleep queue. */ mtx_lock_spin(&sched_lock); - while (!TAILQ_EMPTY(&sq->sq_blocked)) - sleepq_resume_thread(sq, TAILQ_FIRST(&sq->sq_blocked), pri); + while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) + sleepq_resume_thread(sq, TAILQ_FIRST(&sq->sq_blocked[queue]), + pri); mtx_unlock_spin(&sched_lock); sleepq_release(wchan); } @@ -859,6 +916,76 @@ struct sleepqueue_chain *sc; struct sleepqueue *sq; #ifdef INVARIANTS + struct mtx *lock; +#endif + struct thread *td; + void *wchan; + int i; + + if (!have_addr) + return; + + /* + * First, see if there is an active sleep queue for the wait channel + * indicated by the address. + */ + wchan = (void *)addr; + sc = SC_LOOKUP(wchan); + LIST_FOREACH(sq, &sc->sc_queues, sq_hash) + if (sq->sq_wchan == wchan) + goto found; + + /* + * Second, see if there is an active sleep queue at the address + * indicated. + */ + for (i = 0; i < SC_TABLESIZE; i++) + LIST_FOREACH(sq, &sleepq_chains[i].sc_queues, sq_hash) { + if (sq == (struct sleepqueue *)addr) + goto found; + } + + db_printf("Unable to locate a sleep queue via %p\n", (void *)addr); + return; +found: + db_printf("Wait channel: %p\n", sq->sq_wchan); +#ifdef INVARIANTS + db_printf("Queue type: %d\n", sq->sq_type); + if (sq->sq_lock) { + lock = sq->sq_lock; +#define LOCK_CLASS(lock) (lock)->mtx_object.lo_class + db_printf("Associated Interlock: %p - (%s) %s\n", lock, + LOCK_CLASS(lock)->lc_name, lock->mtx_object.lo_name); + } +#endif + db_printf("Blocked threads:\n"); + for (i = 0; i < NR_SLEEPQS; i++) { + db_printf("\nQueue[%d]:\n", i); + if (TAILQ_EMPTY(&sq->sq_blocked[i])) + db_printf("\tempty\n"); + else + TAILQ_FOREACH(td, &sq->sq_blocked[0], + td_slpq) { + db_printf("\t%p (tid %d, pid %d, \"%s\")\n", td, + td->td_tid, td->td_proc->p_pid, + /*td->td_name[i] != '\0' ? td->td_name :*/ + td->td_proc->p_comm); + } + } +} + +#if 0 +/* Alias 'show sleepqueue' to 'show sleepq'. 
*/ +DB_SET(sleepqueue, db_show_sleepqueue, db_show_cmd_set, 0, NULL); +#endif +#endif + +#ifdef DDB +DB_SHOW_COMMAND(sleepq, db_show_sleepqueue) +{ + struct sleepqueue_chain *sc; + struct sleepqueue *sq; +#ifdef INVARIANTS struct lock_object *lock; #endif struct thread *td; Index: kern/subr_turnstile.c =================================================================== RCS file: /cvs/ncvs/src/sys/kern/subr_turnstile.c,v retrieving revision 1.152.2.5 diff -u -r1.152.2.5 subr_turnstile.c --- kern/subr_turnstile.c 23 Jan 2007 22:16:33 -0000 1.152.2.5 +++ kern/subr_turnstile.c 31 Aug 2007 02:15:23 -0000 @@ -114,7 +114,8 @@ * q - td_contested lock */ struct turnstile { - TAILQ_HEAD(, thread) ts_blocked; /* (c + q) Blocked threads. */ + /* struct mtx ts_lock; */ /* Spin lock for self. */ + TAILQ_HEAD(, thread) ts_blocked[2]; /* (c + q) Blocked threads. */ TAILQ_HEAD(, thread) ts_pending; /* (c) Pending threads. */ LIST_ENTRY(turnstile) ts_hash; /* (c) Chain and free list. */ LIST_ENTRY(turnstile) ts_link; /* (q) Contested locks. */ @@ -143,6 +144,12 @@ static struct mtx td_contested_lock; static struct turnstile_chain turnstile_chains[TC_TABLESIZE]; +/* XXX: stats, remove me */ +static u_int turnstile_nullowners; +SYSCTL_UINT(_debug, OID_AUTO, turnstile_nullowners, CTLFLAG_RD, + &turnstile_nullowners, 0, "called with null owner on a shared queue"); + + static MALLOC_DEFINE(M_TURNSTILE, "turnstiles", "turnstiles"); /* @@ -267,6 +274,7 @@ { struct turnstile_chain *tc; struct thread *td1, *td2; + int queue; mtx_assert(&sched_lock, MA_OWNED); MPASS(TD_ON_LOCK(td)); @@ -300,16 +308,18 @@ * Remove thread from blocked chain and determine where * it should be moved to. */ + queue = td->td_tsqueue; + MPASS(queue == TS_EXCLUSIVE_QUEUE || queue == TS_SHARED_QUEUE); mtx_lock_spin(&td_contested_lock); - TAILQ_REMOVE(&ts->ts_blocked, td, td_lockq); - TAILQ_FOREACH(td1, &ts->ts_blocked, td_lockq) { + TAILQ_REMOVE(&ts->ts_blocked[queue], td, td_lockq); + TAILQ_FOREACH(td1, &ts->ts_blocked[queue], td_lockq) { MPASS(td1->td_proc->p_magic == P_MAGIC); if (td1->td_priority > td->td_priority) break; } if (td1 == NULL) - TAILQ_INSERT_TAIL(&ts->ts_blocked, td, td_lockq); + TAILQ_INSERT_TAIL(&ts->ts_blocked[queue], td, td_lockq); else TAILQ_INSERT_BEFORE(td1, td, td_lockq); mtx_unlock_spin(&td_contested_lock); @@ -412,7 +422,10 @@ * Note that we currently don't try to revoke lent priorities * when our priority goes up. 
*/ - if (td == TAILQ_FIRST(&ts->ts_blocked) && td->td_priority < oldpri) { + MPASS(td->td_tsqueue == TS_EXCLUSIVE_QUEUE || + td->td_tsqueue == TS_SHARED_QUEUE); + if (td == TAILQ_FIRST(&ts->ts_blocked[td->td_tsqueue]) && + td->td_priority < oldpri) { mtx_unlock_spin(&tc->tc_lock); critical_enter(); propagate_priority(td); @@ -429,8 +442,11 @@ { mtx_assert(&td_contested_lock, MA_OWNED); - MPASS(owner->td_proc->p_magic == P_MAGIC); MPASS(ts->ts_owner == NULL); + if (owner == NULL) + return; + + MPASS(owner->td_proc->p_magic == P_MAGIC); ts->ts_owner = owner; LIST_INSERT_HEAD(&owner->td_contested, ts, ts_link); } @@ -444,7 +460,8 @@ struct turnstile *ts; ts = malloc(sizeof(struct turnstile), M_TURNSTILE, M_WAITOK | M_ZERO); - TAILQ_INIT(&ts->ts_blocked); + TAILQ_INIT(&ts->ts_blocked[TS_EXCLUSIVE_QUEUE]); + TAILQ_INIT(&ts->ts_blocked[TS_SHARED_QUEUE]); TAILQ_INIT(&ts->ts_pending); LIST_INIT(&ts->ts_free); return (ts); @@ -458,7 +475,8 @@ { MPASS(ts != NULL); - MPASS(TAILQ_EMPTY(&ts->ts_blocked)); + MPASS(TAILQ_EMPTY(&ts->ts_blocked[TS_EXCLUSIVE_QUEUE])); + MPASS(TAILQ_EMPTY(&ts->ts_blocked[TS_SHARED_QUEUE])); MPASS(TAILQ_EMPTY(&ts->ts_pending)); free(ts, M_TURNSTILE); } @@ -507,6 +525,22 @@ } /* + * Return a pointer to the thread waiting on this turnstile with the + * most important priority or NULL if the turnstile has no waiters. + */ +static struct thread * +turnstile_first_waiter(struct turnstile *ts) +{ + struct thread *std, *xtd; + + std = TAILQ_FIRST(&ts->ts_blocked[TS_SHARED_QUEUE]); + xtd = TAILQ_FIRST(&ts->ts_blocked[TS_EXCLUSIVE_QUEUE]); + if (xtd == NULL || (std != NULL && std->td_priority < xtd->td_priority)) + return (std); + return (xtd); +} + +/* * Take ownership of a turnstile and adjust the priority of the new * owner appropriately. */ @@ -527,7 +561,7 @@ turnstile_setowner(ts, owner); mtx_unlock_spin(&td_contested_lock); - td = TAILQ_FIRST(&ts->ts_blocked); + td = turnstile_first_waiter(ts); MPASS(td != NULL); MPASS(td->td_proc->p_magic == P_MAGIC); mtx_unlock_spin(&tc->tc_lock); @@ -548,7 +582,7 @@ * turnstile chain locked and will return with it unlocked. */ void -turnstile_wait(struct lock_object *lock, struct thread *owner) +turnstile_wait_queue(struct lock_object *lock, struct thread *owner, int queue) { struct turnstile_chain *tc; struct turnstile *ts; @@ -558,8 +592,13 @@ tc = TC_LOOKUP(lock); mtx_assert(&tc->tc_lock, MA_OWNED); MPASS(td->td_turnstile != NULL); - MPASS(owner != NULL); - MPASS(owner->td_proc->p_magic == P_MAGIC); + if (owner) + MPASS(owner->td_proc->p_magic == P_MAGIC); + /* XXX: stats, remove me */ + if (!owner && queue == TS_SHARED_QUEUE) { + turnstile_nullowners++; + } + MPASS(queue == TS_SHARED_QUEUE || queue == TS_EXCLUSIVE_QUEUE); /* Look up the turnstile associated with the lock 'lock'. 
*/ ts = turnstile_lookup(lock); @@ -582,25 +621,27 @@ LIST_INSERT_HEAD(&tc->tc_turnstiles, ts, ts_hash); KASSERT(TAILQ_EMPTY(&ts->ts_pending), ("thread's turnstile has pending threads")); - KASSERT(TAILQ_EMPTY(&ts->ts_blocked), - ("thread's turnstile has a non-empty queue")); + KASSERT(TAILQ_EMPTY(&ts->ts_blocked[TS_EXCLUSIVE_QUEUE]), + ("thread's turnstile has exclusive waiters")); + KASSERT(TAILQ_EMPTY(&ts->ts_blocked[TS_SHARED_QUEUE]), + ("thread's turnstile has shared waiters")); KASSERT(LIST_EMPTY(&ts->ts_free), ("thread's turnstile has a non-empty free list")); KASSERT(ts->ts_lockobj == NULL, ("stale ts_lockobj pointer")); ts->ts_lockobj = lock; mtx_lock_spin(&td_contested_lock); - TAILQ_INSERT_TAIL(&ts->ts_blocked, td, td_lockq); + TAILQ_INSERT_TAIL(&ts->ts_blocked[queue], td, td_lockq); turnstile_setowner(ts, owner); mtx_unlock_spin(&td_contested_lock); } else { - TAILQ_FOREACH(td1, &ts->ts_blocked, td_lockq) + TAILQ_FOREACH(td1, &ts->ts_blocked[queue], td_lockq) if (td1->td_priority > td->td_priority) break; mtx_lock_spin(&td_contested_lock); if (td1 != NULL) TAILQ_INSERT_BEFORE(td1, td, td_lockq); else - TAILQ_INSERT_TAIL(&ts->ts_blocked, td, td_lockq); + TAILQ_INSERT_TAIL(&ts->ts_blocked[queue], td, td_lockq); mtx_unlock_spin(&td_contested_lock); MPASS(td->td_turnstile != NULL); LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash); @@ -664,7 +705,7 @@ * pending list. This must be called with the turnstile chain locked. */ int -turnstile_signal(struct turnstile *ts) +turnstile_signal_queue(struct turnstile *ts, int queue) { struct turnstile_chain *tc; struct thread *td; @@ -675,15 +716,18 @@ MPASS(ts->ts_owner == curthread); tc = TC_LOOKUP(ts->ts_lockobj); mtx_assert(&tc->tc_lock, MA_OWNED); + MPASS(ts->ts_owner == curthread || + (queue == TS_EXCLUSIVE_QUEUE && ts->ts_owner == NULL)); + MPASS(queue == TS_SHARED_QUEUE || queue == TS_EXCLUSIVE_QUEUE); /* * Pick the highest priority thread blocked on this lock and * move it to the pending list. */ - td = TAILQ_FIRST(&ts->ts_blocked); + td = TAILQ_FIRST(&ts->ts_blocked[queue]); MPASS(td->td_proc->p_magic == P_MAGIC); mtx_lock_spin(&td_contested_lock); - TAILQ_REMOVE(&ts->ts_blocked, td, td_lockq); + TAILQ_REMOVE(&ts->ts_blocked[queue], td, td_lockq); mtx_unlock_spin(&td_contested_lock); TAILQ_INSERT_TAIL(&ts->ts_pending, td, td_lockq); @@ -692,7 +736,8 @@ * give it to the about-to-be-woken thread. Otherwise take a * turnstile from the free list and give it to the thread. */ - empty = TAILQ_EMPTY(&ts->ts_blocked); + empty = TAILQ_EMPTY(&ts->ts_blocked[TS_EXCLUSIVE_QUEUE]) && + TAILQ_EMPTY(&ts->ts_blocked[TS_SHARED_QUEUE]); if (empty) { MPASS(LIST_EMPTY(&ts->ts_free)); #ifdef TURNSTILE_PROFILING @@ -712,7 +757,7 @@ * the turnstile chain locked. */ void -turnstile_broadcast(struct turnstile *ts) +turnstile_broadcast_queue(struct turnstile *ts, int queue) { struct turnstile_chain *tc; struct turnstile *ts1; @@ -720,15 +765,17 @@ MPASS(ts != NULL); MPASS(curthread->td_proc->p_magic == P_MAGIC); - MPASS(ts->ts_owner == curthread); + MPASS(ts->ts_owner == curthread || + (queue == TS_EXCLUSIVE_QUEUE && ts->ts_owner == NULL)); tc = TC_LOOKUP(ts->ts_lockobj); mtx_assert(&tc->tc_lock, MA_OWNED); + MPASS(queue == TS_SHARED_QUEUE || queue == TS_EXCLUSIVE_QUEUE); /* * Transfer the blocked list to the pending list. 
*/ mtx_lock_spin(&td_contested_lock); - TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked, td_lockq); + TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq); mtx_unlock_spin(&td_contested_lock); /* @@ -756,15 +803,17 @@ * chain locked. */ void -turnstile_unpend(struct turnstile *ts) +turnstile_unpend_queue(struct turnstile *ts, int owner_type) { TAILQ_HEAD( ,thread) pending_threads; struct turnstile_chain *tc; + struct turnstile *nts; struct thread *td; u_char cp, pri; MPASS(ts != NULL); - MPASS(ts->ts_owner == curthread); + MPASS(ts->ts_owner == curthread || + (owner_type == TS_SHARED_LOCK && ts->ts_owner == NULL)); tc = TC_LOOKUP(ts->ts_lockobj); mtx_assert(&tc->tc_lock, MA_OWNED); MPASS(!TAILQ_EMPTY(&ts->ts_pending)); @@ -776,7 +825,8 @@ TAILQ_INIT(&pending_threads); TAILQ_CONCAT(&pending_threads, &ts->ts_pending, td_lockq); #ifdef INVARIANTS - if (TAILQ_EMPTY(&ts->ts_blocked)) + if (TAILQ_EMPTY(&ts->ts_blocked[TS_EXCLUSIVE_QUEUE]) && + TAILQ_EMPTY(&ts->ts_blocked[TS_SHARED_QUEUE])) ts->ts_lockobj = NULL; #endif @@ -802,8 +852,8 @@ pri = PRI_MAX; mtx_lock_spin(&sched_lock); mtx_lock_spin(&td_contested_lock); - LIST_FOREACH(ts, &td->td_contested, ts_link) { - cp = TAILQ_FIRST(&ts->ts_blocked)->td_priority; + LIST_FOREACH(nts, &td->td_contested, ts_link) { + cp = turnstile_first_waiter(nts)->td_priority; if (cp < pri) pri = cp; } @@ -837,19 +887,69 @@ } /* + * Give up ownership of a turnstile. This must be called with the + * turnstile chain locked. + */ +void +turnstile_disown(struct turnstile *ts) +{ + struct turnstile_chain *tc; + struct thread *td; + u_char cp, pri; + + MPASS(ts != NULL); + MPASS(ts->ts_owner == curthread); + tc = TC_LOOKUP(ts->ts_lockobj); + mtx_assert(&tc->tc_lock, MA_OWNED); + MPASS(TAILQ_EMPTY(&ts->ts_pending)); + MPASS(!TAILQ_EMPTY(&ts->ts_blocked[TS_EXCLUSIVE_QUEUE]) || + !TAILQ_EMPTY(&ts->ts_blocked[TS_SHARED_QUEUE])); + + /* + * Remove the turnstile from this thread's list of contested locks + * since this thread doesn't own it anymore. New threads will + * not be blocking on the turnstile until it is claimed by a new + * owner. + */ + mtx_lock_spin(&td_contested_lock); + ts->ts_owner = NULL; + LIST_REMOVE(ts, ts_link); + mtx_unlock_spin(&td_contested_lock); + + /* + * Adjust the priority of curthread based on other contested + * locks it owns. Don't lower the priority below the base + * priority however. + */ + td = curthread; + pri = PRI_MAX; + mtx_lock_spin(&sched_lock); + mtx_lock_spin(&td_contested_lock); + LIST_FOREACH(ts, &td->td_contested, ts_link) { + cp = turnstile_first_waiter(ts)->td_priority; + if (cp < pri) + pri = cp; + } + mtx_unlock_spin(&td_contested_lock); + sched_unlend_prio(td, pri); + mtx_unlock_spin(&sched_lock); +} + +/* * Return the first thread in a turnstile. */ struct thread * -turnstile_head(struct turnstile *ts) +turnstile_head_queue(struct turnstile *ts, int queue) { #ifdef INVARIANTS struct turnstile_chain *tc; MPASS(ts != NULL); tc = TC_LOOKUP(ts->ts_lockobj); + MPASS(queue == TS_SHARED_QUEUE || queue == TS_EXCLUSIVE_QUEUE); mtx_assert(&tc->tc_lock, MA_OWNED); #endif - return (TAILQ_FIRST(&ts->ts_blocked)); + return (TAILQ_FIRST(&ts->ts_blocked[queue])); } #ifdef DDB @@ -1146,7 +1246,7 @@ * Returns true if a turnstile is empty. 
*/ int -turnstile_empty(struct turnstile *ts) +turnstile_empty_queue(struct turnstile *ts, int queue) { #ifdef INVARIANTS struct turnstile_chain *tc; @@ -1154,6 +1254,7 @@ MPASS(ts != NULL); tc = TC_LOOKUP(ts->ts_lockobj); mtx_assert(&tc->tc_lock, MA_OWNED); + MPASS(queue == TS_SHARED_QUEUE || queue == TS_EXCLUSIVE_QUEUE); #endif - return (TAILQ_EMPTY(&ts->ts_blocked)); + return (TAILQ_EMPTY(&ts->ts_blocked[queue])); } Index: netinet6/in6_src.c =================================================================== RCS file: /cvs/ncvs/src/sys/netinet6/in6_src.c,v retrieving revision 1.30.2.4 diff -u -r1.30.2.4 in6_src.c --- netinet6/in6_src.c 25 Dec 2005 14:03:37 -0000 1.30.2.4 +++ netinet6/in6_src.c 31 Aug 2007 01:23:38 -0000 @@ -76,6 +76,7 @@ #include #include #include +#include #include #include cvs diff: sys/_rwlock.h is a new entry, no comparison available cvs diff: sys/_sx.h is a new entry, no comparison available cvs diff: sys/lock_profile.h is a new entry, no comparison available Index: sys/proc.h =================================================================== RCS file: /cvs/ncvs/src/sys/sys/proc.h,v retrieving revision 1.432.2.10 diff -u -r1.432.2.10 proc.h --- sys/proc.h 11 Jun 2007 11:27:04 -0000 1.432.2.10 +++ sys/proc.h 31 Aug 2007 00:51:44 -0000 @@ -261,12 +261,14 @@ int td_inhibitors; /* (j) Why can not run. */ int td_pflags; /* (k) Private thread (TDP_*) flags. */ int td_dupfd; /* (k) Ret value from fdopen. XXX */ + int td_sqqueue; /* (t) Sleepqueue queue blocked on. */ void *td_wchan; /* (j) Sleep address. */ const char *td_wmesg; /* (j) Reason for sleep. */ u_char td_lastcpu; /* (j) Last cpu we were on. */ u_char td_oncpu; /* (j) Which cpu we are on. */ volatile u_char td_owepreempt; /* (k*) Preempt on last critical_exit */ short td_locks; /* (k) Count of non-spin locks. */ + u_char td_tsqueue; /* (t) Turnstile queue blocked on. */ struct turnstile *td_blocked; /* (j) Lock process is blocked on. */ void *td_ithd; /* (n) Unused, kept to preserve ABI. */ const char *td_lockname; /* (j) Name of lock blocked on. */ @@ -324,7 +326,19 @@ struct mdthread td_md; /* (k) Any machine-dependent fields. */ struct td_sched *td_sched; /* (*) Scheduler-specific data. */ struct kaudit_record *td_ar; /* (k) Active audit record, if any. */ -}; +} __attribute__ ((aligned (16))); /* see comment below */ + +/* + * Comment about aligned attribute on struct thread. + * + * We must force at least 16 byte alignment for "struct thread" + * because the rwlocks and sxlocks expect to use the bottom bits + * of the pointer for bookkeeping information. + * + * This causes problems for the thread0 data structure because it + * may not be properly aligned otherwise. + */ + /* * Flags kept in td_flags: cvs diff: sys/rwlock.h is a new entry, no comparison available Index: sys/sleepqueue.h =================================================================== RCS file: /cvs/ncvs/src/sys/sys/sleepqueue.h,v retrieving revision 1.6.2.1 diff -u -r1.6.2.1 sleepqueue.h --- sys/sleepqueue.h 27 Feb 2006 00:19:39 -0000 1.6.2.1 +++ sys/sleepqueue.h 31 Aug 2007 01:47:52 -0000 @@ -26,15 +26,16 @@ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * - * $FreeBSD: src/sys/sys/sleepqueue.h,v 1.6.2.1 2006/02/27 00:19:39 davidxu Exp $ + * $FreeBSD: src/sys/sys/sleepqueue.h,v 1.12 2007/03/31 23:23:42 jhb Exp $ */ #ifndef _SYS_SLEEPQUEUE_H_ #define _SYS_SLEEPQUEUE_H_ /* - * Sleep queue interface. 
Sleep/wakeup and condition variables use a sleep - * queue for the queue of threads blocked on a sleep channel. + * Sleep queue interface. Sleep/wakeup, condition variables, and sx + * locks use a sleep queue for the queue of threads blocked on a sleep + * channel. * * A thread calls sleepq_lock() to lock the sleep queue chain associated * with a given wait channel. A thread can then call call sleepq_add() to @@ -84,24 +85,33 @@ #define SLEEPQ_TYPE 0x0ff /* Mask of sleep queue types. */ #define SLEEPQ_MSLEEP 0x00 /* Used by msleep/wakeup. */ #define SLEEPQ_CONDVAR 0x01 /* Used for a cv. */ +#define SLEEPQ_SX 0x03 /* Used by an sx lock. */ #define SLEEPQ_INTERRUPTIBLE 0x100 /* Sleep is interruptible. */ void init_sleepqueues(void); void sleepq_abort(struct thread *td, int intrval); -void sleepq_add(void *, struct mtx *, const char *, int); +void sleepq_add_queue(void *, struct mtx *, const char *, int, int); struct sleepqueue *sleepq_alloc(void); -void sleepq_broadcast(void *, int, int); +void sleepq_broadcast_queue(void *, int, int, int); void sleepq_free(struct sleepqueue *); void sleepq_lock(void *); struct sleepqueue *sleepq_lookup(void *); void sleepq_release(void *); void sleepq_remove(struct thread *, void *); -void sleepq_signal(void *, int, int); +void sleepq_signal_queue(void *, int, int, int); void sleepq_set_timeout(void *wchan, int timo); int sleepq_timedwait(void *wchan); int sleepq_timedwait_sig(void *wchan); void sleepq_wait(void *); int sleepq_wait_sig(void *wchan); +/* Preserve source compat with 6.x */ +#define sleepq_add(wchan, lock, wmesg, flags) \ + sleepq_add_queue(wchan, lock, wmesg, flags, 0) +#define sleepq_broadcast(wchan, flags, pri) \ + sleepq_broadcast_queue(wchan, flags, pri, 0) +#define sleepq_signal(wchan, flags, pri) \ + sleepq_signal_queue(wchan, flags, pri, 0) + #endif /* _KERNEL */ #endif /* !_SYS_SLEEPQUEUE_H_ */ Index: sys/sx.h =================================================================== RCS file: /cvs/ncvs/src/sys/sys/sx.h,v retrieving revision 1.21.2.5 diff -u -r1.21.2.5 sx.h --- sys/sx.h 27 Aug 2007 13:45:35 -0000 1.21.2.5 +++ sys/sx.h 31 Aug 2007 01:02:47 -0000 @@ -1,5 +1,7 @@ /*- - * Copyright (C) 2001 Jason Evans . All rights reserved. + * Copyright (c) 2007 Attilio Rao + * Copyright (c) 2001 Jason Evans + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -24,39 +26,93 @@ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH * DAMAGE. * - * $FreeBSD: src/sys/sys/sx.h,v 1.21.2.5 2007/08/27 13:45:35 jhb Exp $ + * $FreeBSD: src/sys/sys/sx.h,v 1.37 2007/07/06 13:20:44 attilio Exp $ */ #ifndef _SYS_SX_H_ #define _SYS_SX_H_ -#include #include -#include /* XXX */ +#include +#include -struct sx { - struct lock_object sx_object; /* Common lock properties. */ - struct mtx *sx_lock; /* General protection lock. */ - int sx_cnt; /* -1: xlock, > 0: slock count. */ - struct cv sx_shrd_cv; /* slock waiters. */ - int sx_shrd_wcnt; /* Number of slock waiters. */ - struct cv sx_excl_cv; /* xlock waiters. */ - int sx_excl_wcnt; /* Number of xlock waiters. */ - struct thread *sx_xholder; /* Thread presently holding xlock. */ -}; +#ifdef _KERNEL +#include +#endif + +/* + * In general, the sx locks and rwlocks use very similar algorithms. + * The main difference in the implementations is how threads are + * blocked when a lock is unavailable. 
For this, sx locks use sleep + * queues which do not support priority propagation, and rwlocks use + * turnstiles which do. + * + * The sx_lock field consists of several fields. The low bit + * indicates if the lock is locked with a shared or exclusive lock. A + * value of 0 indicates an exclusive lock, and a value of 1 indicates + * a shared lock. Bit 1 is a boolean indicating if there are any + * threads waiting for a shared lock. Bit 2 is a boolean indicating + * if there are any threads waiting for an exclusive lock. Bit 3 is a + * boolean indicating if an exclusive lock is recursively held. The + * rest of the variable's definition is dependent on the value of the + * first bit. For an exclusive lock, it is a pointer to the thread + * holding the lock, similar to the mtx_lock field of mutexes. For + * shared locks, it is a count of read locks that are held. + * + * When the lock is not locked by any thread, it is encoded as a + * shared lock with zero waiters. + * + * A note about memory barriers. Exclusive locks need to use the same + * memory barriers as mutexes: _acq when acquiring an exclusive lock + * and _rel when releasing an exclusive lock. On the other side, + * shared lock needs to use an _acq barrier when acquiring the lock + * but, since they don't update any locked data, no memory barrier is + * needed when releasing a shared lock. + */ + +#define SX_LOCK_SHARED 0x01 +#define SX_LOCK_SHARED_WAITERS 0x02 +#define SX_LOCK_EXCLUSIVE_WAITERS 0x04 +#define SX_LOCK_RECURSED 0x08 +#define SX_LOCK_FLAGMASK \ + (SX_LOCK_SHARED | SX_LOCK_SHARED_WAITERS | \ + SX_LOCK_EXCLUSIVE_WAITERS | SX_LOCK_RECURSED) + +#define SX_OWNER(x) ((x) & ~SX_LOCK_FLAGMASK) +#define SX_SHARERS_SHIFT 4 +#define SX_SHARERS(x) (SX_OWNER(x) >> SX_SHARERS_SHIFT) +#define SX_SHARERS_LOCK(x) \ + ((x) << SX_SHARERS_SHIFT | SX_LOCK_SHARED) +#define SX_ONE_SHARER (1 << SX_SHARERS_SHIFT) + +#define SX_LOCK_UNLOCKED SX_SHARERS_LOCK(0) +#define SX_LOCK_DESTROYED \ + (SX_LOCK_SHARED_WAITERS | SX_LOCK_EXCLUSIVE_WAITERS) #ifdef _KERNEL + +/* + * Function prototipes. Routines that start with an underscore are not part + * of the public interface and are wrappered with a macro. 
+ */ void sx_sysinit(void *arg); -void sx_init(struct sx *sx, const char *description); +#define sx_init(sx, desc) sx_init_flags((sx), (desc), 0) +void sx_init_flags(struct sx *sx, const char *description, int opts); void sx_destroy(struct sx *sx); -void _sx_slock(struct sx *sx, const char *file, int line); -void _sx_xlock(struct sx *sx, const char *file, int line); +int _sx_slock(struct sx *sx, int opts, const char *file, int line); +int _sx_xlock(struct sx *sx, int opts, const char *file, int line); int _sx_try_slock(struct sx *sx, const char *file, int line); int _sx_try_xlock(struct sx *sx, const char *file, int line); void _sx_sunlock(struct sx *sx, const char *file, int line); void _sx_xunlock(struct sx *sx, const char *file, int line); int _sx_try_upgrade(struct sx *sx, const char *file, int line); void _sx_downgrade(struct sx *sx, const char *file, int line); +int _sx_xlock_hard(struct sx *sx, uintptr_t tid, int opts, + const char *file, int line); +int _sx_slock_hard(struct sx *sx, int opts, const char *file, int line); +void _sx_xunlock_hard(struct sx *sx, uintptr_t tid, const char *file, int + line); +void _sx_sunlock_hard(struct sx *sx, const char *file, int line); #if defined(INVARIANTS) || defined(INVARIANT_SUPPORT) void _sx_assert(struct sx *sx, int what, const char *file, int line); #endif @@ -79,15 +135,119 @@ SYSUNINIT(name##_sx_sysuninit, SI_SUB_LOCK, SI_ORDER_MIDDLE, \ sx_destroy, (sxa)) -#define sx_xlocked(sx) ((sx)->sx_cnt < 0 && (sx)->sx_xholder == curthread) -#define sx_slock(sx) _sx_slock((sx), LOCK_FILE, LOCK_LINE) -#define sx_xlock(sx) _sx_xlock((sx), LOCK_FILE, LOCK_LINE) +/* + * Full lock operations that are suitable to be inlined in non-debug kernels. + * If the lock can't be acquired or released trivially then the work is + * deferred to 'tougher' functions. + */ + +/* Acquire an exclusive lock. */ +static __inline int +__sx_xlock(struct sx *sx, struct thread *td, int opts, const char *file, + int line) +{ + uintptr_t tid = (uintptr_t)td; + int error = 0; + + if (!atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid)) + error = _sx_xlock_hard(sx, tid, opts, file, line); + else + lock_profile_obtain_lock_success(&sx->lock_object, 0, 0, file, + line); + + return (error); +} + +/* Release an exclusive lock. */ +static __inline void +__sx_xunlock(struct sx *sx, struct thread *td, const char *file, int line) +{ + uintptr_t tid = (uintptr_t)td; + + if (!atomic_cmpset_rel_ptr(&sx->sx_lock, tid, SX_LOCK_UNLOCKED)) + _sx_xunlock_hard(sx, tid, file, line); +} + +/* Acquire a shared lock. */ +static __inline int +__sx_slock(struct sx *sx, int opts, const char *file, int line) +{ + uintptr_t x = sx->sx_lock; + int error = 0; + + if (!(x & SX_LOCK_SHARED) || + !atomic_cmpset_acq_ptr(&sx->sx_lock, x, x + SX_ONE_SHARER)) + error = _sx_slock_hard(sx, opts, file, line); +#ifdef LOCK_PROFILING_SHARED + else if (SX_SHARERS(x) == 0) + lock_profile_obtain_lock_success(&sx->lock_object, 0, 0, file, + line); +#endif + + return (error); +} + +/* + * Release a shared lock. We can just drop a single shared lock so + * long as we aren't trying to drop the last shared lock when other + * threads are waiting for an exclusive lock. This takes advantage of + * the fact that an unlocked lock is encoded as a shared lock with a + * count of 0. 
+ */ +static __inline void +__sx_sunlock(struct sx *sx, const char *file, int line) +{ + uintptr_t x = sx->sx_lock; + + if (x == (SX_SHARERS_LOCK(1) | SX_LOCK_EXCLUSIVE_WAITERS) || + !atomic_cmpset_ptr(&sx->sx_lock, x, x - SX_ONE_SHARER)) + _sx_sunlock_hard(sx, file, line); +} + +/* + * Public interface for lock operations. + */ +#ifndef LOCK_DEBUG +#error "LOCK_DEBUG not defined, include before " +#endif +#if (LOCK_DEBUG > 0) || defined(SX_NOINLINE) +#define sx_xlock(sx) (void)_sx_xlock((sx), 0, LOCK_FILE, LOCK_LINE) +#define sx_xlock_sig(sx) \ + _sx_xlock((sx), SX_INTERRUPTIBLE, LOCK_FILE, LOCK_LINE) +#define sx_xunlock(sx) _sx_xunlock((sx), LOCK_FILE, LOCK_LINE) +#define sx_slock(sx) (void)_sx_slock((sx), 0, LOCK_FILE, LOCK_LINE) +#define sx_slock_sig(sx) \ + _sx_slock((sx), SX_INTERRUPTIBLE, LOCK_FILE, LOCK_LINE) +#define sx_sunlock(sx) _sx_sunlock((sx), LOCK_FILE, LOCK_LINE) +#else +#define sx_xlock(sx) \ + (void)__sx_xlock((sx), curthread, 0, LOCK_FILE, LOCK_LINE) +#define sx_xlock_sig(sx) \ + __sx_xlock((sx), curthread, SX_INTERRUPTIBLE, LOCK_FILE, LOCK_LINE) +#define sx_xunlock(sx) \ + __sx_xunlock((sx), curthread, LOCK_FILE, LOCK_LINE) +#define sx_slock(sx) (void)__sx_slock((sx), 0, LOCK_FILE, LOCK_LINE) +#define sx_slock_sig(sx) \ + __sx_slock((sx), SX_INTERRUPTIBLE, LOCK_FILE, LOCK_LINE) +#define sx_sunlock(sx) __sx_sunlock((sx), LOCK_FILE, LOCK_LINE) +#endif /* LOCK_DEBUG > 0 || SX_NOINLINE */ #define sx_try_slock(sx) _sx_try_slock((sx), LOCK_FILE, LOCK_LINE) #define sx_try_xlock(sx) _sx_try_xlock((sx), LOCK_FILE, LOCK_LINE) -#define sx_sunlock(sx) _sx_sunlock((sx), LOCK_FILE, LOCK_LINE) -#define sx_xunlock(sx) _sx_xunlock((sx), LOCK_FILE, LOCK_LINE) #define sx_try_upgrade(sx) _sx_try_upgrade((sx), LOCK_FILE, LOCK_LINE) #define sx_downgrade(sx) _sx_downgrade((sx), LOCK_FILE, LOCK_LINE) + +/* + * Return a pointer to the owning thread if the lock is exclusively + * locked. + */ +#define sx_xholder(sx) \ + ((sx)->sx_lock & SX_LOCK_SHARED ? NULL : \ + (struct thread *)SX_OWNER((sx)->sx_lock)) + +#define sx_xlocked(sx) \ + (((sx)->sx_lock & ~(SX_LOCK_FLAGMASK & ~SX_LOCK_SHARED)) == \ + (uintptr_t)curthread) + #define sx_unlock(sx) do { \ if (sx_xlocked(sx)) \ sx_xunlock(sx); \ @@ -95,17 +255,39 @@ sx_sunlock(sx); \ } while (0) +#define sx_sleep(chan, sx, pri, wmesg, timo) \ + _sleep((chan), &(sx)->lock_object, (pri), (wmesg), (timo)) + +/* + * Options passed to sx_init_flags(). + */ +#define SX_DUPOK 0x01 +#define SX_NOPROFILE 0x02 +#define SX_NOWITNESS 0x04 +#define SX_QUIET 0x08 +#define SX_ADAPTIVESPIN 0x10 +#define SX_RECURSE 0x20 + +/* + * Options passed to sx_*lock_hard(). + */ +#define SX_INTERRUPTIBLE 0x40 + #if defined(INVARIANTS) || defined(INVARIANT_SUPPORT) #define SA_LOCKED LA_LOCKED #define SA_SLOCKED LA_SLOCKED #define SA_XLOCKED LA_XLOCKED #define SA_UNLOCKED LA_UNLOCKED +#define SA_RECURSED LA_RECURSED +#define SA_NOTRECURSED LA_NOTRECURSED /* Backwards compatability. 
*/ #define SX_LOCKED LA_LOCKED #define SX_SLOCKED LA_SLOCKED #define SX_XLOCKED LA_XLOCKED #define SX_UNLOCKED LA_UNLOCKED +#define SX_RECURSED LA_RECURSED +#define SX_NOTRECURSED LA_NOTRECURSED #endif #ifdef INVARIANTS Index: sys/turnstile.h =================================================================== RCS file: /cvs/ncvs/src/sys/sys/turnstile.h,v retrieving revision 1.7 diff -u -r1.7 turnstile.h --- sys/turnstile.h 7 Jan 2005 02:29:24 -0000 1.7 +++ sys/turnstile.h 31 Aug 2007 00:39:59 -0000 @@ -73,20 +73,43 @@ #ifdef _KERNEL +/* Which queue to block on or which queue to wakeup one or more threads from. */ +#define TS_EXCLUSIVE_QUEUE 0 +#define TS_SHARED_QUEUE 1 + +/* The type of lock currently held. */ +#define TS_EXCLUSIVE_LOCK TS_EXCLUSIVE_QUEUE +#define TS_SHARED_LOCK TS_SHARED_QUEUE + void init_turnstiles(void); void turnstile_adjust(struct thread *, u_char); struct turnstile *turnstile_alloc(void); -void turnstile_broadcast(struct turnstile *); +#define turnstile_wakeup(turnstile) turnstile_broadcast(turnstile) +#define turnstile_broadcast(turnstile) \ + turnstile_broadcast_queue(turnstile, TS_EXCLUSIVE_QUEUE) +void turnstile_broadcast_queue(struct turnstile *, int); void turnstile_claim(struct lock_object *); -int turnstile_empty(struct turnstile *); +void turnstile_disown(struct turnstile *); +#define turnstile_empty(turnstile) \ + turnstile_empty_queue(turnstile, TS_EXCLUSIVE_QUEUE); +int turnstile_empty_queue(struct turnstile *, int); void turnstile_free(struct turnstile *); -struct thread *turnstile_head(struct turnstile *); +#define turnstile_head(turnstile) \ + turnstile_head_queue(turnstile, TS_EXCLUSIVE_QUEUE) +struct thread *turnstile_head_queue(struct turnstile *, int); void turnstile_lock(struct lock_object *); struct turnstile *turnstile_lookup(struct lock_object *); void turnstile_release(struct lock_object *); -int turnstile_signal(struct turnstile *); -void turnstile_unpend(struct turnstile *); -void turnstile_wait(struct lock_object *, struct thread *); +#define turnstile_signal(turnstile) \ + turnstile_signal_queue(turnstile, TS_EXCLUSIVE_QUEUE) +int turnstile_signal_queue(struct turnstile *, int); +struct turnstile *turnstile_trywait(struct lock_object *); +#define turnstile_unpend(turnstile) \ + turnstile_unpend_queue(turnstile, TS_EXCLUSIVE_QUEUE); +void turnstile_unpend_queue(struct turnstile *, int); +#define turnstile_wait(lock_object, thread) \ + turnstile_wait_queue(lock_object, thread, TS_EXCLUSIVE_QUEUE) +void turnstile_wait_queue(struct lock_object *, struct thread *, int); #endif /* _KERNEL */ #endif /* _SYS_TURNSTILE_H_ */ Index: vm/vm_map.c =================================================================== RCS file: /cvs/ncvs/src/sys/vm/vm_map.c,v retrieving revision 1.366.2.4 diff -u -r1.366.2.4 vm_map.c --- vm/vm_map.c 30 Aug 2007 02:32:04 -0000 1.366.2.4 +++ vm/vm_map.c 31 Aug 2007 03:12:00 -0000 @@ -429,7 +429,7 @@ if (map->system_map) _mtx_lock_flags(&map->system_mtx, 0, file, line); else - _sx_xlock(&map->lock, file, line); + (void) _sx_xlock(&map->lock, 0, file, line); map->timestamp++; } @@ -450,7 +450,7 @@ if (map->system_map) _mtx_lock_flags(&map->system_mtx, 0, file, line); else - _sx_xlock(&map->lock, file, line); + (void) _sx_xlock(&map->lock, 0, file, line); } void --PPYy/fEw/8QCHSq3-- From owner-freebsd-smp@FreeBSD.ORG Fri Aug 31 08:53:41 2007 Return-Path: Delivered-To: smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3C2F316A419; Fri, 31 Aug 2007 
08:53:41 +0000 (UTC) (envelope-from kris@FreeBSD.org) Received: from weak.local (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx1.freebsd.org (Postfix) with ESMTP id 6D43713C4A3; Fri, 31 Aug 2007 08:53:40 +0000 (UTC) (envelope-from kris@FreeBSD.org) Message-ID: <46D7D711.80406@FreeBSD.org> Date: Fri, 31 Aug 2007 10:53:37 +0200 From: Kris Kennaway User-Agent: Thunderbird 2.0.0.6 (Macintosh/20070728) MIME-Version: 1.0 To: Alfred Perlstein References: <20070831071048.GF87451@elvis.mu.org> In-Reply-To: <20070831071048.GF87451@elvis.mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: attilio@freebsd.org, smp@freebsd.org Subject: Re: request for review: backport of sx and rwlocks from 7.0 to 6-stable X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Aug 2007 08:53:41 -0000 Alfred Perlstein wrote: > Hi guys, > > Some work here at work was approved for sharing with community so > I'm posting it here in hope of a review. > > We run some pretty good stress testing on our code, so I think it's > pretty solid. > > My only concern is that I've tried my best to preserve kernel source > API, but not binary compat though a few simple #defines. > > I can make binary compat, in albeit a somewhat confusing manner, but > that will require some rototilling and weird renaming of calls to > the sleepq and turnstile code. In short, I'd rather not, but I will > if you think it's something that should be done. > > There's also a few placeholders for lock profiling which I will > very likely be backporting shortly as well. > > Patch is attached. > > Comments/questions? Hmm, I would be happy to see this but I think binary compatibility is actually important here since this is -stable and low-level primitives like sx are probably used all over the place in existing third party modules. 
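For illustration of that concern: the 6.x struct sx that existing third-party modules were compiled against is the one removed in the sys/sx.h hunk above:

	struct sx {
		struct lock_object sx_object;	/* Common lock properties. */
		struct mtx	*sx_lock;	/* General protection lock. */
		int		 sx_cnt;	/* -1: xlock, > 0: slock count. */
		struct cv	 sx_shrd_cv;	/* slock waiters. */
		int		 sx_shrd_wcnt;	/* Number of slock waiters. */
		struct cv	 sx_excl_cv;	/* xlock waiters. */
		int		 sx_excl_wcnt;	/* Number of xlock waiters. */
		struct thread	*sx_xholder;	/* Thread presently holding xlock. */
	};

The replacement lives in the new sys/_sx.h, which the posted diff lists but does not show. Assuming it mirrors the 7.0 layout (the member names below match the sx->lock_object and sx->sx_lock references in the new inline functions, but the exact declaration is an assumption), it shrinks to roughly:

	struct sx {
		struct lock_object lock_object;	/* Common lock properties. */
		volatile uintptr_t sx_lock;	/* Owner/flag word described in sx.h. */
	};

Because sizeof(struct sx) and the field layout change, any module that embeds or inspects a struct sx would need a rebuild, which is exactly the third-party-module worry.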
Kris From owner-freebsd-smp@FreeBSD.ORG Fri Aug 31 09:42:40 2007 Return-Path: Delivered-To: smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ED83C16A417 for ; Fri, 31 Aug 2007 09:42:40 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.188]) by mx1.freebsd.org (Postfix) with ESMTP id 8453413C442 for ; Fri, 31 Aug 2007 09:42:40 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by nf-out-0910.google.com with SMTP id k4so626844nfd for ; Fri, 31 Aug 2007 02:42:02 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=VebpFZmqPiGZsC7hwRcM+rcPanmqZw3zow0Q3vr5c+Pm5WTd6KpzPlCqe5gCys53yarTs/UWJ2QeZkZh1i9SeSr2ZgCN4EU81U9NXLooOwVn6VTeTI/FFlMU3FDh116+41FWfkNcAjI0JNyqpL+oNUV7OSBOtmRrUkJYAULwd/Q= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=YXJhSmdkPMbjtZAPRqi0/WY7hF4c/3OXs597QvF/LiW+yzK/7CiPxUV5w7Cd5rohKwhhUyx04taqp+wSlFXqF4k6atI2mxpz0MKHnm8aY8g28avmu05uw8i3ifBUCrDvE+adZ0GLl0hTTcAEEAPUoCOOYaZCdt6HKPoLAyiRz9w= Received: by 10.78.118.5 with SMTP id q5mr1056017huc.1188551602097; Fri, 31 Aug 2007 02:13:22 -0700 (PDT) Received: by 10.78.97.18 with HTTP; Fri, 31 Aug 2007 02:13:22 -0700 (PDT) Message-ID: <3bbf2fe10708310213o297512e8m86e4bb5eb029bf9b@mail.gmail.com> Date: Fri, 31 Aug 2007 11:13:22 +0200 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "Kris Kennaway" In-Reply-To: <46D7D711.80406@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20070831071048.GF87451@elvis.mu.org> <46D7D711.80406@FreeBSD.org> X-Google-Sender-Auth: 7fdf2a7decf47266 Cc: smp@freebsd.org, Alfred Perlstein Subject: Re: request for review: backport of sx and rwlocks from 7.0 to 6-stable X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Aug 2007 09:42:41 -0000 2007/8/31, Kris Kennaway : > Alfred Perlstein wrote: > > Hi guys, > > > > Some work here at work was approved for sharing with community so > > I'm posting it here in hope of a review. > > > > We run some pretty good stress testing on our code, so I think it's > > pretty solid. > > > > My only concern is that I've tried my best to preserve kernel source > > API, but not binary compat though a few simple #defines. > > > > I can make binary compat, in albeit a somewhat confusing manner, but > > that will require some rototilling and weird renaming of calls to > > the sleepq and turnstile code. In short, I'd rather not, but I will > > if you think it's something that should be done. > > > > There's also a few placeholders for lock profiling which I will > > very likely be backporting shortly as well. > > > > Patch is attached. > > > > Comments/questions? 
> > Hmm, I would be happy to see this but I think binary compatibility is > actually important here since this is -stable and low-level primitives > like sx are probably used all over the place in existing third party > modules. Regarding to this I think that really we could only backport rwlocks as sxlocks want a new ABI. I will try to give a look at it ASAP, btw thanks for the effort! Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-smp@FreeBSD.ORG Fri Aug 31 16:09:56 2007 Return-Path: Delivered-To: smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7723016A41B for ; Fri, 31 Aug 2007 16:09:56 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from speedfactory.net (mail6.speedfactory.net [66.23.216.219]) by mx1.freebsd.org (Postfix) with ESMTP id 250D313C48E for ; Fri, 31 Aug 2007 16:09:55 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (unverified [66.23.211.162]) by speedfactory.net (SurgeMail 3.8k2) with ESMTP id 206376366-1834499 for multiple; Fri, 31 Aug 2007 09:14:19 -0400 Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.13.8/8.13.8) with ESMTP id l7VDGBxu070269; Fri, 31 Aug 2007 09:16:11 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: Alfred Perlstein Date: Fri, 31 Aug 2007 08:41:45 -0400 User-Agent: KMail/1.9.6 References: <20070831071048.GF87451@elvis.mu.org> In-Reply-To: <20070831071048.GF87451@elvis.mu.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200708310841.46735.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Fri, 31 Aug 2007 09:16:11 -0400 (EDT) X-Virus-Scanned: ClamAV 0.88.3/4110/Thu Aug 30 19:49:40 2007 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: attilio@freebsd.org, smp@freebsd.org Subject: Re: request for review: backport of sx and rwlocks from 7.0 to 6-stable X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Aug 2007 16:09:56 -0000 On Friday 31 August 2007 03:10:48 am Alfred Perlstein wrote: > Hi guys, > > Some work here at work was approved for sharing with community so > I'm posting it here in hope of a review. > > We run some pretty good stress testing on our code, so I think it's > pretty solid. > > My only concern is that I've tried my best to preserve kernel source > API, but not binary compat though a few simple #defines. The sleepq and turnstile API/ABIs are internal and aren't used in any modules, so there's no need to go through any hoops to preserve them. The ABI of 'struct sx' (it's size) must be maintained however, but your diff didn't include any new files so I can't see if you did that. You can remove all your current #define's, etc. for turnstile and sleepq names. I would leave out the DB_SET() for 'show_sleepq' rather than duplicating the function. 
Alternatively, you could backport 1.41 of sys/ddb/ddb.h but leave DB_SET() as an alias for DB_FUNC() in 6.x and add a DB_SET2() that is like DB_SET() in 7 and use DB_SET2() for 'show sleepq'. -- John Baldwin From owner-freebsd-smp@FreeBSD.ORG Sat Sep 1 09:18:37 2007 Return-Path: Delivered-To: smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CE7CB16A419; Sat, 1 Sep 2007 09:18:37 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id B402413C469; Sat, 1 Sep 2007 09:18:37 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 2BF581A4D88; Sat, 1 Sep 2007 02:15:25 -0700 (PDT) Date: Sat, 1 Sep 2007 02:15:25 -0700 From: Alfred Perlstein To: Kris Kennaway Message-ID: <20070901091525.GK87451@elvis.mu.org> References: <20070831071048.GF87451@elvis.mu.org> <46D7D711.80406@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <46D7D711.80406@FreeBSD.org> User-Agent: Mutt/1.4.2.3i Cc: attilio@freebsd.org, smp@freebsd.org Subject: Re: request for review: backport of sx and rwlocks from 7.0 to 6-stable X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Sep 2007 09:18:37 -0000 * Kris Kennaway [070831 01:51] wrote: > Alfred Perlstein wrote: > >Hi guys, > > > >Some work here at work was approved for sharing with community so > >I'm posting it here in hope of a review. > > > >We run some pretty good stress testing on our code, so I think it's > >pretty solid. > > > >My only concern is that I've tried my best to preserve kernel source > >API, but not binary compat though a few simple #defines. > > > >I can make binary compat, in albeit a somewhat confusing manner, but > >that will require some rototilling and weird renaming of calls to > >the sleepq and turnstile code. In short, I'd rather not, but I will > >if you think it's something that should be done. > > > >There's also a few placeholders for lock profiling which I will > >very likely be backporting shortly as well. > > > >Patch is attached. > > > >Comments/questions? > > Hmm, I would be happy to see this but I think binary compatibility is > actually important here since this is -stable and low-level primitives > like sx are probably used all over the place in existing third party > modules. Ah, yes, sorry, the API stuff is the sleep_queue and turnstile apis, not the sx api itself. 
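For reference, a small user-space sketch of the lock-word encoding that the new sys/sx.h comment describes. The flag and helper macros are copied from the patch; the decode routine and the sample values are purely illustrative and are not part of the backport:

	#include <stdint.h>
	#include <stdio.h>

	/* Copied from the patched sys/sx.h. */
	#define SX_LOCK_SHARED			0x01
	#define SX_LOCK_SHARED_WAITERS		0x02
	#define SX_LOCK_EXCLUSIVE_WAITERS	0x04
	#define SX_LOCK_RECURSED		0x08
	#define SX_LOCK_FLAGMASK \
		(SX_LOCK_SHARED | SX_LOCK_SHARED_WAITERS | \
		SX_LOCK_EXCLUSIVE_WAITERS | SX_LOCK_RECURSED)
	#define SX_OWNER(x)		((x) & ~SX_LOCK_FLAGMASK)
	#define SX_SHARERS_SHIFT	4
	#define SX_SHARERS(x)		(SX_OWNER(x) >> SX_SHARERS_SHIFT)
	#define SX_SHARERS_LOCK(x)	((x) << SX_SHARERS_SHIFT | SX_LOCK_SHARED)
	#define SX_LOCK_UNLOCKED	SX_SHARERS_LOCK(0)

	/* Illustrative only: print what a given sx_lock word means. */
	static void
	sx_describe(uintptr_t x)
	{

		if (x & SX_LOCK_SHARED)
			printf("shared, %ju sharer(s)", (uintmax_t)SX_SHARERS(x));
		else
			printf("exclusive, owner %p", (void *)SX_OWNER(x));
		if (x & SX_LOCK_SHARED_WAITERS)
			printf(", shared waiters");
		if (x & SX_LOCK_EXCLUSIVE_WAITERS)
			printf(", exclusive waiters");
		if (x & SX_LOCK_RECURSED)
			printf(", recursed");
		printf("\n");
	}

	int
	main(void)
	{

		sx_describe(SX_LOCK_UNLOCKED);		/* unlocked == shared, 0 sharers */
		sx_describe(SX_SHARERS_LOCK(3));	/* three readers, no waiters */
		/* Hypothetical writer address with exclusive waiters queued. */
		sx_describe((uintptr_t)0xdeadbee0 | SX_LOCK_EXCLUSIVE_WAITERS);
		return (0);
	}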
-Alfred From owner-freebsd-smp@FreeBSD.ORG Sat Sep 1 10:17:35 2007 Return-Path: Delivered-To: smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0E95716A417; Sat, 1 Sep 2007 10:17:35 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id EAAB213C468; Sat, 1 Sep 2007 10:17:34 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: by elvis.mu.org (Postfix, from userid 1192) id 715F91A4D8B; Sat, 1 Sep 2007 02:14:38 -0700 (PDT) Date: Sat, 1 Sep 2007 02:14:38 -0700 From: Alfred Perlstein To: John Baldwin Message-ID: <20070901091438.GJ87451@elvis.mu.org> References: <20070831071048.GF87451@elvis.mu.org> <200708310841.46735.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200708310841.46735.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i Cc: attilio@freebsd.org, smp@freebsd.org Subject: Re: request for review: backport of sx and rwlocks from 7.0 to 6-stable X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Sep 2007 10:17:35 -0000 * John Baldwin [070831 09:07] wrote: > On Friday 31 August 2007 03:10:48 am Alfred Perlstein wrote: > > Hi guys, > > > > Some work here at work was approved for sharing with community so > > I'm posting it here in hope of a review. > > > > We run some pretty good stress testing on our code, so I think it's > > pretty solid. > > > > My only concern is that I've tried my best to preserve kernel source > > API, but not binary compat though a few simple #defines. > > The sleepq and turnstile API/ABIs are internal and aren't used in any modules, > so there's no need to go through any hoops to preserve them. The ABI > of 'struct sx' (it's size) must be maintained however, but your diff didn't > include any new files so I can't see if you did that. You can remove all > your current #define's, etc. for turnstile and sleepq names. > > I would leave out the DB_SET() for 'show_sleepq' rather than duplicating the > function. Alternatively, you could backport 1.41 of sys/ddb/ddb.h but leave > DB_SET() as an alias for DB_FUNC() in 6.x and add a DB_SET2() that is like > DB_SET() in 7 and use DB_SET2() for 'show sleepq'. > Ok, should have something in a few days. -- - Alfred Perlstein
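One last illustrative note, on the __attribute__((aligned(16))) that the proc.h hunk adds to struct thread: both the rwlock and sx fast paths store flag bits in the low bits of the owning thread pointer, so those low bits must be known to be zero. A minimal user-space sketch of that invariant (not part of the patch; the structure below is a stand-in, not the real struct thread):

	#include <stdint.h>
	#include <stdio.h>

	/*
	 * Stand-in for struct thread.  Forcing 16-byte alignment guarantees
	 * that the low four bits of every object's address are zero, which
	 * is what lets a lock word pack the SX_LOCK_ / RW_LOCK_ style flag
	 * bits alongside the owner pointer.
	 */
	struct fake_thread {
		char	td_pad[512];
	} __attribute__((aligned(16)));

	static struct fake_thread threads[4];

	int
	main(void)
	{
		int i;

		for (i = 0; i < 4; i++) {
			uintptr_t p = (uintptr_t)&threads[i];

			printf("thread %d at %p, low four bits %#x\n",
			    i, (void *)p, (unsigned)(p & 0xf));
		}
		return (0);
	}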