From owner-freebsd-current@freebsd.org Fri May 27 19:17:06 2016 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EADEB4C3F9 for ; Fri, 27 May 2016 19:17:06 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wm0-x241.google.com (mail-wm0-x241.google.com [IPv6:2a00:1450:400c:c09::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D27901DFD for ; Fri, 27 May 2016 19:17:05 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-wm0-x241.google.com with SMTP id e3so1172676wme.2 for ; Fri, 27 May 2016 12:17:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mail-followup-to:mime-version :content-disposition:user-agent; bh=hHiF8LBG5VgtMSe+Vr3PRQn0lA2iGkTeBDmNWmRNdUg=; b=B1sRmErJfcmI3YhziU16sYcvjRf8h3vRJVm8Ueo+XgBdmKtr9+tOBs5wB0OWQCRn+3 HwBhMezXvjZZc/1Qv+KJQ14RhI/dYKnCCDN3KGvESjRGIj7bv7jUpejVQdD2z8TkwHdw NVcQYaiKtV7BYqWO0mPz9iQjyQXNz0Buf5o7Z9uxrrJHyYzmGHNjQbAm6c3Qa+cBWt/X xrah+oyPnake6kg8k+7cTK2yJ7A3/2ck6JQq2ekHQ3a0X+IajitmTPMUQD++8p+HEDMK GLCC7TxiNivNkcIpy8OVd278zIkHjjQm4I6JRsTDK52c4P+AnUJFaEoAYH2qS9IUkq0G Qrbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:subject:message-id:mail-followup-to :mime-version:content-disposition:user-agent; bh=hHiF8LBG5VgtMSe+Vr3PRQn0lA2iGkTeBDmNWmRNdUg=; b=KM4YNLmeFRzDbBJuUewJEeBdhiWMF+3/kxUsnWC2bw0jY+NCcpn3MhAfzGlzTeAufQ AMWPjXth/e4Ct4f03ijS+rSj7qdAof3NC4tv2uaD7QE9mSJYX7TDH/h1H9TDSl0EjnPQ AFA0e+hVhUkIw3in/8tuW2FaswxYz4jcKd+lwh6/jMjMamZndx4ahHmErO/MITsLANxU ifbblkV7x0fSgjoyyg7Zn7Q+1a4PiIKNyxxiBNjNSH4IZMz6oqYylmTAxZcHyUABCvBf CEcj1jXfsKzNYwTHUW05CV44Bv5WRKuqSWX9JI8p60Jq1YZl8cEtFspDrLCGSn621/RV pJDA== X-Gm-Message-State: ALyK8tIWNxM760aLdslDpD9/pMU+wC3T3o1nU+8pDkrbuI83c4N6zkwuV2d7CHoHvRKAwg== X-Received: by 10.28.92.18 with SMTP id q18mr222331wmb.48.1464376624172; Fri, 27 May 2016 12:17:04 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id q125sm9996546wmd.19.2016.05.27.12.17.02 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Fri, 27 May 2016 12:17:03 -0700 (PDT) Date: Fri, 27 May 2016 21:17:01 +0200 From: Mateusz Guzik To: freebsd-current@freebsd.org Subject: [PATCH] microoptimize locking primitives by avoiding unnecessary atomic ops Message-ID: <20160527191700.GA23039@dft-labs.eu> Mail-Followup-To: Mateusz Guzik , freebsd-current@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 May 2016 19:17:06 -0000 Hello there, quite some time ago I posted a trivial patch to locking primitives. What they do is the inline part tries an atomic op and if that fails the actual function is called, which immediately tries the same op. The obvious optimisation checks for the availability of the lock first. There concerns about the way it was done previously by relying on volatile behaving in a specific way. Later a simplified version was posted which should not have the concern, but the thread died. I refer you to https://lists.freebsd.org/pipermail/freebsd-current/2015-November/058100.html for simple benchmark results. I would like to get the patch in before 11 freeze. diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c index 34d7b34..1b10623 100644 --- a/sys/kern/kern_lock.c +++ b/sys/kern/kern_lock.c @@ -787,8 +787,10 @@ __lockmgr_args(struct lock *lk, u_int flags, struct lock_object *ilk, break; } - while (!atomic_cmpset_acq_ptr(&lk->lk_lock, LK_UNLOCKED, - tid)) { + for (;;) { + if (lk->lk_lock == LK_UNLOCKED && + atomic_cmpset_acq_ptr(&lk->lk_lock, LK_UNLOCKED, tid)) + break; #ifdef HWPMC_HOOKS PMC_SOFT_CALL( , , lock, failed); #endif @@ -1124,7 +1126,11 @@ __lockmgr_args(struct lock *lk, u_int flags, struct lock_object *ilk, __func__, iwmesg, file, line); } - while (!atomic_cmpset_acq_ptr(&lk->lk_lock, LK_UNLOCKED, tid)) { + for (;;) { + if (lk->lk_lock == LK_UNLOCKED && + atomic_cmpset_acq_ptr(&lk->lk_lock, LK_UNLOCKED, tid)) + break; + #ifdef HWPMC_HOOKS PMC_SOFT_CALL( , , lock, failed); #endif diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c index 3f62d17..d167205 100644 --- a/sys/kern/kern_mutex.c +++ b/sys/kern/kern_mutex.c @@ -419,7 +419,9 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t tid, int opts, all_time -= lockstat_nsecs(&m->lock_object); #endif - while (!_mtx_obtain_lock(m, tid)) { + for (;;) { + if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid)) + break; #ifdef KDTRACE_HOOKS spin_cnt++; #endif @@ -602,8 +604,9 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t tid, int opts, #ifdef KDTRACE_HOOKS spin_time -= lockstat_nsecs(&m->lock_object); #endif - while (!_mtx_obtain_lock(m, tid)) { - + for (;;) { + if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid)) + break; /* Give interrupts a chance while we spin. */ spinlock_exit(); while (m->mtx_lock != MTX_UNOWNED) { @@ -675,7 +678,9 @@ retry: m->lock_object.lo_name, file, line)); WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE, file, line, NULL); - while (!_mtx_obtain_lock(m, tid)) { + for (;;) { + if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid)) + break; if (m->mtx_lock == tid) { m->mtx_recurse++; break; diff --git a/sys/kern/kern_rwlock.c b/sys/kern/kern_rwlock.c index 6541724..6a904d2 100644 --- a/sys/kern/kern_rwlock.c +++ b/sys/kern/kern_rwlock.c @@ -771,7 +771,9 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t tid, const char *file, all_time -= lockstat_nsecs(&rw->lock_object); state = rw->rw_lock; #endif - while (!_rw_write_lock(rw, tid)) { + for (;;) { + if (rw->rw_lock == RW_UNLOCKED && _rw_write_lock(rw, tid)) + break; #ifdef KDTRACE_HOOKS spin_cnt++; #endif diff --git a/sys/kern/kern_sx.c b/sys/kern/kern_sx.c index 96e117b..2a81c04 100644 --- a/sys/kern/kern_sx.c +++ b/sys/kern/kern_sx.c @@ -544,7 +544,10 @@ _sx_xlock_hard(struct sx *sx, uintptr_t tid, int opts, const char *file, all_time -= lockstat_nsecs(&sx->lock_object); state = sx->sx_lock; #endif - while (!atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid)) { + for (;;) { + if (sx->sx_lock == SX_LOCK_UNLOCKED && + atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid)) + break; #ifdef KDTRACE_HOOKS spin_cnt++; #endif diff --git a/sys/sys/mutex.h b/sys/sys/mutex.h index a9ec072..0443922 100644 --- a/sys/sys/mutex.h +++ b/sys/sys/mutex.h @@ -185,7 +185,7 @@ void thread_lock_flags_(struct thread *, int, const char *, int); #define __mtx_lock(mp, tid, opts, file, line) do { \ uintptr_t _tid = (uintptr_t)(tid); \ \ - if (!_mtx_obtain_lock((mp), _tid)) \ + if (((mp)->mtx_lock != MTX_UNOWNED || !_mtx_obtain_lock((mp), _tid)))\ _mtx_lock_sleep((mp), _tid, (opts), (file), (line)); \ else \ LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire, \ @@ -203,7 +203,7 @@ void thread_lock_flags_(struct thread *, int, const char *, int); uintptr_t _tid = (uintptr_t)(tid); \ \ spinlock_enter(); \ - if (!_mtx_obtain_lock((mp), _tid)) { \ + if (((mp)->mtx_lock != MTX_UNOWNED || !_mtx_obtain_lock((mp), _tid))) {\ if ((mp)->mtx_lock == _tid) \ (mp)->mtx_recurse++; \ else \ @@ -232,7 +232,7 @@ void thread_lock_flags_(struct thread *, int, const char *, int); \ if ((mp)->mtx_recurse == 0) \ LOCKSTAT_PROFILE_RELEASE_LOCK(adaptive__release, mp); \ - if (!_mtx_release_lock((mp), _tid)) \ + if ((mp)->mtx_lock != _tid || !_mtx_release_lock((mp), _tid)) \ _mtx_unlock_sleep((mp), (opts), (file), (line)); \ } while (0) diff --git a/sys/sys/rwlock.h b/sys/sys/rwlock.h index f8947c5..6b4f505 100644 --- a/sys/sys/rwlock.h +++ b/sys/sys/rwlock.h @@ -96,7 +96,7 @@ #define __rw_wlock(rw, tid, file, line) do { \ uintptr_t _tid = (uintptr_t)(tid); \ \ - if (!_rw_write_lock((rw), _tid)) \ + if ((rw)->rw_lock != RW_UNLOCKED || !_rw_write_lock((rw), _tid))\ _rw_wlock_hard((rw), _tid, (file), (line)); \ else \ LOCKSTAT_PROFILE_OBTAIN_RWLOCK_SUCCESS(rw__acquire, rw, \ @@ -112,7 +112,7 @@ else { \ LOCKSTAT_PROFILE_RELEASE_RWLOCK(rw__release, rw, \ LOCKSTAT_WRITER); \ - if (!_rw_write_unlock((rw), _tid)) \ + if ((rw)->rw_lock != _tid || !_rw_write_unlock((rw), _tid))\ _rw_wunlock_hard((rw), _tid, (file), (line)); \ } \ } while (0) diff --git a/sys/sys/sx.h b/sys/sys/sx.h index 03b51d3..57a31d9 100644 --- a/sys/sys/sx.h +++ b/sys/sys/sx.h @@ -150,7 +150,8 @@ __sx_xlock(struct sx *sx, struct thread *td, int opts, const char *file, uintptr_t tid = (uintptr_t)td; int error = 0; - if (!atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid)) + if (sx->sx_lock != SX_LOCK_UNLOCKED || + !atomic_cmpset_acq_ptr(&sx->sx_lock, SX_LOCK_UNLOCKED, tid)) error = _sx_xlock_hard(sx, tid, opts, file, line); else LOCKSTAT_PROFILE_OBTAIN_RWLOCK_SUCCESS(sx__acquire, sx, @@ -168,7 +169,8 @@ __sx_xunlock(struct sx *sx, struct thread *td, const char *file, int line) if (sx->sx_recurse == 0) LOCKSTAT_PROFILE_RELEASE_RWLOCK(sx__release, sx, LOCKSTAT_WRITER); - if (!atomic_cmpset_rel_ptr(&sx->sx_lock, tid, SX_LOCK_UNLOCKED)) + if (sx->sx_lock != tid || + !atomic_cmpset_rel_ptr(&sx->sx_lock, tid, SX_LOCK_UNLOCKED)) _sx_xunlock_hard(sx, tid, file, line); }