From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Philippe Jalaber
Subject: Re: adaptive rwlock deadlock
Date: Tue, 04 Aug 2015 13:10:50 -0700
Message-ID: <2768515.JZVZhYiQVE@ralph.baldwin.cx>
List-Id: Technical Discussions relating to FreeBSD

On Tuesday, July 07, 2015 12:10:19 PM Philippe Jalaber wrote:
> Hi,
>
> I am facing a strange problem using the network stack and adaptive rwlocks
> running Freebsd 9.3.
> Basically I can reproduce the problem with 3 threads:
>
> 1) thread 1 has taken the rwlock of structure inpcb in exclusive mode in
> tcp_input.c.
> This thread also runs my own code and repeatedly takes a
> rwlock (called g_rwlock) in shared mode and releases it, until a shared
> object is marked not "busy" any more:
>
>     rwlock(inp_lock);
>     ....
>     do { // thread is active waiting in the loop
>         rlock(g_rwlock);
>         o = find();
>         if ( o == NULL )
>             break;
>         busy = o.busy;
>         if (o != NULL && busy)
>             runlock(g_rwlock);
>     } while ( busy );
>
>     if ( o != NULL )
>     {
>         // do something with o
>         ....
>     }
>     runlock(g_rwlock);
>     ....
>
> 2) thread 2 wants to set the shared object as "ready". So it tries to take
> g_rwlock in exclusive mode and is blocked in _rw_wlock_hard@kern_rwlock.c:815
> "turnstile_wait(ts, rw_owner(rw), TS_EXCLUSIVE_QUEUE)" because thread 1 has
> already taken it in shared mode:
>
>     wlock(g_rwlock);
>     o = find();
>     if ( o != NULL )
>         o.busy = 1;
>     wunlock(g_rwlock);
>
>     // o is busy so work on it without any lock
>     ....
>
>     wlock(g_rwlock); // thread is blocked here
>     o.busy = 0;
>     maybe_delete(o);
>     wunlock(g_rwlock);
>
> 3) thread 3 spins on the same inpcb rwlock than thread 1 in
> _rw_wlock_hard@kern_rwlock.c:721 "while ((struct
> thread*)RW_OWNER(rw->rw_lock) == owner && TD_IS_RUNNING(owner)) "
>
>
> My target machine has two cpus.
> Thread 1 is pinned to cpu 0.
> Thread 2 and Thread 3 are pinned to cpu 1.
> Thread 1 and Thread 2 have a priority of 28.
> Thread 3 has a priority of 127
>
> Now what seems to happen is that when thread 1 calls runlock(g_rwlock), it
> calls turnstile_broadcast@kern_rwlock.c:650, but thread 2 never regains
> control because thread 3 is spinning on the inpcb rwlock. Also the
> condition TD_IS_RUNNING(owner) is always true because thread 1 is active
> waiting in a loop. So the 3 threads deadlock.
> Note that if I compile the kernel without adaptive rwlocks it works without
> any problem.
> A workaround is to add a call to "sched_relinquish(curthread)" in thread 1
> in the loop just after the call to runlock.

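To make sure I am reading the scenario right, here is a rough user-space toy
model of it. Everything in it is an assumption made for illustration: the
toy_* names are hypothetical and are not kernel interfaces, time-slice expiry
is ignored, and the only ways thread 2 can get CPU 1 are a preemption request
or thread 3 giving up its adaptive spin because the lock owner left its CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names are FreeBSD kernel interfaces. */
enum toy_fix { TOY_NO_FIX, TOY_PREEMPT, TOY_RELINQUISH };

/*
 * Simulate CPU 1 for up to max_ticks ticks.  Returns the tick at which
 * thread 2 first gets the CPU, or -1 if it never does.
 */
static int
toy_first_run_of_thread2(enum toy_fix fix, int max_ticks)
{
	const int prio_t2 = 28, prio_t3 = 127;	/* lower value wins */
	bool t2_runnable = false;	/* thread 2 is on CPU 1's run queue */
	bool t3_spinning = true;	/* thread 3 holds CPU 1, spinning */

	assert(max_ticks >= 0);
	for (int tick = 0; tick < max_ticks; tick++) {
		/* CPU 0: thread 1 drops g_rwlock and wakes thread 2. */
		t2_runnable = true;

		/* Workaround: thread 1 yields, so its owner is "not running". */
		bool owner_running = (fix != TOY_RELINQUISH);

		/* CPU 1: the adaptive spin ends once the owner is off its CPU. */
		if (t3_spinning && !owner_running)
			t3_spinning = false;	/* thread 3 blocks instead */

		/* CPU 1 runs thread 2 if it is free or is told to preempt. */
		if (t2_runnable &&
		    (!t3_spinning || (fix == TOY_PREEMPT && prio_t2 < prio_t3)))
			return (tick);
	}
	return (-1);
}
```

In this model thread 2 starves only in the TOY_NO_FIX case; either a
preemption of thread 3 or the sched_relinquish() workaround lets it run.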
It sounds like we are not forcing a preemption on CPU 1 in this case via
sched_add().  For SCHED_4BSD you could try the 'FULL_PREEMPTION' kernel
option.  For ULE you can adjust 'preempt_thresh' on the fly, though I think
the default setting should actually still work.  Can you use KTR or some
such to determine if IPI_PREEMPT is being sent by CPU 0 to CPU 1 in this
case?

> I am also wondering about the code in _rw_runlock after
> "turnstile_broadcast(ts, queue)". Isn't the flag RW_LOCK_WRITE_WAITERS
> definitely lost if the other thread which is blocked in turnstile_wait
> never regains control ?

All the write waiters are awakened by a broadcast (as opposed to a signal
operation).  They are on the run queue, not on the turnstile queue anymore,
so there aren't any write waiters left (the bit only tracks if there are
waiters on the turnstile).

-- 
John Baldwin
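
P.S. A toy user-space sketch of the waiter-bit point, in case it helps.  The
struct layout, the flag value, and all toy_* names are made up for
illustration and are not what sys/rwlock.h or the turnstile code look like;
the sketch only models the bookkeeping described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; names and layout are illustrative. */
#define TOY_WRITE_WAITERS 0x1u

struct toy_rwlock {
	unsigned flags;		/* waiter bit kept in the lock word */
	int on_turnstile;	/* writers blocked on the turnstile */
	int on_runq;		/* writers woken, waiting for a CPU */
};

/* The bit is set exactly when some writer sits on the turnstile. */
static bool
toy_bit_consistent(const struct toy_rwlock *rw)
{
	return (((rw->flags & TOY_WRITE_WAITERS) != 0) ==
	    (rw->on_turnstile > 0));
}

/* A writer fails to take the lock and blocks on the turnstile. */
static void
toy_block_writer(struct toy_rwlock *rw)
{
	rw->on_turnstile++;
	rw->flags |= TOY_WRITE_WAITERS;
}

/* Last reader unlocks: wake every blocked writer, then clear the bit. */
static void
toy_runlock_broadcast(struct toy_rwlock *rw)
{
	rw->on_runq += rw->on_turnstile;	/* all move to the run queue */
	rw->on_turnstile = 0;
	rw->flags &= ~TOY_WRITE_WAITERS;
	assert(toy_bit_consistent(rw));
}
```

After the broadcast the woken writers exist only on the run queue, so
clearing the bit loses nothing: a writer that runs later and still cannot
take the lock blocks again and sets the bit again.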