Date: Thu, 6 Aug 2015 10:09:51 +0200 From: Philippe Jalaber <pjalaber@gmail.com> To: John Baldwin <jhb@freebsd.org> Cc: freebsd-hackers@freebsd.org Subject: Re: adaptive rwlock deadlock Message-ID: <CA%2Bi3ByLROuP=nU_gXGBcAuEoC653QNiC9Rth4_U%2BT5WpEUbfXA@mail.gmail.com> In-Reply-To: <1902697.ny7xAkAVI4@ralph.baldwin.cx> References: <CA%2Bi3ByK8TLb6cRCw3dJgGYCb81ENE=HrgsDX%2BMM-=yVn8P1hgg@mail.gmail.com> <2768515.JZVZhYiQVE@ralph.baldwin.cx> <CA%2Bi3ByKHU33sMN3oM73-H5pCuj3qqNY_VUCRkWWyJ8_X9PvUew@mail.gmail.com> <1902697.ny7xAkAVI4@ralph.baldwin.cx>
next in thread | previous in thread | raw e-mail | index | archive | help
2015-08-05 17:41 GMT+02:00 John Baldwin <jhb@freebsd.org>: > On Wednesday, August 05, 2015 04:27:53 PM Philippe Jalaber wrote: > > 2015-08-04 22:10 GMT+02:00 John Baldwin <jhb@freebsd.org>: > > > > > On Tuesday, July 07, 2015 12:10:19 PM Philippe Jalaber wrote: > > > > Hi, > > > > > > > > I am facing a strange problem using the network stack and adaptive > > > rwlocks > > > > running Freebsd 9.3. > > > > Basically I can reproduce the problem with 3 threads: > > > > > > > > 1) thread 1 has taken the rwlock of structure inpcb in exclusive > mode in > > > > tcp_input.c. This thread also runs my own code and repeatedly takes a > > > > rwlock (called g_rwlock) in shared mode and releases it, until a > shared > > > > object is marked not "busy" any more: > > > > > > > > rwlock(inp_lock); > > > > .... > > > > do { // thread is active waiting in the loop > > > > rlock(g_rwlock); > > > > o = find(); > > > > if ( o == NULL ) > > > > break; > > > > busy = o.busy; > > > > if (o != NULL && busy) > > > > runlock(g_rwlock); > > > > } while ( busy ); > > > > > > > > if ( o != NULL ) > > > > { > > > > // do something with o > > > > .... > > > > } > > > > runlock(g_rwlock); > > > > .... > > > > > > > > 2) thread 2 wants to set the shared object as "ready". So it tries to > > > take > > > > g_rwlock in exclusive mode and is blocked in > _rw_wlock_hard@kern_rwlock.c > > > :815 > > > > "turnstile_wait(ts, rw_owner(rw), TS_EXCLUSIVE_QUEUE)" because > thread 1 > > > has > > > > already taken it in shared mode: > > > > > > > > wlock(g_rwlock); > > > > o = find(); > > > > if ( o != NULL ) > > > > o.busy = 1; > > > > wunlock(g_rwlock); > > > > > > > > // o is busy so work on it without any lock > > > > .... > > > > > > > > wlock(g_rwlock); // thread is blocked here > > > > o.busy = 0; > > > > maybe_delete(o); > > > > wunlock(g_rwlock); > > > > > > > > 3) thread 3 spins on the same inpcb rwlock than thread 1 in > > > > _rw_wlock_hard@kern_rwlock.c:721 "while ((struct > > > > thread*)RW_OWNER(rw->rw_lock) == owner && TD_IS_RUNNING(owner)) " > > > > > > > > > > > > My target machine has two cpus. > > > > Thread 1 is pinned to cpu 0. > > > > Thread 2 and Thread 3 are pinned to cpu 1. > > > > Thread 1 and Thread 2 have a priority of 28. > > > > Thread 3 has a priority of 127 > > > > > > > > Now what seems to happen is that when thread 1 calls > runlock(g_rwlock), > > > it > > > > calls turnstile_broadcast@kern_rwlock.c:650, but thread 2 never > regains > > > > control because thread 3 is spinning on the inpcb rwlock. Also the > > > > condition TD_IS_RUNNING(owner) is always true because thread 1 is > active > > > > waiting in a loop. So the 3 threads deadlock. > > > > Note that if I compile the kernel without adaptive rwlocks it works > > > without > > > > any problem. > > > > A workaround is to add a call to "sched_relinquish(curthread)" in > thread > > > 1 > > > > in the loop just after the call to runlock. > > > > > > It sounds like we are not forcing a preemption on CPU 1 in this case > via > > > sched_add(). > > > > > > For SCHED_4BSD you could try the 'FULL_PREEMPTION' kernel option. > > > For ULE you can adjust 'preempt_thresh' on the fly, though I think the > > > default setting should actually still work. > > > > > > Can you use KTR or some such to determine if IPI_PREEMPT is being sent > by > > > CPU 0 to CPU 1 in this case? > > > > > > > I am also wondering about the code in _rw_runlock after > > > > "turnstile_broadcast(ts, queue)". Isn't the flag > RW_LOCK_WRITE_WAITERS > > > > definitely lost if the other thread which is blocked in > turnstile_wait > > > > never regains control ? > > > > > > All the write waiters are awakened by a broadcast (as opposed to a > signal > > > operation). They are on the run queue, not on the turnstile queue > anymore, > > > so there aren't any write waiters left (the bit only tracks if there > are > > > waiters on the turnstile). > > > > > > -- > > > John Baldwin > > > > > > > I Use ULE scheduler. > > Here's the KTR output using ktrdump on a vmcore after watchdog. > > > > 75447 ipi_selected: cpu: 1 ipi: fc > > 75446 stop_cpus() with 252 type > > 75445 ipi_cpu: cpu: 1 ipi: 2 > > 75444 ipi_cpu: cpu: 1 ipi: 2 > > 75443 ipi_cpu: cpu: 1 ipi: 2 > > 75442 ipi_cpu: cpu: 1 ipi: 2 > > 75441 ipi_cpu: cpu: 1 ipi: 2 > > .... > > 3862 ipi_cpu: cpu: 1 ipi: 2 > > 3861 ipi_cpu: cpu: 1 ipi: 2 > > 3860 ipi_cpu: cpu: 1 ipi: 2 > > 3859 ipi_cpu: cpu: 1 ipi: 2 > > 3858 ipi_cpu: cpu: 1 ipi: 2 > > 3857 ipi_selected: cpu: 1 ipi: f3 > > 3856 ipi_cpu: cpu: 1 ipi: 2 > > 3855 ipi_cpu: cpu: 1 ipi: 2 > > 3854 ipi_cpu: cpu: 1 ipi: 2 > > 3853 ipi_selected: cpu: 0 ipi: f3 > > 3852 ipi_cpu: cpu: 1 ipi: 2 > > 3851 ipi_selected: cpu: 1 ipi: f3 > > 3850 ipi_cpu: cpu: 1 ipi: 2 > > 3849 ipi_cpu: cpu: 1 ipi: 2 > > 3848 ipi_selected: cpu: 0 ipi: f3 > > 3847 ipi_cpu: cpu: 1 ipi: 2 > > 3846 ipi_cpu: cpu: 1 ipi: 2 > > 3845 ipi_cpu: cpu: 1 ipi: 2 > > 3844 ipi_cpu: cpu: 1 ipi: 2 > > 3843 ipi_cpu: cpu: 1 ipi: 2 > > 3842 ipi_cpu: cpu: 1 ipi: 2 > > 3841 ipi_cpu: cpu: 1 ipi: 2 > > 3840 ipi_cpu: cpu: 1 ipi: 2 > > 3839 ipi_cpu: cpu: 1 ipi: 2 > > 3838 ipi_cpu: cpu: 1 ipi: 2 > > 3837 ipi_cpu: cpu: 1 ipi: 2 > > 3836 ipi_cpu: cpu: 1 ipi: 2 > > 3835 ipi_cpu: cpu: 0 ipi: 1 > > 3834 ipi_cpu: cpu: 0 ipi: 1 > > 3833 ipi_cpu: cpu: 0 ipi: 1 > > 3832 ipi_cpu: cpu: 0 ipi: 1 > > 3831 ipi_cpu: cpu: 0 ipi: 1 > > 3830 ipi_cpu: cpu: 0 ipi: 1 > > Unfortunately this has a lot of other noise. Can you add some > traces specifically in sched_ule in tdq_notify to note that it > is deciding to notify a CPU due to scheduling a thread? > > -- > John Baldwin > OK. Here's the patch I have used in tdq_notify on Freebsd 9.3: --- kern/sched_ule.c 2015-08-06 09:03:49.000000000 +0200 +++ kern/sched_ule.c 2015-08-06 09:03:38.000000000 +0200 @@ -1020,6 +1020,7 @@ return; } tdq->tdq_ipipending = 1; + CTR3(KTR_SMP, "%s: cpu: %d IPI_PREEMPT=%x", __func__, cpu, IPI_PREEMPT); ipi_cpu(cpu, IPI_PREEMPT); } Please find the output of ktrdump in attachment. The last IPI_PREEMPT sent is at index #9424. Hope it helps.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CA%2Bi3ByLROuP=nU_gXGBcAuEoC653QNiC9Rth4_U%2BT5WpEUbfXA>