Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 5 Aug 2015 16:27:53 +0200
From:      Philippe Jalaber <pjalaber@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: adaptive rwlock deadlock
Message-ID:  <CA%2Bi3ByKHU33sMN3oM73-H5pCuj3qqNY_VUCRkWWyJ8_X9PvUew@mail.gmail.com>
In-Reply-To: <2768515.JZVZhYiQVE@ralph.baldwin.cx>
References:  <CA%2Bi3ByK8TLb6cRCw3dJgGYCb81ENE=HrgsDX%2BMM-=yVn8P1hgg@mail.gmail.com> <2768515.JZVZhYiQVE@ralph.baldwin.cx>

next in thread | previous in thread | raw e-mail | index | archive | help
2015-08-04 22:10 GMT+02:00 John Baldwin <jhb@freebsd.org>:

> On Tuesday, July 07, 2015 12:10:19 PM Philippe Jalaber wrote:
> > Hi,
> >
> > I am facing a strange problem using the network stack and adaptive
> rwlocks
> > running Freebsd 9.3.
> > Basically I can reproduce the problem with 3 threads:
> >
> > 1) thread 1 has taken the rwlock of structure inpcb in exclusive mode in
> > tcp_input.c. This thread also runs my own code and repeatedly takes a
> > rwlock (called g_rwlock) in shared mode and releases it, until a shared
> > object is marked not "busy" any more:
> >
> > rwlock(inp_lock);
> > ....
> > do { // thread is active waiting in the loop
> >     rlock(g_rwlock);
> >     o = find();
> >     if ( o == NULL )
> >         break;
> >     busy = o.busy;
> >     if (o != NULL && busy)
> >         runlock(g_rwlock);
> > } while ( busy );
> >
> > if ( o != NULL )
> > {
> >     // do something with o
> >     ....
> > }
> > runlock(g_rwlock);
> > ....
> >
> > 2) thread 2 wants to set the shared object as "ready". So it tries to
> take
> > g_rwlock in exclusive mode and is blocked in _rw_wlock_hard@kern_rwlock.c
> :815
> > "turnstile_wait(ts, rw_owner(rw), TS_EXCLUSIVE_QUEUE)" because thread 1
> has
> > already taken it in shared mode:
> >
> > wlock(g_rwlock);
> > o = find();
> > if ( o != NULL )
> >     o.busy = 1;
> > wunlock(g_rwlock);
> >
> > // o is busy so work on it without any lock
> > ....
> >
> > wlock(g_rwlock); // thread is blocked here
> > o.busy = 0;
> > maybe_delete(o);
> > wunlock(g_rwlock);
> >
> > 3) thread 3 spins on the same inpcb rwlock than thread 1 in
> > _rw_wlock_hard@kern_rwlock.c:721 "while ((struct
> > thread*)RW_OWNER(rw->rw_lock) == owner && TD_IS_RUNNING(owner)) "
> >
> >
> > My target machine has two cpus.
> > Thread 1 is pinned to cpu 0.
> > Thread 2 and Thread 3 are pinned to cpu 1.
> > Thread 1 and Thread 2 have a priority of 28.
> > Thread 3 has a priority of 127
> >
> > Now what seems to happen is that when thread 1 calls runlock(g_rwlock),
> it
> > calls turnstile_broadcast@kern_rwlock.c:650, but thread 2 never regains
> > control because thread 3 is spinning on the inpcb rwlock. Also the
> > condition TD_IS_RUNNING(owner) is always true because thread 1 is active
> > waiting in a loop. So the 3 threads deadlock.
> > Note that if I compile the kernel without adaptive rwlocks it works
> without
> > any problem.
> > A workaround is to add a call to "sched_relinquish(curthread)" in thread
> 1
> > in the loop just after the call to runlock.
>
> It sounds like we are not forcing a preemption on CPU 1 in this case via
> sched_add().
>
> For SCHED_4BSD you could try the 'FULL_PREEMPTION' kernel option.
> For ULE you can adjust 'preempt_thresh' on the fly, though I think the
> default setting should actually still work.
>
> Can you use KTR or some such to determine if IPI_PREEMPT is being sent by
> CPU 0 to CPU 1 in this case?
>
> > I am also wondering about the code in _rw_runlock after
> > "turnstile_broadcast(ts, queue)". Isn't the flag RW_LOCK_WRITE_WAITERS
> > definitely lost if the other thread which is blocked in turnstile_wait
> > never regains control ?
>
> All the write waiters are awakened by a broadcast (as opposed to a signal
> operation).  They are on the run queue, not on the turnstile queue anymore,
> so there aren't any write waiters left (the bit only tracks if there are
> waiters on the turnstile).
>
> --
> John Baldwin
>

I Use ULE scheduler.
Here's the KTR output using ktrdump on a vmcore after watchdog.

  75447 ipi_selected: cpu: 1 ipi: fc
  75446 stop_cpus() with 252 type
  75445 ipi_cpu: cpu: 1 ipi: 2
  75444 ipi_cpu: cpu: 1 ipi: 2
  75443 ipi_cpu: cpu: 1 ipi: 2
  75442 ipi_cpu: cpu: 1 ipi: 2
  75441 ipi_cpu: cpu: 1 ipi: 2
  ....
  3862 ipi_cpu: cpu: 1 ipi: 2
  3861 ipi_cpu: cpu: 1 ipi: 2
  3860 ipi_cpu: cpu: 1 ipi: 2
  3859 ipi_cpu: cpu: 1 ipi: 2
  3858 ipi_cpu: cpu: 1 ipi: 2
  3857 ipi_selected: cpu: 1 ipi: f3
  3856 ipi_cpu: cpu: 1 ipi: 2
  3855 ipi_cpu: cpu: 1 ipi: 2
  3854 ipi_cpu: cpu: 1 ipi: 2
  3853 ipi_selected: cpu: 0 ipi: f3
  3852 ipi_cpu: cpu: 1 ipi: 2
  3851 ipi_selected: cpu: 1 ipi: f3
  3850 ipi_cpu: cpu: 1 ipi: 2
  3849 ipi_cpu: cpu: 1 ipi: 2
  3848 ipi_selected: cpu: 0 ipi: f3
  3847 ipi_cpu: cpu: 1 ipi: 2
  3846 ipi_cpu: cpu: 1 ipi: 2
  3845 ipi_cpu: cpu: 1 ipi: 2
  3844 ipi_cpu: cpu: 1 ipi: 2
  3843 ipi_cpu: cpu: 1 ipi: 2
  3842 ipi_cpu: cpu: 1 ipi: 2
  3841 ipi_cpu: cpu: 1 ipi: 2
  3840 ipi_cpu: cpu: 1 ipi: 2
  3839 ipi_cpu: cpu: 1 ipi: 2
  3838 ipi_cpu: cpu: 1 ipi: 2
  3837 ipi_cpu: cpu: 1 ipi: 2
  3836 ipi_cpu: cpu: 1 ipi: 2
  3835 ipi_cpu: cpu: 0 ipi: 1
  3834 ipi_cpu: cpu: 0 ipi: 1
  3833 ipi_cpu: cpu: 0 ipi: 1
  3832 ipi_cpu: cpu: 0 ipi: 1
  3831 ipi_cpu: cpu: 0 ipi: 1
  3830 ipi_cpu: cpu: 0 ipi: 1



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CA%2Bi3ByKHU33sMN3oM73-H5pCuj3qqNY_VUCRkWWyJ8_X9PvUew>