Date:      Mon, 29 Oct 2012 18:59:42 +0000
From:      Attilio Rao <attilio@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-hackers <freebsd-hackers@freebsd.org>, Jeff Roberson <jeff@freebsd.org>
Subject:   Re: ule+smp: small optimization for turnstile priority lending
Message-ID:  <CAJ-FndCkfF4zDbq6i80P7fCTrVPe+w-x-uqf_-VaWN-Z7Fvg5Q@mail.gmail.com>
In-Reply-To: <CAJ-FndAzuCBCuZurMehDu=OYRCKJ=0RJ8-VcjUE5QRJpxdyWzw@mail.gmail.com>
References:  <50587F8D.9060102@FreeBSD.org> <CAJ-FndCWzTBRYsA0mFDCj8RU06ZUTi3G0LeEFcS9_c5zKd+gWQ@mail.gmail.com> <5058C68B.1010508@FreeBSD.org> <CAJ-FndC8j12a95N0QprYTLJLC06R8jjcaHuxEZKu5D9RHW=ZVw@mail.gmail.com> <50596019.5060708@FreeBSD.org> <CAJ-FndBxWYOkRCn-DZTdZ+B4RpvsvaxtwDyMc8M7YhRj9DaL2w@mail.gmail.com> <505AF836.7050004@FreeBSD.org> <CAJ-FndAiiH5ToUtns=jC7_V27BkeOvJ8_T9SOmw2DGdFyfvTdg@mail.gmail.com> <506C461F.2050008@FreeBSD.org> <CAJ-FndAzuCBCuZurMehDu=OYRCKJ=0RJ8-VcjUE5QRJpxdyWzw@mail.gmail.com>

On Mon, Oct 29, 2012 at 4:56 PM, Attilio Rao <attilio@freebsd.org> wrote:
> On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon <avg@freebsd.org> wrote:
>> on 20/09/2012 16:14 Attilio Rao said the following:
>>> On 9/20/12, Andriy Gapon <avg@freebsd.org> wrote:
>> [snip]
>>>> The patch works well as far as I can tell.  Thank you!
>>>> There is one warning with full witness enabled, but it appears to be
>>>> harmless (so far):
>>>
>>> Andriy,
>>> thanks a lot for your testing and reports you made so far.
>>> Unfortunately I'm going off for 2 weeks now and I won't work on
>>> FreeBSD for that timeframe. I will get back to those in 2 weeks then.
>>> If you want to play more with this idea, feel free to extend/fix/etc.
>>> this patch.
>>
>> Unfortunately I haven't found time to work on this further, but I have some
>> additional thoughts.
>>
>> First, I think that the witness message was not benign and it actually warns about
>> the same kind of deadlock that I originally had.
>> The problem is that sched_lend_prio() is called with target thread's td_lock held,
>> which is a lock of tdq on the thread's CPU.  Then, in your patch, we acquire
>> current tdq's lock to modify its load.  But if two tdq locks are to be held at the
>> same time, then they must be locked using tdq_lock_pair, so that lock order is
>> maintained.  With the patch no tdq lock order can be maintained (it can be
>> arbitrary) and thus it is possible to run into a deadlock.
>
> Indeed. I realized this after thinking about this problem while I was
> on holiday.
>
>>
>> I see two possibilities here, but don't rule out that there can be more.
>>
>> 1. Do something like my patch does.  That is, manipulate current thread's tdq in
>> advance before any other thread/tdq locks are acquired (or td_lock is changed to
>> point to a different lock and current tdq is unlocked).  The API can be made more
>> generic in nature.  E.g. it can look like this:
>> void
>> sched_thread_block(struct thread *td, int inhibitor)
>> {
>>         struct tdq *tdq;
>>
>>         THREAD_LOCK_ASSERT(td, MA_OWNED);
>>         KASSERT(td == curthread,
>>             ("sched_thread_block: only curthread is supported"));
>>         tdq = TDQ_SELF();
>>         TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>>         MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
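>>         /* Mark td inhibited, then update this tdq's load and lowpri. */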
>>         TD_SET_INHIB(td, inhibitor);
>>         tdq_load_rem(tdq, td);
>>         tdq_setlowpri(tdq, td);
>> }
>>
>>
>> 2. Try to do things from sched_lend_prio based on curthread's state.  This, as it
>> seems, would require completely lock-less manipulations of current tdq.  E.g. for
>> tdq_load we could use atomic operations (it is already accessed locklessly, but
>> not modified).  But for tdq_lowpri a more elaborate trick would be required like
>> having a separate field for a temporary value.
>
> No, this is not going to work because tdq_lowpri and tdq_load are
> updated and used in the same critical path (and possibly also other
> tdq* members in tdq_runqueue_add(), for example, but I didn't bother
> to also check that).
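
(For reference, the tdq_lock_pair() ordering Andriy mentions above works,
if I recall the sched_ule.c code correctly, by always taking the two locks
in a fixed global order; a simplified sketch, not the real function:)

static void
tdq_lock_pair_sketch(struct tdq *one, struct tdq *two)
{
        /*
         * Take the lower-addressed lock first, so that any two CPUs
         * agree on the order and cannot deadlock against each other.
         * (The real tdq_lock_pair() also has to tell WITNESS that
         * taking two locks of the same class is intentional.)
         */
        if (one < two) {
                TDQ_LOCK(one);
                TDQ_LOCK(two);
        } else {
                TDQ_LOCK(two);
                TDQ_LOCK(one);
        }
}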

Ok, so in what I wrote above I wasn't completely right: tdq_lowpri and
tdq_load are not read in the same critical path (cpu_search() and
sched_pickcpu() in the self case).
However, I was also overestimating one factor in this patch: I don't
think we need to fix the real tdq_load for self before cpu_search() in
that case, because self is handled in a very special way, with its own
comparison. I therefore think that a patch local to sched_pickcpu(),
on the self path, should be enough.
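
To make this concrete, one possible shape of such a self-path tweak
could be the following (hypothetical sketch, not what the patch linked
below actually does; the helper name, the curthread_blocking flag and
the "+ 1" slack are invented, while struct tdq, tdq_load, tdq_lowpri
and td_priority are the real sched_ule.c fields):

/*
 * Hypothetical sketch only: when deciding whether sched_pickcpu()
 * should prefer self, discount curthread from the local load if it is
 * about to block, since it will stop contributing to that load.
 */
static int
pickcpu_self_ok(struct thread *td, struct tdq *selftdq, struct tdq *besttdq,
    int curthread_blocking)
{
        int selfload = selftdq->tdq_load;

        if (curthread_blocking)
                selfload--;             /* curthread is on its way out */
        return (selftdq->tdq_lowpri > td->td_priority &&
            selfload <= besttdq->tdq_load + 1);
}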

This brings us to another, bigger issue, however: tdq_lowpri handling
is not perfect in that patch. Think about the case where the thread to
be switched out is, in fact, the lowest priority one (the one that
tdq_lowpri currently reflects). In that case what we should do is
recalculate tdq_lowpri, which is usually a very expensive operation
and requires the TDQ self lock to be held for it to be done correctly.
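
For context, the recalculation is essentially what tdq_setlowpri() in
the sched_thread_block() sketch quoted above does (paraphrased from
memory, not the exact sched_ule.c code): pick the best thread still on
the runqueue and compare it with the on-CPU thread, which is only safe
with the tdq lock held:

static void
tdq_setlowpri_sketch(struct tdq *tdq, struct thread *ctd)
{
        struct thread *td;

        TDQ_LOCK_ASSERT(tdq, MA_OWNED); /* the runq walk needs the lock */
        td = tdq_choose(tdq);           /* best runnable thread, if any */
        if (td == NULL || td->td_priority > ctd->td_priority)
                tdq->tdq_lowpri = ctd->td_priority;
        else
                tdq->tdq_lowpri = td->td_priority;
}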

At this point, I would suggest using the new version of avg_ule.patch,
which doesn't take tdq_lowpri into account. This means the
optimization won't work in all cases, but at least it is a good
trade-off against a hackier version.

This also makes me wonder about the tdq_lowpri check in the self case
in general. Basically, it forces sched_pickcpu() to select self if and
only if the new thread to schedule has a higher priority than the
lowest priority value (tdq_lowpri) on curcpu. Why is it done like
this? Is this check meant to enforce some sort of fairness?
It would be good if Jeff could spend a word or two on this check
specifically.
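
(The check I mean is roughly the following, heavily simplified; the
real condition in sched_pickcpu() has more terms:)

        /*
         * Priorities are inverted in FreeBSD: a numerically lower value
         * means a more important thread, and tdq_lowpri caches the best
         * such value currently on that tdq.
         */
        if (TDQ_CPU(self)->tdq_lowpri > td->td_priority)
                cpu = self;     /* td would run right away on curcpu */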

Anyway, here is the patch that implements what I suggested; let me
know what you think about it:
http://www.freebsd.org/~attilio/avg_ule2.patch

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


