Date: Tue, 6 Nov 2012 11:02:04 +0000
From: Attilio Rao <attilio@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-hackers <freebsd-hackers@freebsd.org>,
	Jeff Roberson <jeff@freebsd.org>
Subject: Re: ule+smp: small optimization for turnstile priority lending
Message-ID: <CAJ-FndA-r2n9cFDZ9k2Y4NKsGWpm_jh41iUzv20eXc_q26MGaw@mail.gmail.com>
In-Reply-To: <CAJ-FndCkfF4zDbq6i80P7fCTrVPe+w-x-uqf_-VaWN-Z7Fvg5Q@mail.gmail.com>
References: <50587F8D.9060102@FreeBSD.org>
	<CAJ-FndCWzTBRYsA0mFDCj8RU06ZUTi3G0LeEFcS9_c5zKd+gWQ@mail.gmail.com>
	<5058C68B.1010508@FreeBSD.org>
	<CAJ-FndC8j12a95N0QprYTLJLC06R8jjcaHuxEZKu5D9RHW=ZVw@mail.gmail.com>
	<50596019.5060708@FreeBSD.org>
	<CAJ-FndBxWYOkRCn-DZTdZ+B4RpvsvaxtwDyMc8M7YhRj9DaL2w@mail.gmail.com>
	<505AF836.7050004@FreeBSD.org>
	<CAJ-FndAiiH5ToUtns=jC7_V27BkeOvJ8_T9SOmw2DGdFyfvTdg@mail.gmail.com>
	<506C461F.2050008@FreeBSD.org>
	<CAJ-FndAzuCBCuZurMehDu=OYRCKJ=0RJ8-VcjUE5QRJpxdyWzw@mail.gmail.com>
	<CAJ-FndCkfF4zDbq6i80P7fCTrVPe+w-x-uqf_-VaWN-Z7Fvg5Q@mail.gmail.com>
On 10/29/12, Attilio Rao <attilio@freebsd.org> wrote:
> On Mon, Oct 29, 2012 at 4:56 PM, Attilio Rao <attilio@freebsd.org> wrote:
>> On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon <avg@freebsd.org> wrote:
>>> on 20/09/2012 16:14 Attilio Rao said the following:
>>>> On 9/20/12, Andriy Gapon <avg@freebsd.org> wrote:
>>> [snip]
>>>>> The patch works well as far as I can tell.  Thank you!
>>>>> There is one warning with full witness enabled but it appears to be
>>>>> harmless (so far):
>>>>
>>>> Andriy,
>>>> thanks a lot for your testing and the reports you have made so far.
>>>> Unfortunately I'm going off for 2 weeks now and I won't work on
>>>> FreeBSD in that timeframe.  I will get back to this in 2 weeks.
>>>> If you want to play more with this idea, feel free to extend/fix/etc.
>>>> this patch.
>>>
>>> Unfortunately I haven't found time to work on this further, but I have
>>> some additional thoughts.
>>>
>>> First, I think that the witness message was not benign; it actually
>>> warns about the same kind of deadlock that I originally had.
>>> The problem is that sched_lend_prio() is called with the target
>>> thread's td_lock held, which is the lock of the tdq on the thread's
>>> CPU.  Then, in your patch, we acquire the current tdq's lock to modify
>>> its load.  But if two tdq locks are to be held at the same time, they
>>> must be locked using tdq_lock_pair(), so that lock order is
>>> maintained.  With the patch no tdq lock order can be maintained (it
>>> can be arbitrary), and thus it is possible to run into a deadlock.
>>
>> Indeed.  I realized this after thinking about this problem while I was
>> on holiday.
>>
>>> I see two possibilities here, but don't rule out that there can be
>>> more.
>>>
>>> 1. Do something like my patch does.  That is, manipulate the current
>>> thread's tdq in advance, before any other thread/tdq locks are
>>> acquired (or td_lock is changed to point to a different lock and the
>>> current tdq is unlocked).
>>> The API can be made more generic in nature.  E.g. it can look like
>>> this:
>>>
>>> void
>>> sched_thread_block(struct thread *td, int inhibitor)
>>> {
>>> 	struct tdq *tdq;
>>>
>>> 	THREAD_LOCK_ASSERT(td, MA_OWNED);
>>> 	KASSERT(td == curthread,
>>> 	    ("sched_thread_block: only curthread is supported"));
>>> 	tdq = TDQ_SELF();
>>> 	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>>> 	MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
>>> 	TD_SET_INHIB(td, inhibitor);
>>> 	tdq_load_rem(tdq, td);
>>> 	tdq_setlowpri(tdq, td);
>>> }
>>>
>>> 2. Try to do things from sched_lend_prio() based on curthread's
>>> state.  This, as it seems, would require completely lock-less
>>> manipulation of the current tdq.  E.g. for tdq_load we could use
>>> atomic operations (it is already accessed locklessly, but not
>>> modified).  But for tdq_lowpri a more elaborate trick would be
>>> required, like having a separate field for a temporary value.
>>
>> No, this is not going to work because tdq_lowpri and tdq_load are
>> updated and used in the same critical path (and possibly also other
>> tdq* members in tdq_runqueue_add(), for example, but I didn't bother
>> to also check that).
>
> Ok, so here I wasn't completely right, because tdq_lowpri and tdq_load
> are not read in the same critical path (cpu_search() and
> sched_pickcpu() in the self case).
> However, I was over-estimating a factor in this patch: I don't think
> we need to fix the real tdq_load for self in that case before
> cpu_search(), because self is handled in a very special way, with its
> own comparison.  I then think that a patch local to sched_pickcpu() on
> the self path should be enough.
>
> This brings us to another, bigger issue, however.  tdq_lowpri handling
> is not perfect in that patch.  Think about the case where the thread
> to be switched out is, in fact, the lowest-priority one.
> In that case, what we should do is recalculate tdq_lowpri, which is
> usually a very expensive operation and requires TDQ self locking to be
> in place to be carried out correctly.

I think that the easy fix for this is to implement a second field,
called tdq_sec_lowpri, where we store the priority of the next thread
to be scheduled after tdq_lowpri.  This is not really going to change
anything, because we get it for free when we already calculate
tdq_lowpri, and it will let us implement a definitive logic that helps
in the case you describe.

However, I still want to think about the basic idea behind the current
algorithm in the self/likely CPU pickup.

Attilio

--
Peace can only be achieved by understanding - A. Einstein