From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 29 16:57:01 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A21AEE72; Mon, 29 Oct 2012 16:57:01 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 9EE7B8FC14; Mon, 29 Oct 2012 16:57:00 +0000 (UTC) Received: by mail-la0-f54.google.com with SMTP id e12so5011235lag.13 for ; Mon, 29 Oct 2012 09:56:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=I6506+vnYFSYAdhqeMC/wSJhEyjilt+0hTZ8gWfNIxA=; b=GNiXxV57HYH7dP1/rHNBjCB4fUe0Xmk8/ONMBOnQVppd8BoJqZeYvYTSAOHOoMXnZh 2Wg0+nYK0jutPwW23BIJk4ZZTxqFr+UttBQKMMssOvByf3+vukwooNUwgPd9RSSejkUa j6b5Pdwae4jr4Ro289R5OEqXFG0zolYkpBOgRSiGYENPCEW1SLL6mVI0eSNFQEap6gf8 Q8g3+/ui5RZWKQNskT/ZNiUEya9EiCJCCNctX+Pcb+4ZzGwfYq/oxTlGdMdSuFlpS0es 9Gfn1oDjpuQ5AKcyPKNIcFtk0UmYKWpLi82jtshBN5NaVWOj2o6uv4VpFLQsHvyksCDR l4Qg== MIME-Version: 1.0 Received: by 10.112.26.67 with SMTP id j3mr12382188lbg.39.1351529819488; Mon, 29 Oct 2012 09:56:59 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.112.30.37 with HTTP; Mon, 29 Oct 2012 09:56:59 -0700 (PDT) In-Reply-To: <506C461F.2050008@FreeBSD.org> References: <50587F8D.9060102@FreeBSD.org> <5058C68B.1010508@FreeBSD.org> <50596019.5060708@FreeBSD.org> <505AF836.7050004@FreeBSD.org> <506C461F.2050008@FreeBSD.org> Date: Mon, 29 Oct 2012 16:56:59 +0000 X-Google-Sender-Auth: LkoTfcF2YQDq5-0J9gdwj01UQ9c Message-ID: Subject: Re: ule+smp: small optimization for turnstile priority lending From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-hackers , Jeff Roberson X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Oct 2012 16:57:01 -0000 On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon wrote: > on 20/09/2012 16:14 Attilio Rao said the following: >> On 9/20/12, Andriy Gapon wrote: > [snip] >>> The patch works well as far as I can tell. Thank you! >>> There is one warning with full witness enables but it appears to be harmless >>> (so >>> far): >> >> Andriy, >> thanks a lot for your testing and reports you made so far. >> Unfortunately I'm going off for 2 weeks now and I won't work on >> FreeBSD for that timeframe. I will get back to those in 2 weeks then. >> If you want to play more with this idea feel free to extend/fix/etc. >> this patch. > > Unfortunately I haven't found time to work on this further, but I have some > additional thoughts. > > First, I think that the witness message was not benign and it actually warns about > the same kind of deadlock that I originally had. > The problem is that sched_lend_prio() is called with target thread's td_lock held, > which is a lock of tdq on the thread's CPU. Then, in your patch, we acquire > current tdq's lock to modify its load. But if two tdq locks are to be held at the > same time, then they must be locked using tdq_lock_pair, so that lock order is > maintained. With the patch no tdq lock order can be maintained (can arbitrary) > and thus it is possible to run into a deadlock. Indeed. I realized this after thinking about this problem while I was on holiday. > > I see two possibilities here, but don't rule out that there can be more. > > 1. Do something like my patch does. That is, manipulate current thread's tdq in > advance before any other thread/tdq locks are acquired (or td_lock is changed to > point to a different lock and current tdq is unlocked). The API can be made more > generic in nature. E.g. it can look like this: > void > sched_thread_block(struct thread *td, int inhibitor) > { > struct tdq *tdq; > > THREAD_LOCK_ASSERT(td, MA_OWNED); > KASSERT(td == curthread, > ("sched_thread_block: only curthread is supported")); > tdq = TDQ_SELF(); > TDQ_LOCK_ASSERT(tdq, MA_OWNED); > MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); > TD_SET_INHIB(td, inhibitor); > tdq_load_rem(tdq, td); > tdq_setlowpri(tdq, td); > } > > > 2. Try to do things from sched_lend_prio based on curthread's state. This, as it > seems, would require completely lock-less manipulations of current tdq. E.g. for > tdq_load we could use atomic operations (it is already accessed locklessly, but > not modified). But for tdq_lowpri a more elaborate trick would be required like > having a separate field for a temporary value. No, this is not going to work because tdq_lowpri and tdq_load are updated and used in the same critical path (and possibly also other tdq* members in tdq_runqueue_add(), for example, but I didn't bother to also check that). Let me think some more about this and get back to you. Attilio -- Peace can only be achieved by understanding - A. Einstein