From owner-freebsd-current@FreeBSD.ORG  Tue Jan 29 15:42:33 2013
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 4116F339;
 Tue, 29 Jan 2013 15:42:33 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id 10AD82EC;
 Tue, 29 Jan 2013 15:42:33 +0000 (UTC)
Received: from pakbsde14.localnet (unknown [38.105.238.108])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 74F34B91E;
 Tue, 29 Jan 2013 10:42:32 -0500 (EST)
From: John Baldwin <jhb@freebsd.org>
To: Ryan Stone <rysto32@gmail.com>
Subject: Re: Why don't we check for preemption when we unlend priority?
Date: Tue, 29 Jan 2013 09:59:36 -0500
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p22; KDE/4.5.5; amd64; ; )
References: <CAFMmRNzzYUm1f09kMSQnfYUVUH+hPLOZqcE6dCq0UsC+iGD31A@mail.gmail.com>
In-Reply-To: <CAFMmRNzzYUm1f09kMSQnfYUVUH+hPLOZqcE6dCq0UsC+iGD31A@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201301290959.37166.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Tue, 29 Jan 2013 10:42:32 -0500 (EST)
Cc: Attilio Rao <attilio@freebsd.org>,
 FreeBSD Current <freebsd-current@freebsd.org>
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Jan 2013 15:42:33 -0000

On Friday, January 25, 2013 3:01:14 pm Ryan Stone wrote:
> I'm having a problem where userland threads are running with a loaned (via
> mutex priority propagation) priority even after they have released the
> mutex.  This is causing the low-priority userland thread to starve
> high-priority interrupt threads.  One scenario is(which I am seeing on
> FreeBSD 8.2, but the priority lending code does not seem to have changed in
> HEAD):
> 
> 1) I have several swi threads that are pulling packets off of interfaces
> and forwarding them.  Each swi thread is pinned to a CPU.
> 
> 2) A remote user initiates an scp of a large file to this machine.
> 
> 3) sshd (which handles the scp) takes a mutex on its socket
> 
> 4) An interrupt or taskqueue thread belonging to the driver that is
> receiving the scp traffic tries to take the same mutex.  It can't and so it
> lends its priority to the sshd thread.
> 
> 5) My swi thread is woken.  It is pinned to the same CPU as sshd, but it
> has a lower priority than the lent priority, so it is placed on the
> runqueue.
> 
> 6) The sshd thread releases the mutex.  sched_unlend_prio is called to set
> its priority back to a normal user priority.  However, it does not check
> whether any higher-priority threads are waiting in the runqueue.
> 
> 7) The sshd thread is allowed to run for its full quantum (100ms).  This is
> easily long enough for my swi thread to start and I drop packets.

Well, I think there is an implicit assumption that the only higher priority
things are ones that are about to be woken up later in turnstile_unpend().
The order of operations in turnstile_unpend() should handle the case where
one of the threads waiting on the lock needs your inherited priority.  In
the absence of CPU pinning this would indeed work correctly because if the
current thread's priority is lowered, it is definitely going to preempt to
the thread it borrowed it from, and when that thread finishes running it will
examine the run queues to find which thread to run next.  CPU pinning has 
broken that assumption however.

I can't fully evaulate your patch, though sched_unlend_prio should only be
called on curthread, so I think that your sched_check_preempt() should always
see tdq == TDQ_SELF()?

> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
> index 01559ee..2659614 100644
> --- a/sys/kern/sched_ule.c
> +++ b/sys/kern/sched_ule.c
> @@ -1586,6 +1586,22 @@ sched_pctcpu_update(struct td_sched *ts)
>         ts->ts_ftick = ts->ts_ltick - SCHED_TICK_TARG;
>  }
> 
> +static void
> +sched_check_preempt(struct tdq *tdq, struct thread *td)
> +{
> +
> +       KASSERT(TD_IS_RUNNING(td), ("thread is not running"));
> +       TDQ_LOCK_ASSERT(tdq, MA_OWNED);
> +       KASSERT(tdq == TDQ_CPU(td->td_sched->ts_cpu), ("tdq does not td"));
> +
> +       if (tdq == TDQ_SELF()) {
> +               if (td->td_priority > tdq->tdq_lowpri &&
> +                   sched_shouldpreempt(tdq->tdq_lowpri, td->td_priority,
> 0))
> +                       td->td_owepreempt = 1;
> +       } else
> +               tdq_notify(tdq, td);
> +}
> +
>  /*
>   * Adjust the priority of a thread.  Move it to the appropriate run-queue
>   * if necessary.  This is the back-end for several priority related
> @@ -1635,8 +1651,10 @@ sched_thread_priority(struct thread *td, u_char prio)
>                 td->td_priority = prio;
>                 if (prio < tdq->tdq_lowpri)
>                         tdq->tdq_lowpri = prio;
> -               else if (tdq->tdq_lowpri == oldpri)
> +               else if (tdq->tdq_lowpri == oldpri) {
>                         tdq_setlowpri(tdq, td);
> +                       sched_check_preempt(tdq, td);
> +               }
>                 return;
>         }
>         td->td_priority = prio;
> 

-- 
John Baldwin