From owner-freebsd-arch@freebsd.org Sat Aug 26 17:50:25 2017
Message-Id: <201708261750.v7QHoG2c053745@gw.catspoiler.org>
Date: Sat, 26 Aug 2017 10:50:16 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: ULE steal_idle questions
To: avg@FreeBSD.org
Cc: freebsd-arch@FreeBSD.org
In-Reply-To: <201708251824.v7PIOA6q048321@gw.catspoiler.org>

On 25 Aug, To: avg@FreeBSD.org wrote:
> On 24 Aug, To: avg@FreeBSD.org wrote:
>> Aside from the Ryzen problem, I think the steal_idle code should be
>> rewritten so that it doesn't block interrupts for so long.  In its
>> current state, interrupt latency increases with the number of cores
>> and the complexity of the topology.
>>
>> What I'm thinking is that we should set a flag at the start of the
>> search for a thread to steal.  If we are preempted by another,
>> higher-priority thread, that thread will clear the flag.  Next we
>> start the loop to search up the hierarchy.  Once we find a candidate
>> CPU:
>>
>>     steal = TDQ_CPU(cpu);
>>     CPU_CLR(cpu, &mask);
>>     tdq_lock_pair(tdq, steal);
>>     if (tdq->tdq_load != 0) {
>>         goto out;    /* exit the loop and switch to the new thread */
>>     }
>>     if (flag was cleared) {
>>         tdq_unlock_pair(tdq, steal);
>>         goto restart;    /* restart the search */
>>     }
>>     if (steal->tdq_load < thresh || steal->tdq_transferable == 0 ||
>>         tdq_move(steal, tdq) == 0) {
>>         tdq_unlock_pair(tdq, steal);
>>         continue;
>>     }
>> out:
>>     TDQ_UNLOCK(steal);
>>     clear flag;
>>     mi_switch(SW_VOL | SWT_IDLE, NULL);
>>     thread_unlock(curthread);
>>     return (0);
>>
>> And we also have to clear the flag if we did not find a thread to
>> steal.
>
> I've implemented something like this and added a bunch of counters to
> it to get a better understanding of its behavior.  Instead of adding a
> flag to detect preemption, I used the same switchcnt test as is used
> by sched_idletd().  These are the results of a ~9-hour poudriere run:
>
> kern.sched.steal.none:       9971668  # no threads were stolen
> kern.sched.steal.fail:         23709  # unable to steal from the CPU chosen by sched_highest()
> kern.sched.steal.level2:      191839  # stolen from somewhere else on this chip
> kern.sched.steal.level1:      557659  # stolen from another core on this CCX
> kern.sched.steal.level0:     4555426  # stolen from the other SMT thread on this core
> kern.sched.steal.restart:        404  # preemption detected, so restart the search
> kern.sched.steal.call:      15276638  # number of times tdq_idled() was called
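
To make the control flow concrete, here is a minimal sketch of the
switchcnt-based version of that loop.  It is a sketch only, assuming
the surrounding sched_ule.c definitions (struct tdq and its fields,
sched_highest(), tdq_lock_pair(), tdq_move(), steal_thresh); the
authoritative code is in the review linked at the end of this message.

/*
 * Sketch of a restartable steal search for tdq_idled().  Every context
 * switch on this CPU bumps tdq_switchcnt, so comparing against a
 * snapshot taken at the start of the search detects preemption without
 * a separate flag.
 */
static int
tdq_idled(struct tdq *tdq)
{
        struct cpu_group *cg;
        struct tdq *steal;
        cpuset_t mask;
        int cpu, switchcnt, thresh;

        CPU_FILL(&mask);
        CPU_CLR(PCPU_GET(cpuid), &mask);  /* never steal from ourselves */
restart:
        /* The same idleness test that sched_idletd() uses. */
        switchcnt = tdq->tdq_switchcnt + tdq->tdq_oldswitchcnt;
        /* Search upward through the topology: SMT siblings, cores, chip. */
        for (cg = tdq->tdq_cg; cg != NULL; cg = cg->cg_parent) {
                if ((cg->cg_flags & (CG_FLAG_HTT | CG_FLAG_THREAD)) == 0)
                        thresh = steal_thresh;
                else
                        thresh = 1;
                while ((cpu = sched_highest(cg, mask, thresh)) != -1) {
                        steal = TDQ_CPU(cpu);
                        CPU_CLR(cpu, &mask);
                        tdq_lock_pair(tdq, steal);
                        if (tdq->tdq_load != 0)
                                goto out;  /* work arrived here; run it */
                        if (switchcnt != tdq->tdq_switchcnt +
                            tdq->tdq_oldswitchcnt) {
                                /* Preempted; our load data may be stale. */
                                tdq_unlock_pair(tdq, steal);
                                goto restart;
                        }
                        if (steal->tdq_load < thresh ||
                            steal->tdq_transferable == 0 ||
                            tdq_move(steal, tdq) == 0) {
                                /* The victim emptied under us; try the next. */
                                tdq_unlock_pair(tdq, steal);
                                continue;
                        }
out:
                        /*
                         * Our own tdq stays locked across mi_switch();
                         * it is the idle thread's thread lock.
                         */
                        TDQ_UNLOCK(steal);
                        mi_switch(SW_VOL | SWT_IDLE, NULL);
                        thread_unlock(curthread);
                        return (0);
                }
        }
        return (1);
}

The kern.sched.steal.restart counter above counts trips through the
goto restart path.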
> There are a few surprises here.
>
> One is the number of failed moves.  I don't know if the load on the
> source CPU fell below thresh, tdq_transferable went to zero, or if
> tdq_move() failed.  I also wonder if the failures are evenly
> distributed across CPUs.  It is possible that these failures are
> concentrated on CPU 0, which handles most interrupts.  If interrupts
> don't affect switchcnt, then the data collected by sched_highest()
> could be a bit stale and we would not know it.

Most of the above failed moves were due to either tdq_load dropping
below the threshold or tdq_transferable going to zero.  These are
evenly distributed across the CPUs that we want to steal from.  I did
not bin the results by which CPU this code was running on.  Actual
failures of tdq_move() are bursty and not evenly distributed across
CPUs.

I've created this review for my changes:

https://reviews.freebsd.org/D12130
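
For completeness, here is a hypothetical way to do the binning by
searching CPU mentioned above.  None of these counter names exist in
D12130; a sysctl handler to export the per-CPU arrays is omitted, and
counter(9) per-CPU counters would avoid cache-line contention, but
plain longs keep a one-off experiment simple.

/*
 * Hypothetical instrumentation: count, per searching CPU, why a steal
 * attempt failed after sched_highest() had nominated a victim.
 */
static long steal_fail_load[MAXCPU];  /* victim load fell below thresh */
static long steal_fail_xfer[MAXCPU];  /* tdq_transferable went to zero */
static long steal_fail_move[MAXCPU];  /* tdq_move() itself failed */

/* This would replace the combined test in the search loop sketched
 * earlier, binning each failure by cause before moving on: */
if (steal->tdq_load < thresh)
        steal_fail_load[PCPU_GET(cpuid)]++;
else if (steal->tdq_transferable == 0)
        steal_fail_xfer[PCPU_GET(cpuid)]++;
else if (tdq_move(steal, tdq) == 0)
        steal_fail_move[PCPU_GET(cpuid)]++;
else
        goto out;        /* the move succeeded */
tdq_unlock_pair(tdq, steal);
continue;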