From owner-freebsd-arch@freebsd.org Thu Aug 24 19:25:22 2017 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 05A25DE6869 for ; Thu, 24 Aug 2017 19:25:22 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id BD4B33FAD; Thu, 24 Aug 2017 19:25:21 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id v7OJPDTT043392; Thu, 24 Aug 2017 12:25:17 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201708241925.v7OJPDTT043392@gw.catspoiler.org> Date: Thu, 24 Aug 2017 12:25:13 -0700 (PDT) From: Don Lewis Subject: Re: ULE steal_idle questions To: avg@FreeBSD.org cc: freebsd-arch@FreeBSD.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Aug 2017 19:25:22 -0000 On 23 Aug, Andriy Gapon wrote: > On 23/08/2017 18:04, Don Lewis wrote: >> I've been looking at the steal_idle code in tdq_idled() and found some >> things that puzzle me. >> >> Consider a machine with three CPUs: >> A, which is idle >> B, which is busy running a thread >> C, which is busy running a thread and has another thread in queue >> It would seem to make sense that the tdq_load values for these three >> CPUs would be 0, 1, and 2 respectively in order to select the best CPU >> to run a new thread. >> >> If so, then why do we pass thresh=1 to sched_highest() in the code that >> implements steal_idle? That value is used to set cs_limit which is used >> in this comparison in cpu_search: >> if (match & CPU_SEARCH_HIGHEST) >> if (tdq->tdq_load >= hgroup.cs_limit && >> That would seem to make CPU B a candidate for stealing a thread from. >> Ignoring CPU C for the moment, that shouldn't happen if the thread is >> running, but even if it was possible, it would just make CPU B go idle, >> which isn't terribly helpful in terms of load balancing and would just >> thrash the caches. The same comparison is repeated in tdq_idled() after >> a candidate CPU has been chosen: >> if (steal->tdq_load < thresh || steal->tdq_transferable == 0) { >> tdq_unlock_pair(tdq, steal); >> continue; >> } >> >> It looks to me like there is an off-by-one error here, and there is a >> similar problem in the code that implements kern.sched.balance. > > > I agree with your analysis. I had the same questions as well. > I think that the tdq_transferable check is what saves the code from > running into any problems. But it indeed would make sense for the code > to understand that tdq_load includes a currently running, never > transferable thread as well. Things aren't quite as bad as I initially thought. cpu_search() does look at tdq_transferable so sched_highest() should not return a cpu that does not have a transferable thread at the time it was examined, so in most cases the unnecessary lock/unlock shouldn't happen. The extra check after the lock will catch the case where tdq_transferable went to zero between when it was examined by cpu_search() and when we actually grabbed the lock. Using a larger thresh value for SMT threads is still a no-op, though.