Date: Sat, 26 Aug 2017 10:28:50 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Don Lewis
cc: avg@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: ULE steal_idle questions
Message-ID: <20170826094725.G1648@besplex.bde.org>
In-Reply-To: <201708251824.v7PIOA6q048321@gw.catspoiler.org>

On Fri, 25 Aug 2017, Don Lewis wrote:

> ...
> Something else that I did not expect is how frequently threads are
> stolen from the other SMT thread on the same core, even though I
> increased steal_thresh from 2 to 3 to account for the off-by-one
> problem.  This is true even right after the system has booted and no
> significant load has been applied.  My best guess is that because of
> affinity, both the parent and child processes run on the same CPU after
> fork(), and if a number of processes are forked in quick succession,
> the run queue of that CPU can get really long.  Forcing a thread
> migration in exec() might be a good solution.

Since you are trying a lot of combinations, maybe you can tell us which
ones work best.

SCHED_4BSD works better for me on an old 2-core system.  SCHED_ULE works
better on a not-so-old 4x2-core (Haswell) system, but I don't like it due
to its complexity.  It makes differences of at most +-2%, except that
when mistuned it can give -5% for real time (while doing better for CPU
time and presumably for power).

For SCHED_4BSD, I wrote fancy tuning for fork/exec and sometimes get
everything to line up for a 3% improvement (803 seconds instead of 823 on
the old system, with -current much slower at 840+ and old versions of ULE
before steal_idle taking 890+).  This is very resource-dependent (mainly
cache associativity?), and my tuning makes little difference on the newer
system.

SCHED_ULE still has bugfeatures which tend to help large builds by
reducing context switching, e.g., by bogusly clamping all CPU-bound
threads to nearly maximal priority.

Bruce
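
P.S. Here is a toy model of the steal_thresh arithmetic, on the
assumption that the load the stealing CPU sees for a victim counts the
victim's running thread as well as its queued threads (which is how I
read the off-by-one above; the real check lives in sys/kern/sched_ule.c
and may differ).  The names mirror the kern.sched.steal_thresh sysctl,
but nothing below is the kernel code:

/*
 * Toy model of ULE-style idle stealing.  Assumes the victim's load
 * includes the thread currently running on that CPU, so the question
 * "is there anything worth stealing?" is off by one from "how many
 * threads are waiting?".
 */
#include <stdio.h>

struct toy_tdq {
	int	running;	/* 1 if a thread is on the CPU */
	int	queued;		/* threads waiting on the run queue */
};

/* Load as the steal code would see it: the running thread counts too. */
static int
tdq_load(const struct toy_tdq *tdq)
{
	return (tdq->running + tdq->queued);
}

/* Would an idle CPU pick this queue as a victim? */
static int
would_steal(const struct toy_tdq *tdq, int steal_thresh)
{
	return (tdq_load(tdq) >= steal_thresh);
}

int
main(void)
{
	struct toy_tdq busy = { .running = 1, .queued = 1 };

	/*
	 * With steal_thresh = 2, one running thread plus a single
	 * waiter is already enough to trigger a steal; raising the
	 * threshold to 3 requires two waiters before stealing.
	 */
	printf("thresh=2: steal=%d\n", would_steal(&busy, 2));
	printf("thresh=3: steal=%d\n", would_steal(&busy, 3));
	return (0);
}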