Date: Sat, 9 Jun 2018 18:07:15 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: Is kern.sched.preempt_thresh=0 a sensible default?
To: Stefan Esser
cc: Andriy Gapon, FreeBSD Current, "M. Warner Losh"

On 9 Jun, Stefan Esser wrote:
> 3) Programs that evenly split the load on all available cores have been
> suffering from sub-optimal assignment of threads to cores. E.g. on a
> CPU with 8 (virtual) cores, this resulted in 6 cores running the load
> in nominal time, 1 core taking twice as long because 2 threads were
> scheduled to run on it, while 1 core was mostly idle. Even if the
> load was initially evenly distributed, a woken-up process that ran on
> one core destroyed the symmetry, and it was not recovered. (This was a
> problem e.g. for parallel programs using MPI or the like.)

When a core is about to go idle or first enters the idle state, it
searches for the most heavily loaded core and steals a thread from it.
The core only goes to sleep if it can't find a non-running thread to
steal.

If there are N cores and N+1 runnable threads, a long-term load
balancer runs periodically. It searches for the most and least loaded
cores and moves a thread from the former to the latter. That prevents
the same pair of threads from having to share the same core
indefinitely. (See the sketch below.)

There is an observed bug where a low-priority thread can get pinned to
a particular core that is already occupied by a high-priority,
CPU-bound thread that never releases the CPU. The low-priority thread
can't migrate to another core that subsequently becomes available
because it is pinned. It is not known how the thread originally got
into this state. I don't see any reason for 4BSD to be immune to this
problem.
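To make the balancing step concrete, here is a toy, self-contained C
sketch of the idea: find the most and least loaded cores and move one
thread between them. The load[] array and the imbalance threshold are
invented for illustration; the real logic lives around sched_balance()
and tdq_move() in sys/kern/sched_ule.c and is considerably more
involved.

    #include <stdio.h>

    #define NCPUS 8

    /* Toy per-CPU run-queue lengths: 2 threads stuck on CPU 0, CPU 7
     * idle, matching the 8-core example quoted above.  In the kernel
     * this information comes from the per-CPU run queues. */
    static int load[NCPUS] = { 2, 1, 1, 1, 1, 1, 1, 0 };

    /* One pass of the long-term balancer: find the most and least
     * loaded CPUs and move a single thread from the former to the
     * latter. */
    static void
    balance_once(void)
    {
            int cpu, high = 0, low = 0;

            for (cpu = 1; cpu < NCPUS; cpu++) {
                    if (load[cpu] > load[high])
                            high = cpu;
                    if (load[cpu] < load[low])
                            low = cpu;
            }
            /* Moving a thread only helps if the imbalance is >= 2. */
            if (load[high] - load[low] >= 2) {
                    load[high]--;
                    load[low]++;
                    printf("moved a thread from CPU %d to CPU %d\n",
                        high, low);
            }
    }

    int
    main(void)
    {
            /* With N+1 threads on N cores, periodic passes like this
             * rotate the doubled-up pair across CPUs instead of
             * leaving the same two threads stuck sharing one core. */
            balance_once();
            return (0);
    }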
> 4) The real-time behavior of SCHED_ULE is weak due to interactive
> processes (e.g. the X server) being put into the "time-share" class
> and then suffering from the problems described as 1) or 2) above.
> (You distinguish time-share and batch processes, which are both
> allowed to consume their full quanta even if a higher-priority
> process in their class becomes runnable. I think this will not
> give the required responsiveness, e.g. for an X server.)
> They should be considered I/O-intensive if they often don't use
> their full quantum, without taking into account the significant
> amount of CPU time they may use at times. (I.e. the criterion for
> time-sharing should not be the CPU time consumed, but rather some
> fraction of the quanta not being fully used due to voluntarily giving
> up the CPU.) With many real-time threads it may be hard to identify
> interactive threads, since they are non-voluntarily disrupted too
> often - this must be considered in the sampling of voluntary vs.
> non-voluntary context switches.

It can actually be worse than this. There is a bug that can cause the
wnck-applet component of the MATE desktop to consume a large amount of
CPU time, and apparently it is communicating with the Xorg server,
which it drives to 100% CPU. That makes its PRI value increase
greatly, so it gets a lower scheduling priority. Even without
competing CPU load, interactive performance is hurt. With competing
CPU load it gets much worse.
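To make the quoted proposal concrete, here is a toy C sketch of a
classifier based on the fraction of quanta given up voluntarily. The
quantum_stats counters and the 50% threshold are invented for
illustration; ULE's actual classifier, sched_interact_score() in
sys/kern/sched_ule.c, instead scores accumulated sleep time against
run time, which is closer to the CPU-time-based criterion argued
against above.

    #include <stdio.h>

    struct quantum_stats {
            unsigned voluntary;   /* yielded before the quantum expired */
            unsigned involuntary; /* preempted, or ran the quantum out */
    };

    /* Classify a thread as interactive when more than half of its
     * quanta ended in a voluntary yield, regardless of how much CPU
     * time it consumed during its busy periods. */
    static int
    is_interactive(const struct quantum_stats *qs)
    {
            unsigned total = qs->voluntary + qs->involuntary;

            if (total == 0)
                    return (0);
            return (qs->voluntary * 2 > total);
    }

    int
    main(void)
    {
            /* An X server mostly sleeps waiting for events but has CPU
             * bursts; a batch job almost always runs its quantum out. */
            struct quantum_stats xserver = { 90, 10 };
            struct quantum_stats batch = { 2, 98 };

            printf("xserver interactive: %d\n", is_interactive(&xserver));
            printf("batch interactive: %d\n", is_interactive(&batch));
            return (0);
    }

Note that under either criterion the wnck-applet bug still hurts: an
Xorg server driven to 100% CPU stops yielding voluntarily and would
lose its interactive status either way.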