From owner-freebsd-stable@freebsd.org Wed Apr 4 01:13:30 2018 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF237F82A94 for ; Wed, 4 Apr 2018 01:13:29 +0000 (UTC) (envelope-from li-fbsd@citylink.dinoex.sub.org) Received: from uucp.dinoex.sub.de (uucp.dinoex.sub.de [IPv6:2001:1440:5001:1::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "uucp.dinoex.sub.de", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 61E367F710 for ; Wed, 4 Apr 2018 01:13:25 +0000 (UTC) (envelope-from li-fbsd@citylink.dinoex.sub.org) Received: from uucp.dinoex.sub.de (uucp.dinoex.sub.de [194.45.71.2]) by uucp.dinoex.sub.de (8.15.2/8.15.2) with ESMTPS id w341D7LL038753 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Wed, 4 Apr 2018 03:13:08 +0200 (CEST) (envelope-from li-fbsd@citylink.dinoex.sub.org) X-MDaemon-Deliver-To: Received: from citylink.dinoex.sub.org (uucp@localhost) by uucp.dinoex.sub.de (8.15.2/8.15.2/Submit) with UUCP id w341D732038752 for freebsd-stable@FreeBSD.ORG; Wed, 4 Apr 2018 03:13:07 +0200 (CEST) (envelope-from li-fbsd@citylink.dinoex.sub.org) Received: from gate.oper.dinoex.org (gate-e [192.168.98.2]) by citylink.dinoex.sub.de (8.15.2/8.15.2) with ESMTP id w34166fj010958 for ; Wed, 4 Apr 2018 03:06:06 +0200 (CEST) (envelope-from li-fbsd@citylink.dinoex.sub.org) Received: from gate.oper.dinoex.org (gate-e [192.168.98.2]) by gate.oper.dinoex.org (8.15.2/8.15.2) with ESMTP id w34142Xl010690 for ; Wed, 4 Apr 2018 03:04:02 +0200 (CEST) (envelope-from li-fbsd@citylink.dinoex.sub.org) Received: (from news@localhost) by gate.oper.dinoex.org (8.15.2/8.15.2/Submit) id w341424f010683 for freebsd-stable@FreeBSD.ORG; Wed, 4 Apr 2018 03:04:02 +0200 (CEST) (envelope-from li-fbsd@citylink.dinoex.sub.org) X-Authentication-Warning: gate.oper.dinoex.org: news set sender to li-fbsd@citylink.dinoex.sub.org using -f From: Peter Subject: kern.sched.quantum: Creepy, sadistic scheduler Date: Wed, 4 Apr 2018 02:52:55 +0200 Organization: even some more stinky socks Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Wed, 4 Apr 2018 00:53:27 -0000 (UTC) Injection-Info: oper.dinoex.de; logging-data="8285"; mail-complaints-to="usenet@citylink.dinoex.sub.org" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:51.0) Gecko/20100101 Firefox/51.0 SeaMonkey/2.48 X-Mozilla-News-Host: news://localhost:119 Sender: li-fbsd@citylink.dinoex.sub.org To: freebsd-stable@FreeBSD.ORG X-Milter: Spamilter (Reciever: uucp.dinoex.sub.de; Sender-ip: 194.45.71.2; Sender-helo: uucp.dinoex.sub.de; ) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (uucp.dinoex.sub.de [194.45.71.2]); Wed, 04 Apr 2018 03:13:09 +0200 (CEST) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Apr 2018 01:13:30 -0000 Occasionally I noticed that the system would not quickly process the tasks i need done, but instead prefer other, longrunning tasks. I figured it must be related to the scheduler, and decided it hates me. A closer look shows the behaviour as follows (single CPU): Lets run an I/O-active task, e.g, postgres VACUUM that would continuousely read from big files (while doing compute as well [1]): >pool alloc free read write read write >cache - - - - - - > ada1s4 7.08G 10.9G 1.58K 0 12.9M 0 Now start an endless loop: # while true; do :; done And the effect is: >pool alloc free read write read write >cache - - - - - - > ada1s4 7.08G 10.9G 9 0 76.8K 0 The VACUUM gets almost stuck! This figures with WCPU in "top": > PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND >85583 root 99 0 7044K 1944K RUN 1:06 92.21% bash >53005 pgsql 52 0 620M 91856K RUN 5:47 0.50% postgres Hacking on kern.sched.quantum makes it quite a bit better: # sysctl kern.sched.quantum=1 kern.sched.quantum: 94488 -> 7874 >pool alloc free read write read write >cache - - - - - - > ada1s4 7.08G 10.9G 395 0 3.12M 0 > PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND >85583 root 94 0 7044K 1944K RUN 4:13 70.80% bash >53005 pgsql 52 0 276M 91856K RUN 5:52 11.83% postgres Now, as usual, the "root-cause" questions arise: What exactly does this "quantum"? Is this solution a workaround, i.e. actually something else is wrong, and has it tradeoff in other situations? Or otherwise, why is such a default value chosen, which appears to be ill-deceived? The docs for the quantum parameter are a bit unsatisfying - they say its the max num of ticks a process gets - and what happens when they're exhausted? If by default the endless loop is actually allowed to continue running for 94k ticks (or 94ms, more likely) uninterrupted, then that explains the perceived behaviour - buts thats certainly not what a scheduler should do when other procs are ready to run. 11.1-RELEASE-p7, kern.hz=200. Switching tickless mode on or off does not influence the matter. Starting the endless loop with "nice" does not influence the matter. [1] A pure-I/O job without compute load, like "dd", does not show this behaviour. Also, when other tasks are running, the unjust behaviour is not so stongly pronounced.