From owner-freebsd-stable@freebsd.org  Wed Apr  4 01:13:30 2018
Return-Path: <owner-freebsd-stable@freebsd.org>
Delivered-To: freebsd-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF237F82A94
 for <freebsd-stable@mailman.ysv.freebsd.org>;
 Wed,  4 Apr 2018 01:13:29 +0000 (UTC)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from uucp.dinoex.sub.de (uucp.dinoex.sub.de
 [IPv6:2001:1440:5001:1::2])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "uucp.dinoex.sub.de",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 61E367F710
 for <freebsd-stable@FreeBSD.ORG>; Wed,  4 Apr 2018 01:13:25 +0000 (UTC)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from uucp.dinoex.sub.de (uucp.dinoex.sub.de [194.45.71.2])
 by uucp.dinoex.sub.de (8.15.2/8.15.2) with ESMTPS id w341D7LL038753
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-stable@FreeBSD.ORG>; Wed, 4 Apr 2018 03:13:08 +0200 (CEST)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
X-MDaemon-Deliver-To: <freebsd-stable@FreeBSD.ORG>
Received: from citylink.dinoex.sub.org (uucp@localhost)
 by uucp.dinoex.sub.de (8.15.2/8.15.2/Submit) with UUCP id w341D732038752
 for freebsd-stable@FreeBSD.ORG; Wed, 4 Apr 2018 03:13:07 +0200 (CEST)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from gate.oper.dinoex.org (gate-e [192.168.98.2])
 by citylink.dinoex.sub.de (8.15.2/8.15.2) with ESMTP id w34166fj010958
 for <freebsd-stable@FreeBSD.ORG>; Wed, 4 Apr 2018 03:06:06 +0200 (CEST)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: from gate.oper.dinoex.org (gate-e [192.168.98.2])
 by gate.oper.dinoex.org (8.15.2/8.15.2) with ESMTP id w34142Xl010690
 for <freebsd-stable@FreeBSD.ORG>; Wed, 4 Apr 2018 03:04:02 +0200 (CEST)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
Received: (from news@localhost)
 by gate.oper.dinoex.org (8.15.2/8.15.2/Submit) id w341424f010683
 for freebsd-stable@FreeBSD.ORG; Wed, 4 Apr 2018 03:04:02 +0200 (CEST)
 (envelope-from li-fbsd@citylink.dinoex.sub.org)
X-Authentication-Warning: gate.oper.dinoex.org: news set sender to
 li-fbsd@citylink.dinoex.sub.org using -f
From: Peter <pmc@citylink.dinoex.sub.org>
Subject: kern.sched.quantum: Creepy, sadistic scheduler
Date: Wed, 4 Apr 2018 02:52:55 +0200
Organization: even some more stinky socks
Message-ID: <pa17m7$82t$1@oper.dinoex.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 4 Apr 2018 00:53:27 -0000 (UTC)
Injection-Info: oper.dinoex.de;
 logging-data="8285"; mail-complaints-to="usenet@citylink.dinoex.sub.org"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:51.0) Gecko/20100101
 Firefox/51.0 SeaMonkey/2.48
X-Mozilla-News-Host: news://localhost:119
Sender: li-fbsd@citylink.dinoex.sub.org
To: freebsd-stable@FreeBSD.ORG
X-Milter: Spamilter (Reciever: uucp.dinoex.sub.de; Sender-ip: 194.45.71.2;
 Sender-helo: uucp.dinoex.sub.de; )
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2
 (uucp.dinoex.sub.de [194.45.71.2]); Wed, 04 Apr 2018 03:13:09 +0200 (CEST)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-stable>, 
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 Apr 2018 01:13:30 -0000

Occasionally I noticed that the system would not quickly process the
tasks I need done, but would instead prefer other, long-running tasks. I
figured it must be related to the scheduler, and decided it hates me.


A closer look shows the behaviour as follows (single CPU):

Let's run an I/O-active task, e.g. a postgres VACUUM that
continuously reads from big files (while doing compute as well [1]):
 >pool        alloc   free   read  write   read  write
 >cache           -      -      -      -      -      -
 >  ada1s4    7.08G  10.9G  1.58K      0  12.9M      0

Now start an endless loop:
# while true; do :; done

And the effect is:
 >pool        alloc   free   read  write   read  write
 >cache           -      -      -      -      -      -
 >  ada1s4    7.08G  10.9G      9      0  76.8K      0

The VACUUM gets almost stuck! This matches the WCPU shown in "top":

 >  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
 >85583 root        99    0  7044K  1944K RUN      1:06  92.21% bash
 >53005 pgsql       52    0   620M 91856K RUN      5:47   0.50% postgres

Hacking on kern.sched.quantum makes it quite a bit better:
# sysctl kern.sched.quantum=1
kern.sched.quantum: 94488 -> 7874

 >pool        alloc   free   read  write   read  write
 >cache           -      -      -      -      -      -
 >  ada1s4    7.08G  10.9G    395      0  3.12M      0

 >  PID USERNAME   PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
 >85583 root        94    0  7044K  1944K RUN      4:13  70.80% bash
 >53005 pgsql       52    0   276M 91856K RUN      5:52  11.83% postgres
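
To put rough numbers on the three measurements, here is a small sketch
that expresses the read bandwidth under load as a percentage of the
undisturbed VACUUM. The figures are copied from the iostat output in
this post; the helper name percent_of is made up for illustration, and
I am assuming that "M" and "K" in those columns are binary multiples
(1M = 1024K).

```sh
#!/bin/sh
# percent_of BASE VALUE: print VALUE as a percentage of BASE, one decimal.
# (Hypothetical helper, not part of any FreeBSD tool.)
percent_of() {
    awk -v base="$1" -v val="$2" 'BEGIN { printf "%.1f\n", 100 * val / base }'
}

# Read bandwidth in K per second, taken from the iostat output in this post.
# Assumption: 1M = 1024K.
baseline=$(awk 'BEGIN { print 12.9 * 1024 }')       # 12.9M, VACUUM alone
busy_default=76.8                                   # 76.8K, loop running, default quantum
busy_quantum1=$(awk 'BEGIN { print 3.12 * 1024 }')  # 3.12M, loop running, quantum=1

echo "default quantum: $(percent_of "$baseline" "$busy_default")% of baseline"
echo "quantum=1:       $(percent_of "$baseline" "$busy_quantum1")% of baseline"
```

With these inputs the VACUUM drops to well under 1% of its undisturbed
read rate with the default quantum and recovers to roughly a quarter
with the lowered one, which is in the same ballpark as the WCPU column.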


Now, as usual, the "root-cause" questions arise: What exactly does
this "quantum" do? Is this solution a workaround, i.e. is something
else actually wrong, and does it have tradeoffs in other situations? Or
otherwise, why was such a default value chosen, which appears to be
ill-conceived?

The docs for the quantum parameter are a bit unsatisfying - they say
it's the max number of ticks a process gets - and what happens when
they're exhausted? If by default the endless loop is actually allowed
to continue running for 94k ticks (or 94ms, more likely) uninterrupted,
then that explains the perceived behaviour - but that's certainly not
what a scheduler should do when other procs are ready to run.
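
A quick sanity check on the units, as a sketch only: both values that
sysctl printed (94488 and 7874) fit a reading in microseconds that is
rounded to whole periods of a 127 Hz clock. The 127 Hz figure is an
assumption on my part (it is a common stathz value, see kern.clockrate,
but I have not verified it on this box), and the helper slices_in is a
made-up name.

```sh
#!/bin/sh
# Assumption: the statistics clock (stathz) runs at 127 Hz.
stathz=127

# Length of one stathz period in microseconds, rounded down.
period_us=$((1000000 / stathz))

# slices_in QUANTUM_US: number of periods in a quantum given in
# microseconds, rounded to the nearest whole period.
# (Hypothetical helper for illustration.)
slices_in() {
    echo $(( ($1 + period_us / 2) / period_us ))
}

echo "one period: ${period_us} us"
for q in 94488 7874; do
    echo "quantum ${q} us = $(slices_in "$q") period(s), about $((q / 1000)) ms"
done
```

Under that assumption the default would be 12 periods (about 94 ms) and
the smallest settable value a single period (about 7 ms), which would
explain why asking for 1 gave back 7874.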

11.1-RELEASE-p7, kern.hz=200. Switching tickless mode on or off does
not influence the matter. Starting the endless loop with "nice" does
not influence the matter.


[1]
A pure-I/O job without compute load, like "dd", does not show
this behaviour. Also, when other tasks are running, the unjust
behaviour is not so strongly pronounced.