Subject: Try setting kern.sched.preempt_thresh != 0 (was: Re: kern.sched.quantum: Creepy, sadistic scheduler)
From: Stefan Esser <se@freebsd.org>
To: Alban Hertroys, Peter
Cc: freebsd-stable@FreeBSD.ORG
Date: Wed, 4 Apr 2018 15:30:46 +0200

On 04.04.18 at 12:39, Alban Hertroys wrote:
>
>> On 4 Apr 2018, at 2:52, Peter wrote:
>>
>> Occasionally I noticed that the system would not quickly process the
>> tasks I need done, but would instead prefer other, long-running tasks.
>> I figured it must be related to the scheduler, and decided it hates me.
>
> If it hated you, it would behave much worse.
>
>> A closer look shows the behaviour as follows (single CPU):
>
> A single CPU? That's becoming rare! Is that a VM? Old hardware? Something really specific?
>
>> Let's run an I/O-active task, e.g., a postgres VACUUM that would
>
> And you're running a multi-process database server on it, no less. That is going to hurt, no matter how well the scheduler works.
>
>> continuously read from big files (while doing compute as well [1]):
>>> pool     alloc   free   read  write   read  write
>>> cache        -      -      -      -      -      -
>>>   ada1s4  7.08G  10.9G  1.58K      0  12.9M      0
>>
>> Now start an endless loop:
>> # while true; do :; done
>>
>> And the effect is:
>>> pool     alloc   free   read  write   read  write
>>> cache        -      -      -      -      -      -
>>>   ada1s4  7.08G  10.9G      9      0  76.8K      0
>>
>> The VACUUM gets almost stuck! This shows up in the WCPU column of "top":
>>
>>>   PID USERNAME PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
>>> 85583 root      99    0  7044K  1944K RUN     1:06  92.21% bash
>>> 53005 pgsql     52    0   620M 91856K RUN     5:47   0.50% postgres
>>
>> Hacking on kern.sched.quantum makes it quite a bit better:
>> # sysctl kern.sched.quantum=1
>> kern.sched.quantum: 94488 -> 7874
>>
>>> pool     alloc   free   read  write   read  write
>>> cache        -      -      -      -      -      -
>>>   ada1s4  7.08G  10.9G    395      0  3.12M      0
>>
>>>   PID USERNAME PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
>>> 85583 root      94    0  7044K  1944K RUN     4:13  70.80% bash
>>> 53005 pgsql     52    0   276M 91856K RUN     5:52  11.83% postgres
>>
>> Now, as usual, the "root-cause" questions arise: What exactly does
>> this "quantum" do? Is this solution a workaround, i.e. is something
>> else actually wrong, and does it have trade-offs in other situations?
>> Or, if not, why was such a default value chosen, which appears to be
>> ill-conceived?
>>
>> The docs for the quantum parameter are a bit unsatisfying - they say
>> it's the maximum number of ticks a process gets - but what happens when
>> they're exhausted? If by default the endless loop is actually allowed
>> to keep running for 94k ticks (or 94 ms, more likely) uninterrupted,
>> then that explains the perceived behaviour - but that's certainly not
>> what a scheduler should do when other processes are ready to run.
>
> I can answer this from the operating systems course I took recently. This does not apply to FreeBSD specifically; it is general job scheduling theory. I still need to read up on SCHED_ULE to see how the details were implemented there. Or are you using the older SCHED_4BSD?
> Anyway...
>
> Jobs that are ready to run are collected on a ready queue. Since you have a single CPU, there can only be a single job active on the CPU. When that job is finished, the scheduler takes the next job in the ready queue and assigns it to the CPU, and so on.

I'm guessing that the problem is caused by kern.sched.preempt_thresh=0,
which prevents preemption of low-priority processes by interactive or
I/O-bound processes.

For a quick test, try:

# sysctl kern.sched.preempt_thresh=1

to see whether it makes a difference.
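Before changing anything, it may also help to check which scheduler the
kernel is actually running and what the current settings are. A rough
sketch of the commands (the sysctl names are those of a stock kernel with
SCHED_ULE; as far as I know kern.sched.preempt_thresh only exists for ULE,
not for SCHED_4BSD):

# sysctl kern.sched.name kern.sched.preempt_thresh kern.sched.quantum

If kern.sched.name reports 4BSD instead of ULE, the preemption threshold
discussed here does not apply.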
The value 1 is unreasonably low, but it has the most visible effect, in
that any higher-priority process can steal the CPU from any lower-priority
one (high priority corresponds to a low PRI value as displayed by ps -l or
top). Reasonable values are probably in the range of 80 to 224, depending
on the system usage scenario (that is the range I have seen suggested on
the mailing lists). Higher values result in less preemption.

Regards, Stefan
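P.S.: Once a value that behaves well has been found, it can be made
persistent across reboots via /etc/sysctl.conf, which is applied at boot
by the sysctl rc script. A minimal sketch, assuming - purely as an
example - that 80 turned out to fit this workload:

# echo 'kern.sched.preempt_thresh=80' >> /etc/sysctl.conf

Adding the line kern.sched.preempt_thresh=80 to /etc/sysctl.conf with an
editor works just as well.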