From owner-freebsd-current@freebsd.org Wed Apr  4 13:19:52 2018
Subject: Is kern.sched.preempt_thresh=0 a sensible default? (was: Re:
 Extremely low disk throughput under high compute load)
From: Stefan Esser <se@freebsd.org>
To: "M. Warner Losh"
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Date: Wed, 4 Apr 2018 15:19:31 +0200
Message-ID: <49fa8de4-e164-0642-4e01-a6188992c32e@freebsd.org>
In-Reply-To: <1d188cb0-ebc8-075f-ed51-57641ede1fd6@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On 02.04.18 at 00:18, Stefan Esser wrote:
> On 01.04.18 at 18:33, Warner Losh wrote:
>> On Sun, Apr 1, 2018 at 9:18 AM, Stefan Esser wrote:
>>
>> My i7-2600K based system with 24 GB RAM was in the midst of a
>> buildworld -j8 (starting from a clean state), which caused a load
>> average of 12 for more than 1 hour, when I decided to move a
>> directory structure holding some 10 GB to its own ZFS file system.
>> File sizes varied, but were mostly in the range of 500 KB.
>>
>> I had just thrown away /usr/obj, but /usr/src was cached in the ARC,
>> and thus there was nearly no disk activity caused by the buildworld.
>>
>> The copying proceeded at a rate of at most 10 MB/s, but most of the
>> time less than 100 KB/s were transferred. The "cp" process had a
>> priority of 20 and thus a much better priority than the compute-bound
>> compiler processes, but it got just 0.2% to 0.5% of one CPU core.
>> Apparently, the copy process was scheduled at such a low rate that it
>> only managed to issue a few controller writes per second.
>>
>> The system is healthy and does not show any problems or anomalies
>> under normal use (e.g., file copies are fast without the high
>> compute load).
>>
>> This was with SCHED_ULE on a -CURRENT without WITNESS or malloc
>> debugging.
>>
>> Is this a regression in -CURRENT?
>>
>> Does 'sync' push a lot of I/O to the disk?
>
> Each sync takes 0.7 to 1.5 seconds to complete, but since reading is
> so slow, not much is written.
>
> Normal gstat output for the 3 drives the RAIDZ1 consists of:
>
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      2      2     84   39.1      0      0    0.0    7.8  ada0
>     0      4      4     92   66.6      0      0    0.0   26.6  ada1
>     0      6      6    259   66.9      0      0    0.0   36.2  ada3
> dT: 1.058s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      1      1     60   70.6      0      0    0.0    6.7  ada0
>     0      3      3     68   71.3      0      0    0.0   20.2  ada1
>     0      6      6    242   65.5      0      0    0.0   28.8  ada3
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      5      5    192   44.8      0      0    0.0   22.4  ada0
>     0      6      6    160   61.9      0      0    0.0   26.5  ada1
>     0      6      6    172   43.7      0      0    0.0   26.2  ada3
>
> This includes the copy process and the reads caused by "make -j 8
> world" (but I assume that all the source files are already cached in
> the ARC).

I have identified the cause of the extremely low I/O performance (2 to
6 read operations scheduled per second): The default value of
kern.sched.preempt_thresh=0 does not give any CPU to the I/O-bound
process unless a (long) time slice expires (kern.sched.quantum=94488 on
my system with HZ=1000) or one of the CPU-bound processes voluntarily
gives up the CPU (or exits). Any non-zero value of preempt_thresh lets
the system perform I/O in parallel with the CPU-bound processes again.

I'm not sure about the bias relative to the PRI values displayed by
top, but for me a process with a PRI above 72 (in top) should be
eligible for preemption. What value of preempt_thresh should I use to
get that behavior?

And, more importantly: Is preempt_thresh=0 a reasonable default? It
prevents I/O-bound processes from making reasonable progress while all
CPU cores/threads are busy. In my case, performance dropped from more
than 10 MB/s to just a few hundred KB per second, i.e. by a factor of
30. (The %busy values in my previous mail are misleading: at 10 MB/s
the disk was about 70% busy ...)
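For anyone who wants to see the effect of the threshold without reading
the scheduler source, here is a toy model of the preemption check, based
on my reading of sched_shouldpreempt() in sys/kern/sched_ule.c. It is a
simplified sketch, not the kernel code (the real function has extra
cases, e.g. always preempting idle); lower numeric priority means "more
urgent", pri is the waking thread, cpri the currently running one:

```shell
#!/bin/sh
# Toy model of ULE's preemption decision (simplified sketch, not the
# actual kernel code). Prints 1 if the waking thread would preempt
# the running one, 0 otherwise.
should_preempt() {
    pri=$1; cpri=$2; thresh=$3
    [ "$pri" -ge "$cpri" ] && { echo 0; return; }   # not more urgent
    [ "$thresh" -eq 0 ]    && { echo 0; return; }   # preemption disabled
    [ "$pri" -le "$thresh" ] && { echo 1; return; } # urgent enough
    echo 0
}

# An I/O-bound thread (pri 100, example value) waking up while a
# CPU-bound thread (pri 180, example value) owns the CPU:
should_preempt 100 180 0     # thresh=0 (the default): prints 0
should_preempt 100 180 120   # thresh=120: prints 1
```

With thresh=0 the waking thread never preempts and has to wait for the
quantum to expire, which matches the behavior I saw; any threshold at or
above the I/O-bound thread's priority lets it run immediately. The knob
can be changed at runtime with sysctl kern.sched.preempt_thresh=<value>.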
Should preempt_thresh be set to some (possibly high, to preempt only
long-running processes) value?

Regards, Stefan