From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov
CC: "freebsd-current@freebsd.org", George Mitchell, Peter
Subject: Re: SCHED_ULE makes 256Mbyte i386 unusable
Date: Sun, 22 Apr 2018 13:43:53 +0000
References: <20180421201128.GO6887@kib.kiev.ua>, <20180422120241.GR6887@kib.kiev.ua>
In-Reply-To: <20180422120241.GR6887@kib.kiev.ua>
List-Id: Discussions about the use of FreeBSD-current
Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>> >> I decided to start a new thread on current related to SCHED_ULE, since I see
>> >> more than just performance degradation and on a recent current kernel.
>> >> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>> >> recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>> >>
>> >> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> >> current/head kernel, I would see about a 30% performance degradation (elapsed
>> >> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> >> options SCHED_ULE
>> >> instead of
>> >> options SCHED_4BSD
So, now that I have decreased the number of nfsd kernel threads to 32, it works
with both schedulers and with essentially the same performance. (ie. The 30%
performance degradation has disappeared.)
>> >>
>> >> Now, with a kernel from a couple of days ago, the
>> >> options SCHED_ULE
>> >> kernel becomes unusable shortly after starting testing.
>> >> I have seen two variants of this:
>> >> - Became essentially hung. All I could do was ping the machine from the network.
>> >> - Reported "vm_thread_new: kstack allocation failed"
>> >>   and then any attempt to do anything gets "No more processes".
>> >This is strange. It usually means that you get KVA either exhausted or
>> >severely fragmented.
>> Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
>> kernel is working ok now. I haven't done enough to compare performance yet.
>> Maybe I'll post again when I have some numbers.
>>
>> >Enter ddb, it should be operational since pings are replied. Try to see
>> >where the threads are stuck.
>> I didn't do this, since reducing the number of kernel threads seems to have fixed
>> the problem. For the pNFS server, the nfsd threads will spawn additional kernel
>> threads to do proxies to the mirrored DS servers.
>>
>> >> with the only difference being a kernel built with
>> >> options SCHED_4BSD
>> >> everything works and performs the same as the Dec 2017 kernel.
>> >>
>> >> I can try rolling back through the revisions, but it would be nice if someone
>> >> could suggest where to start, because it takes a couple of hours to build a
>> >> kernel on this system.
>> >>
>> >> So, something has made things worse for a head/current kernel this winter, rick
>> >
>> >There are at least two potentially relevant changes.
>> >
>> >First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
>> I've been running this machine with KSTACK_PAGES=4 for some time, so no change.
W.r.t. Rodney Grimes' comments about this (which didn't end up in the messages
in this thread): I didn't see any instability when using KSTACK_PAGES=4 until this
cropped up and seemed to be scheduler related (but not really, it seems).
I bumped it to KSTACK_PAGES=4 because I needed that for the pNFS Metadata
Server code.
Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big item
getting allocated on the stack, but many moderate sized ones.
(A part of it is multiple instances of "struct vattr", some buried in "struct nfsvattr",
that NFS needs to use. I don't think these are large enough to justify malloc/free,
but it has to use several of them.)
One case I did try fixing was about 6 cases where "struct nfsstate" ended up on
the stack. I changed the code to malloc/free them and then, when testing, to my
surprise I saw a 20% performance hit and shelved the patch.
Now that I know that the server was running near its limit, I might try this one
again, to see if the performance hit goes away when the machine has adequate
memory. If it does, I could commit the patch, though it wouldn't have that much
effect on the kstack usage. (It's interesting how this patch ended up related to
the issue this thread discusses.)
>>
>> >Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
>> Could this change have resulted in the system being able to allocate fewer
>> kernel threads/stacks for some reason?
>Well, it could, as anything can be buggy. But the intent of the change
>was to give 4G KVA, and it did.
Righto. No concern here. I suspect the Dec. 2017 kernel was close to the limit
(see the performance issue that went away, noted above) and any change could
have pushed it across the line, I think.
>>
>> >Consequences of the first one are obvious, it is much harder to find
>> >the place to map the stack. Second change, on the other hand, provides
>> >almost full 4G for KVA and should have mostly compensated for the negative
>> >effects of the first.
>> >
>> >And, I cannot see how changing the scheduler would fix or even affect that
>> >behaviour.
>> My hunch is that the system was running near its limit for kernel threads/stacks.
>> Then, somehow, the timing caused by SCHED_ULE resulted in the nfsd trying to
>> reach a higher peak number of threads, and it hit the limit.
>> SCHED_4BSD happened to result in timing such that it stayed just below the
>> limit and worked.
>> I can think of a couple of things that might affect this:
>> 1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly, then
>>     they wouldn't terminate and release their resources before more new ones
>>     are spawned.
>The scheduler has nothing to do with thread termination. It might
>select running threads in a way that causes the undesired pattern to
>appear, which might create some amount of backlog for termination, but
>I doubt it.
>
>> 2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
>>     could try and spawn more mirror DS worker threads at about the same time.
>>
>> Anyhow, thanks for the help, rick
Have a good day, rick
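[P.S. For anyone trying to reproduce the comparison: the two test kernels
described above differed only in the scheduler option. A sketch of the relevant
kernel config lines, illustrative and matching the settings discussed in this
thread:

    options SCHED_ULE        # kernel that became unusable under load
    #options SCHED_4BSD      # alternative that kept working
    options KSTACK_PAGES=4   # kernel stack size used on this i386 box

Only one of the two scheduler options can be enabled in a given kernel config.]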