From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov
CC: "freebsd-current@freebsd.org", George Mitchell, Peter
Subject: Re: SCHED_ULE makes 256Mbyte i386 unusable
Date: Sun, 22 Apr 2018 13:43:53 +0000
References: <20180421201128.GO6887@kib.kiev.ua>, <20180422120241.GR6887@kib.kiev.ua>
In-Reply-To: <20180422120241.GR6887@kib.kiev.ua>
List-Id: Discussions about the use of FreeBSD-current
Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>> >> I decided to start a new thread on current related to SCHED_ULE, since I see
>> >> more than just performance degradation and on a recent current kernel.
>> >> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>> >> recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>> >>
>> >> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> >> current/head kernel, I would see about a 30% performance degradation (elapsed
>> >> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> >> options SCHED_ULE
>> >> instead of
>> >> options SCHED_4BSD
So, now that I have decreased the number of nfsd kernel threads to 32, it works
with both schedulers and with essentially the same performance. (ie. The 30%
performance degradation has disappeared.)
>> >>
>> >> Now, with a kernel from a couple of days ago, the
>> >> options SCHED_ULE
>> >> kernel becomes unusable shortly after starting testing.
>> >> I have seen two variants of this:
>> >> - Became essentially hung. All I could do was ping the machine from the network.
>> >> - Reported "vm_thread_new: kstack allocation failed"
>> >>   and then any attempt to do anything gets "No more processes".
>> >This is strange. It usually means that you get KVA either exhausted or
>> >severely fragmented.
>> Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
>> kernel is working ok now. I haven't done enough to compare performance yet.
>> Maybe I'll post again when I have some numbers.
>>
>> >Enter ddb, it should be operational since pings are replied. Try to see
>> >where the threads are stuck.
>> I didn't do this, since reducing the number of kernel threads seems to have fixed
>> the problem. For the pNFS server, the nfsd threads will spawn additional kernel
>> threads to do proxies to the mirrored DS servers.
>>
>> >> with the only difference being a kernel built with
>> >> options SCHED_4BSD
>> >> everything works and performs the same as the Dec 2017 kernel.
>> >>
>> >> I can try rolling back through the revisions, but it would be nice if someone
>> >> could suggest where to start, because it takes a couple of hours to build a
>> >> kernel on this system.
>> >>
>> >> So, something has made things worse for a head/current kernel this winter, rick
>> >
>> >There are at least two potentially relevant changes.
>> >
>> >First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
>> I've been running this machine with KSTACK_PAGES=4 for some time, so no change.
W.r.t. Rodney Grimes' comments about this (which didn't end up in the messages
in this thread): I didn't see any instability when using KSTACK_PAGES=4 until this
cropped up and seemed to be scheduler related (but not really, it seems).
I bumped it to KSTACK_PAGES=4 because I needed that for the pNFS Metadata
Server code.
Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big item
getting allocated on the stack, but many moderate sized ones.
(A part of it is multiple instances of "struct vattr", some buried in "struct nfsvattr",
that NFS needs to use. I don't think these are large enough to justify malloc/free,
but it has to use several of them.)
One case I did try fixing was about 6 cases where "struct nfsstate" ended up on
the stack. I changed the code to malloc/free them and then, when testing, to my
surprise I saw a 20% performance hit and shelved the patch.
Now that I know that the server was running near its limit, I might try this one
again, to see if the performance hit goes away when the machine has adequate
memory. If it does, I could commit the patch, though it wouldn't have that much
effect on the kstack usage. (It's interesting how this patch ended up related to
the issue this thread discusses.)
>>
>> >Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
>> Could this change have resulted in the system being able to allocate fewer
>> kernel threads/stacks for some reason?
>Well, it could, as anything can be buggy. But the intent of the change
>was to give 4G KVA, and it did.
Righto. No concern here. I suspect the Dec. 2017 kernel was close to the limit
(see the performance issue that went away, noted above) and any change could
have pushed it across the line, I think.
>>
>> >Consequences of the first one are obvious, it is much harder to find
>> >the place to map the stack. Second change, on the other hand, provides
>> >almost full 4G for KVA and should have mostly compensated for the negative
>> >effects of the first.
>> >
>> >And, I cannot see how changing the scheduler would fix or even affect that
>> >behaviour.
>> My hunch is that the system was running near its limit for kernel threads/stacks.
>> Then, somehow, the timing caused by SCHED_ULE resulted in the nfsd trying to
>> reach a higher peak number of threads, and it hit the limit.
>> SCHED_4BSD happened to result in timing such that it stayed just below the
>> limit and worked.
>> I can think of a couple of things that might affect this:
>> 1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly, then
>>     they wouldn't terminate and release their resources before more new ones
>>     are spawned.
>The scheduler has nothing to do with thread termination. It might
>select running threads in a way that causes the undesired pattern to
>appear, which might create some amount of backlog for termination, but
>I doubt it.
>
>> 2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
>>     could try and spawn more mirror DS worker threads at about the same time.
>>
>> Anyhow, thanks for the help, rick
Have a good day, rick
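[P.S. For anyone trying to reproduce the comparison: the two test kernels
described above differed only in the scheduler option. A sketch of the relevant
kernel config lines, illustrative and matching the settings discussed in this
thread:

    options SCHED_ULE        # kernel that became unusable under load
    #options SCHED_4BSD      # alternative that kept working
    options KSTACK_PAGES=4   # kernel stack size used on this i386 box

Only one of the two scheduler options can be enabled in a given kernel config.]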