From: Rick Macklem
To: Konstantin Belousov
CC: "freebsd-current@freebsd.org", "George Mitchell", Peter
Subject: Re: SCHED_ULE makes 256Mbyte i386 unusable
Date: Sat, 21 Apr 2018 23:30:55 +0000
In-Reply-To: <20180421201128.GO6887@kib.kiev.ua>

Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>> I decided to start a new thread on current related to SCHED_ULE, since I see
>> more than just performance degradation and on a recent current kernel.
>> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>> recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>>
>> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> current/head kernel, I would see about a 30% performance degradation (elapsed
>> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> options SCHED_ULE
>> instead of
>> options SCHED_4BSD
>>
>> Now, with a kernel from a couple of days ago, the
>> options SCHED_ULE
>> kernel becomes unusable shortly after starting testing.
>> I have seen two variants of this:
>> - Became essentially hung. All I could do was ping the machine from the network.
>> - Reported "vm_thread_new: kstack allocation failed"
>>   and then any attempt to do anything gets "No more processes".
>This is strange. It usually means that you get KVA either exhausted or
>severely fragmented.
Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
kernel is working ok now. I haven't done enough to compare performance yet.
Maybe I'll post again when I have some numbers.

>Enter ddb, it should be operational since pings are replied. Try to see
>where the threads are stuck.
I didn't do this, since reducing the number of kernel threads seems to have fixed
the problem. For the pNFS server, the nfsd threads will spawn additional kernel
threads to do proxies to the mirrored DS servers.

>> with the only difference being a kernel built with
>> options SCHED_4BSD
>> everything works and performs the same as the Dec 2017 kernel.
>>
>> I can try rolling back through the revisions, but it would be nice if someone
>> could suggest where to start, because it takes a couple of hours to build a
>> kernel on this system.
>>
>> So, something has made things worse for a head/current kernel this winter, rick
>
>There are at least two potentially relevant changes.
>
>First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
I've been running this machine with KSTACK_PAGES=4 for some time, so no change.

>Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
Could this change have resulted in the system being able to allocate fewer
kernel threads/stacks for some reason?

>Consequences of the first one are obvious, it is much harder to find
>the place to map the stack. Second change, on the other hand, provides
>almost full 4G for KVA and should have mostly compensated for the negative
>effects of the first.
>
>And, I cannot see how changing the scheduler would fix or even affect that
>behaviour.
My hunch is that the system was running near its limit for kernel threads/stacks.
Then, somehow, the timing that SCHED_ULE produced resulted in the nfsd trying to
reach a higher peak number of threads and hitting the limit.
SCHED_4BSD happened to result in timing such that it stayed just below the
limit and worked.
I can think of a couple of things that might affect this:
1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly, then
    they wouldn't terminate and release their resources before more new ones
    are spawned.
2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
    could try and spawn more mirror DS worker threads at about the same time.

Anyhow, thanks for the help, rick
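
The two possibilities listed above can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not model SCHED_ULE, SCHED_4BSD, or the nfsd code. The function name peak_live_threads, the tick-based timeline, and the numbers (32 spawns, lifetimes of 4 and 8 ticks, bursts of 8) are all hypothetical and chosen for illustration. It shows only that the same total number of worker threads can reach a higher peak number of simultaneously live stacks when threads are reaped more slowly (case 1) or spawned in bursts (case 2).

```python
def peak_live_threads(spawn_ticks, lifetime):
    """Return the maximum number of threads alive at the same time.

    spawn_ticks: tick at which each hypothetical worker thread is created.
    lifetime:    ticks until that thread has terminated and its stack is freed.
    """
    events = []
    for t in spawn_ticks:
        events.append((t, 1))               # stack allocated at spawn
        events.append((t + lifetime, -1))   # stack released once reaped
    live = peak = 0
    # Tuple sorting puts -1 before +1 at the same tick, so a stack freed at
    # tick t counts as available to a thread spawned at tick t.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


# Same total work (32 spawns over 32 ticks), two hypothetical arrival patterns.
smooth = list(range(32))                    # one spawn per tick
bursty = [8 * (t // 8) for t in range(32)]  # eight spawns on every 8th tick

print("smooth, quick reaping:", peak_live_threads(smooth, 4))
print("smooth, slow reaping: ", peak_live_threads(smooth, 8))   # case 1
print("bursty, quick reaping:", peak_live_threads(bursty, 4))   # case 2
```

In this model either change doubles the peak relative to the smooth, quickly reaped case. A system that sits just below its kernel stack limit in the first case would exceed it in the other two, without any change in the total number of threads created.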