Date: Sat, 21 Apr 2018 23:30:55 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, "George Mitchell" <george+freebsd@m5p.com>, Peter <pmc@citylink.dinoex.sub.org>
Subject: Re: SCHED_ULE makes 256Mbyte i386 unusable
Message-ID: <YQBPR0101MB10421529BB346952BCE7F20EDD8B0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <20180421201128.GO6887@kib.kiev.ua>
References: <YQBPR0101MB1042F252A539E8D55EB44585DD8B0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>, <20180421201128.GO6887@kib.kiev.ua>
Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>> I decided to start a new thread on current related to SCHED_ULE, since I see
>> more than just performance degradation on a recent current kernel.
>> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>> recently under the subject line "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>>
>> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> current/head kernel, I would see about a 30% performance degradation (elapsed
>> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> options SCHED_ULE
>> instead of
>> options SCHED_4BSD
>>
>> Now, with a kernel from a couple of days ago, the
>> options SCHED_ULE
>> kernel becomes unusable shortly after starting testing.
>> I have seen two variants of this:
>> - Became essentially hung. All I could do was ping the machine from the network.
>> - Reported "vm_thread_new: kstack allocation failed" and then any attempt to
>>   do anything gets "No more processes".
>This is strange. It usually means that you get KVA either exhausted or
>severely fragmented.
Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE kernel
is working ok now. I haven't done enough to compare performance yet.
Maybe I'll post again when I have some numbers.

>Enter ddb, it should be operational since pings are replied. Try to see
>where the threads are stuck.
I didn't do this, since reducing the number of kernel threads seems to have fixed
the problem. For the pNFS server, the nfsd threads will spawn additional kernel
threads to do proxies to the mirrored DS servers.

>> with the only difference being a kernel built with
>> options SCHED_4BSD
>> everything works and performs the same as the Dec 2017 kernel.
>>
>> I can try rolling back through the revisions, but it would be nice if someone
>> could suggest where to start, because it takes a couple of hours to build a
>> kernel on this system.
>>
>> So, something has made things worse for a head/current kernel this winter, rick
>
>There are at least two potentially relevant changes.
>
>First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
I've been running this machine with KSTACK_PAGES=4 for some time, so no change.

>Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
Could this change have resulted in the system being able to allocate fewer
kernel threads/stacks for some reason?

>Consequences of the first one are obvious, it is much harder to find
>the place to map the stack. Second change, on the other hand, provides
>almost full 4G for KVA and should have mostly compensated for the negative
>effects of the first.
>
>And, I cannot see how changing the scheduler would fix or even affect that
>behaviour.
My hunch is that the system was running near its limit for kernel threads/stacks.
Then, somehow, the timing under SCHED_ULE resulted in the nfsd trying to get to
a higher peak number of threads and hitting the limit. SCHED_4BSD happened to
result in timing such that it stayed just below the limit and worked.
I can think of a couple of things that might affect this:
1 - If SCHED_ULE doesn't terminate kernel threads as quickly, then they wouldn't
    terminate and release their resources before more new ones are spawned.
2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
    could try to spawn more mirror DS worker threads at about the same time.

Anyhow, thanks for the help, rick
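For reference, the kernel configuration difference discussed above boils down to
a fragment along these lines (the config name PNFS_TEST is made up, not taken
from the thread; KSTACK_PAGES=4 is the value Rick was already running with, and
is also what r326758 made the i386 default):

    include         GENERIC
    ident           PNFS_TEST
    nooptions       SCHED_ULE       # the default scheduler in GENERIC
    options         SCHED_4BSD      # the scheduler that stayed below the kstack limit here
    options         KSTACK_PAGES=4  # per-thread kernel stack size, in pages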
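Likewise, the nfsd thread count that was cut from 256 to 32 is normally set with
nfsd's -n flag, typically via rc.conf; a minimal sketch, not Rick's actual
configuration:

    # /etc/rc.conf -- cap the NFS server at 32 kernel threads
    nfs_server_enable="YES"
    nfsv4_server_enable="YES"
    nfs_server_flags="-u -t -n 32"

On a running system the same limit can usually also be adjusted through the
vfs.nfsd.minthreads and vfs.nfsd.maxthreads sysctls.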