Date:      Sun, 22 Apr 2018 15:02:41 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, George Mitchell <george+freebsd@m5p.com>, Peter <pmc@citylink.dinoex.sub.org>
Subject:   Re: SCHED_ULE makes 256Mbyte i386 unusable
Message-ID:  <20180422120241.GR6887@kib.kiev.ua>
In-Reply-To: <YQBPR0101MB10421529BB346952BCE7F20EDD8B0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>
References:  <YQBPR0101MB1042F252A539E8D55EB44585DD8B0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM> <20180421201128.GO6887@kib.kiev.ua> <YQBPR0101MB10421529BB346952BCE7F20EDD8B0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>

On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
> >> I decided to start a new thread on freebsd-current related to SCHED_ULE, since I see
> >> more than just performance degradation, and on a recent current kernel.
> >> (I cc'd a couple of the people discussing performance problems on freebsd-stable
> >>  recently under the subject line "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
> >>
> >> When testing a pNFS server on a single-core i386 with 256Mbytes of RAM using a Dec. 2017
> >> current/head kernel, I would see about a 30% performance degradation (elapsed
> >> run time for a kernel build over NFSv4.1) when the server kernel was built with
> >> options SCHED_ULE
> >> instead of
> >> options SCHED_4BSD
> >>
> >> Now, with a kernel from a couple of days ago, the
> >> options SCHED_ULE
> >> kernel becomes unusable shortly after starting testing.
> >> I have seen two variants of this:
> >> - Became essentially hung. All I could do was ping the machine from the network.
> >> - Reported "vm_thread_new: kstack allocation failed"
> >>   and then any attempt to do anything gets "No more processes".
> >This is strange.  It usually means that your KVA is either exhausted or
> >severely fragmented.
> Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
> kernel is working ok now. I haven't done enough to compare performance yet.
> Maybe I'll post again when I have some numbers.
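
If you want to watch for KVA exhaustion without entering ddb, the i386
pmap exports the KVA size and the ungrown tail of the kernel map as
sysctls (vm.kvm_size/vm.kvm_free).  A minimal userland sketch to dump
them while the test runs; note that kvm_free only shows how far the
kernel map has grown, fragmentation will not show up there:

/* kvmfree.c: cc -o kvmfree kvmfree.c */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
        unsigned long size, unused;
        size_t len;

        len = sizeof(size);
        if (sysctlbyname("vm.kvm_size", &size, &len, NULL, 0) == -1)
                err(1, "vm.kvm_size");
        len = sizeof(unused);
        if (sysctlbyname("vm.kvm_free", &unused, &len, NULL, 0) == -1)
                err(1, "vm.kvm_free");
        printf("kvm_size %luM kvm_free %luM\n", size >> 20, unused >> 20);
        return (0);
}

On the 4/4 kernel kvm_size should report close to 4G.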
> 
> >Enter ddb, it should be operational since pings are replied.  Try to see
> >where the threads are stuck.
> I didn't do this, since reducing the number of kernel threads seems to have fixed
> the problem. For the pNFS server, the nfsd threads spawn additional kernel
> threads that proxy requests to the mirrored DS servers.
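
For reference, each of those proxy threads pins a kernel stack for its
whole lifetime.  The pattern, as a hypothetical sketch (not the actual
pNFS code, the names are made up):

/* Each successful kthread_add() allocates KSTACK_PAGES (4 in your
 * config) virtually contiguous pages of KVA for the stack, plus a
 * guard page.  When vm_thread_new() cannot find such a run, the
 * "kstack allocation failed" message is printed and the spawn
 * fails with ENOMEM. */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kthread.h>

static void
nfsrv_dsworker(void *arg)
{
        /* ... proxy RPCs to the mirrored DS ... */
        kthread_exit();
}

static int
nfsrv_start_dsworker(void *arg)
{
        /* pages == 0 selects the default KSTACK_PAGES-sized stack. */
        return (kthread_add(nfsrv_dsworker, arg, NULL, NULL, 0, 0,
            "nfsds"));
}

So with 256 nfsd threads each able to spawn workers, the demand on
kstack KVA scales with the peak thread count, which fits what you
observed when cutting 256 to 32.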
> 
> >> With the only difference being a kernel built with
> >> options SCHED_4BSD
> >> everything works and performs the same as with the Dec 2017 kernel.
> >>
> >> I can try rolling back through the revisions, but it would be nice if someone
> >> could suggest where to start, because it takes a couple of hours to build a
> >> kernel on this system.
> >>
> >> So, something has made things worse for a head/current kernel this winter, rick
> >
> >There are at least two potentially relevant changes.
> >
> >First is r326758 (Dec 11), which bumped KSTACK_PAGES on i386 to 4.
> I've been running this machine with KSTACK_PAGES=4 for some time, so no change.
> 
> >Second is r332489 (Apr 13), which introduced the 4/4G KVA/UVA split.
> Could this change have resulted in the system being able to allocate fewer
> kernel threads/stacks for some reason?
Well, it could, as anything can be buggy. But the intent of the change
was to give 4G KVA, and it did.

> 
> >Consequences of the first one are obvious: it is much harder to find
> >a place to map the stack.  The second change, on the other hand, provides
> >almost the full 4G for KVA and should have mostly compensated for the
> >negative effects of the first.
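
To put rough numbers on that: with KSTACK_PAGES=4 each kernel thread
reserves 16K of stack plus a 4K guard page, i.e. a 20K run of KVA that
must be virtually contiguous.  Even 512 threads (256 nfsd plus as many
DS workers, say) only need

        512 * 20K = 10M

of stack KVA, which is trivial against 4G, so outright exhaustion is
unlikely after r332489; it is the contiguous 20K runs that
fragmentation can deny.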
> >
> >And, I cannot see how changing the scheduler would fix or even affect that
> >behaviour.
> My hunch is that the system was running near its limit for kernel threads/stacks.
> Then, somehow, the timing under SCHED_ULE resulted in the nfsd trying to reach
> a higher peak number of threads and hitting the limit.
> SCHED_4BSD happened to produce timing such that it stayed just below the
> limit and worked.
> I can think of a couple of things that might affect this:
> 1 - If SCHED_ULE doesn't let kernel threads terminate as quickly, then
>       they wouldn't release their resources before new ones
>       are spawned.
The scheduler has nothing to do with thread termination.  It might
select runnable threads in a way that causes an undesired pattern to
appear, which might create some backlog for termination, but
I doubt it.

> 2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
>       could try to spawn more mirrored DS worker threads at about the same time.
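
If the burstiness is the trigger, one way to test it would be to cap
the number of DS workers in flight, so a scheduling burst cannot pile
up kstack allocations.  A hypothetical sketch on top of the spawn
sketch above (the counter, limit, and mutex are made up, nothing like
this exists in the nfsd code):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/priority.h>
#include <sys/kthread.h>

static struct mtx nfsrv_dsmtx;          /* mtx_init()ed at module load */
static int nfsrv_dsthreads;             /* current worker count */
static int nfsrv_dsthreads_max = 64;    /* made-up tunable */

static int
nfsrv_start_dsworker_throttled(void *arg)
{
        int error;

        mtx_lock(&nfsrv_dsmtx);
        while (nfsrv_dsthreads >= nfsrv_dsthreads_max)
                msleep(&nfsrv_dsthreads, &nfsrv_dsmtx, PVFS, "dsthr", 0);
        nfsrv_dsthreads++;
        mtx_unlock(&nfsrv_dsmtx);
        error = kthread_add(nfsrv_dsworker, arg, NULL, NULL, 0, 0,
            "nfsds");
        if (error != 0) {
                /* The worker must do the same decrement when it exits. */
                mtx_lock(&nfsrv_dsmtx);
                nfsrv_dsthreads--;
                wakeup(&nfsrv_dsthreads);
                mtx_unlock(&nfsrv_dsmtx);
        }
        return (error);
}

But reducing the nfsd thread count, as you already did, achieves much
the same effect.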
> 
> Anyhow, thanks for the help, rick


