From owner-freebsd-current@freebsd.org Sun Apr 22 12:03:14 2018
Date: Sun, 22 Apr 2018 15:02:41 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem
Cc: "freebsd-current@freebsd.org", George Mitchell, Peter
Subject: Re: SCHED_ULE makes 256Mbyte i386 unusable
Message-ID: <20180422120241.GR6887@kib.kiev.ua>
References: <20180421201128.GO6887@kib.kiev.ua>
User-Agent: Mutt/1.9.5 (2018-04-13)
List-Id: Discussions about the use of FreeBSD-current
On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
> >> I decided to start a new thread on current related to SCHED_ULE, since
> >> I see more than just performance degradation, and on a recent current
> >> kernel. (I cc'd a couple of the people discussing performance problems
> >> in freebsd-stable recently under a subject line of "Re:
> >> kern.sched.quantum: Creepy, sadistic scheduler".)
> >>
> >> When testing a pNFS server on a single core i386 with 256Mbytes using
> >> a Dec. 2017 current/head kernel, I would see about a 30% performance
> >> degradation (elapsed run time for a kernel build over NFSv4.1) when
> >> the server kernel was built with
> >>     options SCHED_ULE
> >> instead of
> >>     options SCHED_4BSD
> >>
> >> Now, with a kernel from a couple of days ago, the
> >>     options SCHED_ULE
> >> kernel becomes unusable shortly after starting testing.
> >> I have seen two variants of this:
> >> - Became essentially hung. All I could do was ping the machine from
> >>   the network.
> >> - Reported "vm_thread_new: kstack allocation failed" and then any
> >>   attempt to do anything gets "No more processes".
> >This is strange. It usually means that you get KVA either exhausted or
> >severely fragmented.
> Yes. I reduced the number of nfsd threads from 256 to 32 and the
> SCHED_ULE kernel is working ok now. I haven't done enough to compare
> performance yet. Maybe I'll post again when I have some numbers.
>
> >Enter ddb, it should be operational since pings are replied. Try to see
> >where the threads are stuck.
> I didn't do this, since reducing the number of kernel threads seems to
> have fixed the problem.
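[For readers who want to try the same workaround: the nfsd thread count can be capped with nfsd's -n flag. A sketch of an rc.conf fragment follows; the flags shown are standard nfsd(8) options, but the count of 32 is just the value Rick reports testing, not a recommendation.]

```shell
# /etc/rc.conf fragment (sketch): cap the number of nfsd server threads.
# "-t" serves TCP clients; "-n 32" starts 32 server threads instead of
# the 256 that triggered the kstack allocation failures above.
nfs_server_enable="YES"
nfs_server_flags="-t -n 32"
```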
> For the pNFS server, the nfsd threads will spawn additional kernel
> threads to act as proxies for the mirrored DS servers.
>
> >> with the only difference being a kernel built with
> >>     options SCHED_4BSD
> >> everything works and performs the same as the Dec 2017 kernel.
> >>
> >> I can try rolling back through the revisions, but it would be nice if
> >> someone could suggest where to start, because it takes a couple of
> >> hours to build a kernel on this system.
> >>
> >> So, something has made things worse for a head/current kernel this
> >> winter, rick
> >
> >There are at least two potentially relevant changes.
> >
> >First is r326758 on Dec 11, which bumped KSTACK_PAGES on i386 to 4.
> I've been running this machine with KSTACK_PAGES=4 for some time, so no
> change.
>
> >Second is r332489 on Apr 13, which introduced the 4/4G KVA/UVA split.
> Could this change have resulted in the system being able to allocate
> fewer kernel threads/stacks for some reason?
Well, it could, as anything can be buggy. But the intent of the change
was to give 4G of KVA, and it did.
>
> >Consequences of the first one are obvious: it is much harder to find
> >the place to map the stack. The second change, on the other hand,
> >provides almost the full 4G for KVA and should have mostly compensated
> >for the negative effects of the first.
> >
> >And, I cannot see how changing the scheduler would fix or even affect
> >that behaviour.
> My hunch is that the system was running near its limit for kernel
> threads/stacks. Then, somehow, the timing SCHED_ULE produced resulted
> in nfsd trying to reach a higher peak number of threads and hitting the
> limit. SCHED_4BSD happened to result in timing such that it stayed just
> below the limit and worked.
> I can think of a couple of things that might affect this:
> 1 - If SCHED_ULE doesn't do the termination of kernel threads as
>     quickly, then they wouldn't terminate and release their resources
>     before more new ones are spawned.
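[A back-of-envelope sketch of the kstack KVA arithmetic being discussed. The numbers below are illustrative, taken from the thread (KSTACK_PAGES=4, 256 nfsd threads) and the usual 4 KiB i386 page size; guard pages and the extra DS proxy threads are ignored.]

```shell
# Each kernel thread needs KSTACK_PAGES virtually contiguous pages of
# KVA for its stack. With 4 pages of 4 KiB each and 256 nfsd threads:
kstack_pages=4
page_kib=4
threads=256
echo "$((kstack_pages * page_kib * threads)) KiB"   # prints "4096 KiB"
```

The total (4 MiB) is tiny next to 4G of KVA, which is why the failure mode Kostik points at is fragmentation: each stack must be mapped as one contiguous 16 KiB run, and a fragmented KVA map can fail to provide one even with plenty of free space overall.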
The scheduler has nothing to do with thread termination. It might select
running threads in a way that causes an undesired pattern to appear,
which might create some backlog for termination, but I doubt it.

> 2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then
>     the burst could try and spawn more mirror DS worker threads at
>     about the same time.
>
> Anyhow, thanks for the help, rick