FreeBSD Mail Archives

Date:      Thu, 30 Nov 2017 15:47:50 -0500
From:      Mark Johnston <markj@FreeBSD.org>
To:        Larry McVoy <lm@mcvoy.com>
Cc:        Warner Losh <imp@bsdimp.com>, Scott Long <scottl@netflix.com>, Kevin Bowling <kbowling@llnw.com>, Drew Gallatin <gallatin@netflix.com>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: small patch for pageout. Comments?
Message-ID:  <20171130204750.GB21606@raichu>
In-Reply-To: <20171130184923.GA30262@mcvoy.com>
References:  <20171130173424.GA811@mcvoy.com> <CANCZdfqL9ZsKTfFi%2BvsCTh3yaNjtwaYYY3fvivdbNybBnujawg@mail.gmail.com> <20171130184923.GA30262@mcvoy.com>

On Thu, Nov 30, 2017 at 10:49:23AM -0800, Larry McVoy wrote:
> On Thu, Nov 30, 2017 at 11:37:35AM -0700, Warner Losh wrote:
> > On Thu, Nov 30, 2017 at 10:34 AM, Larry McVoy <lm@mcvoy.com> wrote:
> > 
> > > In a recent numa meeting that Scott called, Jeff suggested a small
> > > patch to the pageout daemon (included below).
> > >
> > > It's rather dramatic the difference it makes for me.  If I arrange to
> > > thrash the crap out of memory, without this patch the kernel is so
> > > borked with all the processes in disk wait that I can't kill them,
> > > I can't reboot, my only option is to power off.
> > >
> > > With the patch there is still some borkage, the kernel is randomly
> > > killing processes because of out of mem, it should kill one of my
> > > processes that is causing the problem but it doesn't, it killed
> > > random stuff like dhclient, getty (logged me out), etc.
> > >
> > > But the system is responsive.
> > >
> > > What the patch does is say "if we have more than one core, don't sleep
> > > in pageout, just keep running until we freed enough mem".
> > >
> > > Comments?
> > >
> > 
> > Just to confirm why this patch works.
> > 
> > For UP systems, we have to pause here to allow work to complete, otherwise
> > we can't switch to their threads to complete the I/Os. For MP, however, we
> > can continue to schedule more work because that work can be completed on
> > other CPUs. This parallelism greatly increases the pageout rate, allowing
> > the system to keep up better when some ass-hat process (or processes) is
> > thrashing memory.
> 
> Yep.
> 
> > I'm pretty sure the UP case was also designed to not flood the lower layers
> > with work, starving other consumers. Does this result in undue flooding, and
> > would we get better results if we could schedule up to the right amount of
> > work rather than flooding in the MP case?
> 
> I dunno if there is a "right amount".  I could make it a little smarter by
> keeping track of how many pages we freed and sleep if we freed none in a 
> scan (which seems really unlikely).

This situation can happen if the inactive queue is full of dirty pages.
A problem with your patch is that we might not give enough time to the
laundry thread (the thread responsible for writing the contents of dirty
pages to disk and returning them to the inactive queue for the page daemon
to free) to write out dirty pages. In this case we might trigger the OOM
killer prematurely, and in fact this scenario is what motivated r300865.
So I would argue that we do in fact need to sleep if the page daemon is
failing to make progress, in order to give time for I/O to complete.
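
To make the suggestion concrete, here is a rough sketch of the sleep
decision I have in mind. It is only a model of the policy, not code from
vm_pageout.c: the function name and its parameters (ncpus, freed, target)
are made up for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the page daemon's "should I pause?" decision.
 * Names and parameters are illustrative and do not match the kernel.
 *
 *   ncpus  - number of CPUs in the system
 *   freed  - pages freed by the scan that just finished
 *   target - pages the scan was asked to free
 */
bool
pageout_should_sleep(int ncpus, int freed, int target)
{
	/* Shortage resolved: nothing to wait for. */
	if (freed >= target)
		return (false);

	/*
	 * UP: we must yield the CPU so the threads completing the I/O
	 * can run at all.
	 */
	if (ncpus == 1)
		return (true);

	/*
	 * MP: keep scanning while we are making progress, but pause when
	 * a scan frees nothing (e.g. the inactive queue is all dirty
	 * pages), so that laundering I/O has time to complete.
	 */
	return (freed == 0);
}
```

With this policy an MP system still avoids the pause while scans are
freeing pages, which is the behaviour the patch is after, but it does not
spin through scan after scan when only the laundry thread can make progress.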

> All I know for sure is that without this you can lock up the system to
> the point it takes a power cycle to unwedge it.  With this the system
> is responsive.
> 
> Rather than worrying about the smartness, I'd argue this is an improvement,
> ship it, and then I can go look at how the system decides to kill processes
> (because that's currently busted).

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20171130204750.GB21606>
