Date: Fri, 21 Jul 2000 12:06:16 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Lars Eggert <larse@ISI.EDU>
Cc: Alan Cox <alc@cs.rice.edu>, hackers@FreeBSD.ORG, cort@cs.nmt.edu
Subject: Re: clearing pages in the idle loop
Message-ID: <200007211906.MAA19989@earth.backplane.com>
References: <20000719234124.H14543@cs.rice.edu> <39788E48.60F8A59F@isi.edu>
:Alan Cox wrote:
:> Last year, I tried to reproduce some of the claims/results
:> in this paper on FreeBSD/x86 and couldn't. I also tried
:> limiting the idle loop to clearing pages of one particular
:...
:
:> Finally, it's possible that having these pre-zeroed pages
:> in your L2 cache might be beneficial if they get allocated
:> and used right away. FreeBSD's idle loop zeroes the pages
:> that are next in line for allocation.
:
:That makes sense. Other factors that may have an impact:
:
: * if you always have enough zeroed pages remaining over your
: benchmark (> ~1/2 free pages), FreeBSD will never do the
: idle-time zeroing
:
: * it looks to me as if Cort's Linux code will always zero whole
: pages, while the FreeBSD code is a little smarter and only zeroes
: used regions of a page (less impact on caches?)
:
: * cache size differences between PPC and i386?
:
:I'm looking at Cort's code (arch/ppc/kernel/idle.c), and while he turns off
:the caching for pages he zeroes, I don't see him disabling the L1/2 caches
:...
:Lars
:--
:Lars Eggert <larse@isi.edu> Information Sciences Institute
Since the only effect of a cache miss is less efficient use of
the cpu, and since the page zeroing only occurs when the cpu is idle,
I would not expect to see much improvement from attempts to refine
the page-zeroing operation (beyond the simple hysteresis that FreeBSD
uses now and perhaps being able to bypass the cache).

The hysteresis in the idle loop's page-zeroing effectively
decouples the page-zeroing operation (and any loss of cache) from
the processes benefiting from the availability of pre-zero'd pages.

The real benefit occurs on a medium-to-heavily loaded machine which is
NOT cpu bound. Since nearly all page allocations require zero'd pages,
having a pool of pre-zero'd pages significantly reduces allocation
latency at just the time the process doing the allocation can best
benefit from it. In a cpu-bound system, the idle loop does not run
as often (or at all) and no pre-zeroing occurs anyway.
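
To illustrate the hysteresis idea, here is a minimal sketch in C. It is not
FreeBSD's actual code: the structure, the function names, and the 4/5
high-water mark are assumptions made for the example, and the 1/2 low-water
mark is taken from the "> ~1/2 free pages" remark quoted above. The idle loop
starts zeroing only once the zeroed pool falls below the low mark, and then
keeps going until it reaches the high mark, so zeroing happens in bursts
rather than one page at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping; the field names are illustrative only. */
struct zero_pool {
    size_t free_pages;    /* all free pages, zeroed or not */
    size_t zeroed_pages;  /* free pages that are already zeroed */
    bool   zeroing;       /* hysteresis state: are we in a zeroing burst? */
};

/* Assumed watermarks, expressed as fractions of the free page count. */
size_t low_mark(size_t free_pages)  { return free_pages / 2; }
size_t high_mark(size_t free_pages) { return free_pages * 4 / 5; }

/*
 * Called from the (hypothetical) idle loop. Returns true if the idle
 * loop should zero one more page right now.
 */
bool should_zero_page(struct zero_pool *p)
{
    assert(p->zeroed_pages <= p->free_pages);

    if (p->zeroing) {
        /* In a burst: stop only once the pool is comfortably full. */
        if (p->zeroed_pages >= high_mark(p->free_pages))
            p->zeroing = false;
    } else {
        /* Idle: start a burst only once the pool has run low. */
        if (p->zeroed_pages < low_mark(p->free_pages))
            p->zeroing = true;
    }
    return p->zeroing && p->zeroed_pages < p->free_pages;
}
```

With 100 free pages, this sketch does nothing while 50 or more are zeroed,
starts zeroing when the count drops below 50, and stops again at 80. The gap
between the two marks is what separates the zeroing work from the allocations
that later consume the pages.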
As for zeroing just the pieces of a page that need zeroing: this
is NOT an optimization designed for the idle-loop page-zeroing code. I
would not expect such an optimization to have any effect on idle-loop
page zeroing performance. The partial-zeroing code is actually designed
to handle filling in missing spots when a device-backed block (devices
use a 512 byte base blocking factor) is mapped into memory (which requires
a page-sized blocking factor).

For example, consider mapping the end of a file whose size is
not page-aligned. The block device underlying the filesystem
has a 512-byte native blocking factor and the filesystem itself (UFS)
will typically have a 1K fragment blocking factor at the end of the file,
which means that the physical disk I/O via the filesystem device may not
cover an entire MMU page (4K for i386).

The filesystem code doesn't give a damn whether the filesystem buffer
it is reading the data into is zero'd beyond the EOF of the file. In
fact, we don't even bother to zero that area... UNTIL that particular
page is mapped by some user process. That is the point where the partial
page-zeroing code comes into play. It has nothing to do with the idle
loop pre-zeroing, but since it's a generic routine (part of the VM core),
the idle loop happens to call it generically.
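
The end-of-file case above can be worked through with a small C sketch. The
helper names are hypothetical and this is not the VM core routine itself; it
only shows the arithmetic, assuming a 4K page and a 1K fragment as in the
text. It computes how much of the last page holds file data, how much the
fragment-sized disk I/O would cover, and then zeroes everything past EOF, as
would be needed at the point the page is mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* i386 MMU page size, per the text */
#define FRAG_SIZE 1024u  /* typical UFS fragment size, per the text */

/* Bytes of the last page holding file data (0: file ends on a page boundary). */
size_t valid_bytes_in_last_page(size_t file_size)
{
    return file_size % PAGE_SIZE;
}

/* Bytes of the last page covered by disk I/O, rounded up to whole fragments. */
size_t io_bytes_in_last_page(size_t file_size)
{
    size_t valid = valid_bytes_in_last_page(file_size);
    return (valid + FRAG_SIZE - 1) / FRAG_SIZE * FRAG_SIZE;
}

/*
 * Hypothetical helper: zero the part of the final page that lies beyond
 * EOF. Returns the number of bytes zeroed.
 */
size_t zero_page_tail(unsigned char *page, size_t file_size)
{
    size_t valid = valid_bytes_in_last_page(file_size);

    assert(page != NULL);
    if (valid == 0)
        return 0;  /* file ends exactly on a page boundary */
    memset(page + valid, 0, PAGE_SIZE - valid);
    return PAGE_SIZE - valid;
}
```

For a 10000-byte file, the last page holds 1808 bytes of data, the I/O covers
2048 bytes (two fragments), and the remaining 2288 bytes after EOF are what
the sketch zeroes. The zeroing starts at EOF rather than at the end of the
I/O, since the buffer contents past EOF are not guaranteed to be zero.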
-Matt
