Date: Wed, 23 Oct 2002 15:22:48 +0900
From: Seigo Tanimura <tanimura@axe-inc.co.jp>
To: arch@FreeBSD.org
Cc: tanimura@axe-inc.co.jp, tanimura@FreeBSD.org
Subject: Review Request (Re: Dynamic growth of the buffer and buffer page reclaim)
Message-ID: <200210230622.g9N6MmoK065433@shojaku.t.axe-inc.co.jp>
In-Reply-To: <200210220949.g9M9nroK026750@shojaku.t.axe-inc.co.jp>
References: <200210220949.g9M9nroK026750@shojaku.t.axe-inc.co.jp>
Could anyone interested please review my patch before I either commit
it or proceed further?
Possible Issues:
1. Fragmentation of the KVA space
If fragmentation becomes too severe, we have to reclaim the KVA space
of the buffers in the clean queue.
One solution: let a kernel thread (the page scanner? or the buffer
daemon?) scan the buffers in the clean queue periodically. If a
buffer's first page has been reclaimed(*), give up the KVA space of
the buffer and move it to the EMPTYKVA queue, as sketched after the
footnote below. The unreclaimed pages of that buffer will hopefully
stay cached in the backing VM object.
(*) The address of the pages in a buffer must start at b_kvabase.
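One possible shape for that scan, sitting in vfs_bio.c, is below. This
is a sketch only: the helper name buf_kva_scan() and the way the first
page is looked up (via the vnode's VM object and b_offset) are my
assumptions, and locking is elided.

    /*
     * Sketch: walk the clean queue and release the KVA of any buffer
     * whose first page (the one to be mapped at b_kvabase) has been
     * reclaimed from the backing VM object.
     */
    static void
    buf_kva_scan(void)
    {
        struct buf *bp, *nbp;

        for (bp = TAILQ_FIRST(&bufqueues[QUEUE_CLEAN]); bp != NULL;
            bp = nbp) {
            nbp = TAILQ_NEXT(bp, b_freelist);
            /* Is the first page still in the vnode's VM object? */
            if (bp->b_npages == 0 || bp->b_vp == NULL ||
                bp->b_vp->v_object == NULL ||
                vm_page_lookup(bp->b_vp->v_object,
                OFF_TO_IDX(bp->b_offset)) != NULL)
                continue;
            /*
             * The first page is gone, so the mapping at b_kvabase
             * cannot be reused as-is.  Give up the KVA and move the
             * buffer to the EMPTYKVA queue; any surviving pages stay
             * cached in the object.
             */
            TAILQ_REMOVE(&bufqueues[QUEUE_CLEAN], bp, b_freelist);
            bfreekva(bp);
            TAILQ_INSERT_TAIL(&bufqueues[QUEUE_EMPTYKVA], bp, b_freelist);
        }
    }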
2. Locking of buffers and pages
A buffer may need to be locked across vfs_{un,re}wirepages(). The
page queue mutex protects the object and the pindex of a page in
vfs_rewirepages(), so it should hopefully be safe. (But I am not
sure.)
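For the record, the locking I have in mind for vfs_rewirepages() is
sketched below. The function name is from the patch, but this body is
a simplified illustration, not the actual code. Holding the page queue
mutex across the object/pindex check and vm_page_wire() keeps the
check atomic with respect to the page scanner.

    /*
     * Sketch of the vfs_rewirepages() locking; the buffer is assumed
     * to be locked by the caller.  vm_page structures are never freed,
     * so dereferencing a possibly-reclaimed page is safe; its object
     * and pindex tell whether it is still the page we unwired.
     */
    static void
    vfs_rewirepages(struct buf *bp)
    {
        vm_page_t m;
        int i;

        vm_page_lock_queues();
        for (i = 0; i < bp->b_npages; i++) {
            m = bp->b_pages[i];
            if (m->object == bp->b_vp->v_object &&
                m->pindex == OFF_TO_IDX(bp->b_offset) + i)
                vm_page_wire(m);    /* still cached: wire it down again */
            /* else: the page was reclaimed and must be read back later. */
        }
        vm_page_unlock_queues();
    }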
TIA.
On Tue, 22 Oct 2002 18:49:53 +0900,
Seigo Tanimura <tanimura@axe-inc.co.jp> said:
tanimura> Introduction:
tanimura> The kernel's I/O buffers are currently allocated in buffer_map,
tanimura> which is sized statically upon boot and never grows. This limits
tanimura> the scale of I/O performance on a host with large physical memory.
tanimura> We used to tune NBUF to cope with that problem. This workaround,
tanimura> however, leaves a lot of wired pages unavailable to user processes,
tanimura> which is not acceptable for memory-bound applications.
tanimura> In order to run both I/O-bound and memory-bound processes on the same
tanimura> host, it is essential to achieve:
tanimura> A) allocation of buffers from kernel_map, to break the limit of a
tanimura> fixed map size, and
tanimura> B) page reclaim from idle buffers to regulate the number of wired
tanimura> pages.
tanimura> The patch at:
tanimura> http://people.FreeBSD.org/~tanimura/patches/dynamicbuf.diff.gz
tanimura> implements buffer allocation from kernel_map and reclaim of buffer
tanimura> pages. With this patch, make kernel-depend && make kernel completes
tanimura> about 30-60 seconds faster on my PC.
tanimura> Implementation in Detail:
tanimura> A) is easy; first, you need to do s/buffer_map/kernel_map/. Since
tanimura> an arbitrary number of buffer pages can now be allocated
tanimura> dynamically, buffer headers (struct buf) should be allocated
tanimura> dynamically as well. Glue them together into a list so that they
tanimura> can be traversed by boot() et al., as in the sketch below.
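For illustration, the dynamic allocation could look like the sketch
below. The list head allbufs, the linkage field b_allbufs, and the
helper buf_alloc() are names made up for this mail, not identifiers
from the patch; the remaining header initialization is elided.

    /*
     * Sketch: allocate buffer headers on demand and glue them into a
     * global list so that boot() et al. can still traverse every buffer.
     */
    static TAILQ_HEAD(, buf) allbufs = TAILQ_HEAD_INITIALIZER(allbufs);

    static struct buf *
    buf_alloc(void)
    {
        struct buf *bp;

        bp = malloc(sizeof(*bp), M_BIOBUF, M_WAITOK | M_ZERO);
        BUF_LOCKINIT(bp);
        /* ... remaining initialization as for the static headers ... */
        TAILQ_INSERT_TAIL(&allbufs, bp, b_allbufs); /* assumed linkage field */
        return (bp);
    }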
tanimura> In order to accomplish B), we must find buffers that neither the
tanimura> filesystem nor the I/O code will touch. The clean buffer queue
tanimura> holds such buffers. (Exception: if the vnode associated with a
tanimura> clean buffer is held by the namecache, it may access the buffer
tanimura> pages.) Thus, we should unwire the pages of a buffer prior to
tanimura> enqueueing it on the clean queue, and wire the pages down again in
tanimura> bremfree() if they have not been reclaimed, as placed schematically
tanimura> below.
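Schematically, the two hooks would sit as below. This shows the
placement only; it is not a literal excerpt from dynamicbuf.diff, and
the surrounding code is elided.

    /* In brelse(): unwire the pages when a clean buffer is enqueued. */
        vfs_unwirepages(bp);        /* body sketched after the next paragraph */
        TAILQ_INSERT_TAIL(&bufqueues[QUEUE_CLEAN], bp, b_freelist);

    /* In bremfree(): wire the surviving pages down again on dequeue. */
        TAILQ_REMOVE(&bufqueues[QUEUE_CLEAN], bp, b_freelist);
        vfs_rewirepages(bp);        /* locking sketched under issue 2 above */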
tanimura> Although unwiring a page gives it a chance of being reclaimed, we
tanimura> can go further. On Solaris, it is known that file cache pages
tanimura> should be reclaimed before the other kinds of pages (anonymous,
tanimura> executable, etc.) for better performance. Mainly for lack of time
tanimura> to work on distinguishing the kinds of pages to be unwired, I
tanimura> simply pass all unwired pages to vm_page_dontneed(). This approach
tanimura> places most of the unwired buffer pages just one step away from
tanimura> the cache queue.
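The unwire loop itself would then look roughly like this. The function
name comes from the patch, but the body below is a simplified sketch,
with locking reduced to the page queue mutex.

    /*
     * Simplified sketch of vfs_unwirepages(): drop the wiring on each
     * buffer page and mark it a good reclaim candidate.
     */
    static void
    vfs_unwirepages(struct buf *bp)
    {
        vm_page_t m;
        int i;

        vm_page_lock_queues();
        for (i = 0; i < bp->b_npages; i++) {
            m = bp->b_pages[i];
            vm_page_unwire(m, 0);   /* 0: queue as inactive, not active */
            /*
             * Push the page close to the cache queue so that file
             * cache pages go before anonymous or executable ones.
             */
            vm_page_dontneed(m);
        }
        vm_page_unlock_queues();
    }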
tanimura> Experimental Evaluation and Results:
tanimura> The times taken to complete make kernel-depend && make kernel just
tanimura> after booting into single-user mode have been measured on my ThinkPad
tanimura> 600E (CPU: Pentium II 366MHz, RAM: 160MB) by time(1). The number
tanimura> passed to the -j option of make(1) has been varied from 1 to 30 in
tanimura> order to control the pressure of the memory demand for user processes.
tanimura> The baseline is the kernel without my patch.
tanimura> The following table shows the results. All of the times are in
tanimura> seconds.
tanimura> -j      baseline                    w/ my patch
tanimura>         real     user     sys       real     user     sys
tanimura>  1      1608.21  1387.94  125.96    1577.88  1391.02  100.90
tanimura> 10      1576.10  1360.17  132.76    1531.79  1347.30  103.60
tanimura> 20      1568.01  1280.89  133.22    1509.36  1276.75  104.69
tanimura> 30      1923.42  1215.00  155.50    1865.13  1219.07  113.43
tanimura> Most of the improvement in the real times comes from the speedup
tanimura> of system calls. The hit ratio of getblk() may have increased,
tanimura> but I have not examined that yet.
tanimura> Other interesting results are the numbers of swaps, shown below.
tanimura> -j      baseline    w/ my patch
tanimura>  1           0              0
tanimura> 10           0              0
tanimura> 20         141             77
tanimura> 30         530            465
tanimura> Since the baseline kernel does not free buffer pages at all(*), it
tanimura> may be putting too much pressure on memory.
tanimura> (*) bfreekva() is called only when the whole KVA is too fragmented.
tanimura> Userland Interfaces:
tanimura> The sysctl variable vfs.bufspace now reports the size of the pages
tanimura> allocated for buffers, both wired and unwired. A new sysctl
tanimura> variable, vfs.bufwiredspace, reports the size of the buffer pages
tanimura> wired down. vfs.bufkvaspace returns the size of the KVA space
tanimura> allocated for buffers.
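The declarations follow the style of the existing vfs.bufspace sysctl
in vfs_bio.c; here is a sketch of one of them (the variable's type is
an assumption):

    /* Sketch: read-only sysctl exporting the amount of wired buffer pages. */
    static long bufwiredspace;
    SYSCTL_LONG(_vfs, OID_AUTO, bufwiredspace, CTLFLAG_RD,
        &bufwiredspace, 0, "Size of the buffer pages wired down");

From userland, they read as usual via sysctl(8), e.g.
"sysctl vfs.bufspace vfs.bufwiredspace vfs.bufkvaspace".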
tanimura> Future Work:
tanimura> The handling of unwired pages can be improved by scanning only
tanimura> buffer pages. In that case, we may have to run the VM page
tanimura> scanner more frequently, as Solaris does.
tanimura> vfs.bufspace does not track the buffer pages reclaimed by the page
tanimura> scanner. Those pages are counted only when the buffer associated
tanimura> with them is removed from the clean queue, which is too late.
tanimura> Benchmark tools concentrating on disk I/O performance (bonnie,
tanimura> iozone, postmark, etc.) may be more suitable than make kernel for
tanimura> evaluation.
tanimura> Comments and flames are welcome. Thanks a lot.
tanimura> --
tanimura> Seigo Tanimura <tanimura@axe-inc.co.jp> <tanimura@FreeBSD.org>
