Date: Wed, 23 Oct 2002 15:22:48 +0900
From: Seigo Tanimura <tanimura@axe-inc.co.jp>
To: arch@FreeBSD.org
Cc: tanimura@axe-inc.co.jp, tanimura@FreeBSD.org
Subject: Review Request (Re: Dynamic growth of the buffer and buffer page reclaim)
Message-ID: <200210230622.g9N6MmoK065433@shojaku.t.axe-inc.co.jp>
In-Reply-To: <200210220949.g9M9nroK026750@shojaku.t.axe-inc.co.jp>
References: <200210220949.g9M9nroK026750@shojaku.t.axe-inc.co.jp>
Could anyone interested please review my patch before I either commit
it or proceed further?
Possible Issues:
1. Fragmentation of the KVA space
   If fragmentation gets too bad, we have to reclaim the KVA space of
   the buffers in the clean queue.
   One solution: let a kernel thread (the page scanner? or the buffer
   daemon?) scan the buffers in the clean queue periodically.  If
   there is a buffer whose first page has been reclaimed(*), give up the
   KVA space of the buffer and move it to the EMPTYKVA queue.  The
   unreclaimed pages of that buffer will hopefully stay cached in the
   backing VM object.
   (*) The address of the pages in a buffer must start at b_kvabase.
2. Lock of buffers and pages
   A buffer may need to be locked across vfs_{un,re}wirepages().  The
   page queue mutex locks the object and the pindex of a page in
   vfs_rewirepages(), so it should hopefully be safe (but I am not sure).
TIA.
On Tue, 22 Oct 2002 18:49:53 +0900,
  Seigo Tanimura <tanimura@axe-inc.co.jp> said:
tanimura> Introduction:
tanimura> The I/O buffers of the kernel are currently allocated in buffer_map,
tanimura> which is sized statically at boot and never grows.  This limits the scale of
tanimura> I/O performance on a host with large physical memory.  We used to tune
tanimura> NBUF to cope with that problem.  This workaround, however, results in
tanimura> a lot of wired pages not available for user processes, which is not
tanimura> acceptable for memory-bound applications.
tanimura> In order to run both I/O-bound and memory-bound processes on the same
tanimura> host, it is essential to achieve:
tanimura> A) allocation of buffers from kernel_map to break the limit of a map
tanimura>    size, and
tanimura> B) page reclaim from idle buffers to regulate the number of wired
tanimura>    pages.
tanimura> The patch at:
tanimura> http://people.FreeBSD.org/~tanimura/patches/dynamicbuf.diff.gz
tanimura> implements buffer allocation from kernel_map and reclaim of buffer
tanimura> pages.  With this patch, make kernel-depend && make kernel completes
tanimura> about 30-60 seconds faster on my PC.
tanimura> Implementation in Detail:
tanimura> A) is easy; first you need to do s/buffer_map/kernel_map/.  Since an
tanimura> arbitrary number of buffer pages can be allocated dynamically, buffer
tanimura> headers (struct buf) should be allocated dynamically as well.  Glue
tanimura> them together into a list so that they can be traversed by boot()
tanimura> et al.
tanimura> In order to accomplish B), we must find buffers that neither the
tanimura> filesystem nor the I/O code will touch.  The clean buffer queue holds such
tanimura> buffers.  (Exception: if the vnode associated with a clean buffer is
tanimura> held by the namecache, it may access the buffer page.)  Thus, we
tanimura> should unwire the pages of a buffer prior to enqueuing it to the clean
tanimura> queue, and rewire the pages down in bremfree() if the pages are not
tanimura> reclaimed.
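The unwire/rewire life cycle described above can be sketched as a toy model. The names `ToyPage`, `ToyBuf` and `enqueue_clean` are hypothetical; only `bremfree()` is named in the message, and the real function operates on struct buf inside the kernel. What happens to a buffer that has lost pages is not stated in the message, so the sketch merely reports how many pages were rewired.

```python
class ToyPage:
    def __init__(self):
        self.wire_count = 1     # wired while the buffer is in use
        self.reclaimed = False  # set by a (simulated) page scanner


class ToyBuf:
    def __init__(self, npages):
        self.pages = [ToyPage() for _ in range(npages)]
        self.on_clean_queue = False


def enqueue_clean(bp):
    """Unwire every page, then place the buffer on the clean queue."""
    for p in bp.pages:
        p.wire_count -= 1
    bp.on_clean_queue = True


def bremfree(bp):
    """Take the buffer off the queue and rewire the pages that survived.

    Returns the number of pages rewired; handling of reclaimed pages is
    left to the caller because the message does not specify it.
    """
    bp.on_clean_queue = False
    rewired = 0
    for p in bp.pages:
        if not p.reclaimed:
            p.wire_count += 1
            rewired += 1
    return rewired
```

While a buffer sits on the clean queue its pages have a wire count of zero in this model, which is what makes them eligible for reclaim.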
tanimura> Although unwiring gives a page a chance of being reclaimed, we can go
tanimura> further.  In Solaris, it is known that file cache pages should be
tanimura> reclaimed prior to the other kinds of pages (anonymous, executable,
tanimura> etc.) for better performance.  Mainly due to a lack of time to work
tanimura> on distinguishing the kind of a page to be unwired, I simply pass all
tanimura> unwired pages to vm_page_dontneed().  This approach places most of the
tanimura> unwired buffer pages just one step away from the cache queue.
tanimura> Experimental Evaluation and Results:
tanimura> The times taken to complete make kernel-depend && make kernel just
tanimura> after booting into single-user mode have been measured on my ThinkPad
tanimura> 600E (CPU: Pentium II 366MHz, RAM: 160MB) by time(1).  The number
tanimura> passed to the -j option of make(1) has been varied from 1 to 30 in
tanimura> order to control the pressure of the memory demand for user processes.
tanimura> The baseline is the kernel without my patch.
tanimura> The following table shows the results.  All of the times are in
tanimura> seconds.
tanimura> -j	baseline		w/ my patch
tanimura> 	real	user	sys	real	user	sys
tanimura> 1	1608.21	1387.94	125.96	1577.88	1391.02	100.90
tanimura> 10	1576.10	1360.17	132.76	1531.79	1347.30	103.60
tanimura> 20	1568.01	1280.89	133.22	1509.36	1276.75	104.69
tanimura> 30	1923.42	1215.00	155.50	1865.13	1219.07	113.43
tanimura> Most of the improvement in the real times comes from the speedup of
tanimura> system calls.  The hit ratio of getblk() may have increased, but I have
tanimura> not examined it yet.
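To make the comparison in the table easier to read, the per-run savings (baseline minus patched) can be computed directly. The figures below are copied from the table; the helper name `savings` is just for illustration.

```python
# (real, user, sys) times in seconds, copied from the table above,
# keyed by the value passed to make -j.
baseline = {
    1: (1608.21, 1387.94, 125.96),
    10: (1576.10, 1360.17, 132.76),
    20: (1568.01, 1280.89, 133.22),
    30: (1923.42, 1215.00, 155.50),
}
patched = {
    1: (1577.88, 1391.02, 100.90),
    10: (1531.79, 1347.30, 103.60),
    20: (1509.36, 1276.75, 104.69),
    30: (1865.13, 1219.07, 113.43),
}


def savings(j):
    """Return (real, user, sys) seconds saved by the patch for a given -j."""
    return tuple(round(b - p, 2) for b, p in zip(baseline[j], patched[j]))


for j in sorted(baseline):
    real, user, sys_time = savings(j)
    print(f"-j {j:<2}  real {real:7.2f}  user {user:7.2f}  sys {sys_time:7.2f}")
```

With these figures the real-time savings fall between roughly 30 and 59 seconds and the sys-time savings between roughly 25 and 42 seconds, while the user times change by only a few seconds in either direction.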
tanimura> Another interesting result is the number of swaps, shown below.
tanimura> -j	baseline		w/ my patch
tanimura> 1	0			0
tanimura> 10	0			0
tanimura> 20	141			77
tanimura> 30	530			465
tanimura> Since the baseline kernel does not free buffer pages at all(*), it may
tanimura> be putting too much pressure on the pages.
tanimura> (*) bfreekva() is called only when the whole KVA is too fragmented.
tanimura> Userland Interfaces:
tanimura> The sysctl variable vfs.bufspace now reports the size of the pages
tanimura> allocated for buffers, both wired and unwired.  A new sysctl variable,
tanimura> vfs.bufwiredspace, tells the size of the buffer pages wired down.
tanimura> vfs.bufkvaspace returns the size of the KVA space for buffers.
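The three variables can be combined into a small report. This is a sketch under assumptions: it presumes a kernel with the patch applied (vfs.bufwiredspace and vfs.bufkvaspace are introduced by it), it reads values through `sysctl -n`, and the "unwired" figure is an inference from the description above (bufspace minus bufwiredspace), not something the patch is known to export. The function names are hypothetical.

```python
import subprocess


def read_sysctl(name):
    """Read an integer sysctl value by running `sysctl -n <name>`."""
    out = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(out.stdout.strip())


def buffer_usage(read=read_sysctl):
    """Summarize buffer space; `read` can be replaced for testing."""
    total = read("vfs.bufspace")       # wired + unwired buffer pages
    wired = read("vfs.bufwiredspace")  # buffer pages currently wired down
    kva = read("vfs.bufkvaspace")      # KVA space used for buffers
    return {
        "total": total,
        "wired": wired,
        "unwired": total - wired,      # inferred, see the note above
        "kva": kva,
    }
```

Passing a different `read` callable, for example one backed by a dictionary, lets the arithmetic be checked on a machine that does not have these sysctl variables.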
tanimura> Future Work:
tanimura> The handling of unwired pages can be improved by scanning only buffer
tanimura> pages.  In that case, we may have to run the vm page scanner more
tanimura> frequently, as does Solaris.
tanimura> vfs.bufspace does not track the buffer pages reclaimed by the page
tanimura> scanner.  They are counted when the buffer associated with those pages
tanimura> is removed from the clean queue, which is too late.
tanimura> Benchmark tools concentrating on disk I/O performance (bonnie, iozone,
tanimura> postmark, etc.) may be more suitable than make kernel for evaluation.
tanimura> Comments and flames are welcome.  Thanks a lot.
tanimura> -- 
tanimura> Seigo Tanimura <tanimura@axe-inc.co.jp> <tanimura@FreeBSD.org>
