Date: Mon, 28 Oct 2002 00:54:57 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Jeff Roberson <jroberson@chesapeake.net>
Cc: Seigo Tanimura <tanimura@axe-inc.co.jp>, Bruce Evans <bde@zeta.org.au>, <current@FreeBSD.ORG>, <tanimura@FreeBSD.ORG>
Subject: Re: Dynamic growth of the buffer and buffer page reclaim
Message-ID: <200210280854.g9S8svSr094312@apollo.backplane.com>
References: <20021023163758.R22147-100000@mail.chesapeake.net>
:I was going to comment on fragmentation issues, but that seems to have
:been very well covered. I would like to point out that removing the
:buffer_map not only contributes to kernel map fragmentation, but also
:contention for the kernel map. It might also prevent us from removing
:giant from the kernel map because it would add another interrupt time
:consumer.
Yes. Whatever the case, any sort of temporary KVA mapping management
system would need its own submap. It would be insane to use the
kernel_map or kmem_map for this.
In regards to Seigo's patch:
The scaleability issue is entirely related to the KVA mapping portion
of the buffer cache. Only I/O *WRITE* performance is specifically
limited by the size of the buffer_map, due to the limited number of
dirty buffers allowed in the map. This in turn is a restriction
required by filesystems which must keep track of 'dirty' buffers
in order to sequence out writes. Currently the only way around this
limitation is to use mmap/MAP_NOSYNC. In other words, we support
dirty VM pages that are not associated with the buffer cache but
most of the filesystem algorithms are still based around the
assumption that dirty pages will be mapped into dirty buffers.
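The dirty-buffer accounting behind this throttling is visible through sysctl on FreeBSD. A small sketch for inspecting it; the vfs.*dirtybuffers knob names are an assumption based on FreeBSD of this era, and the script degrades gracefully on systems where they do not exist:

```shell
#!/bin/sh
# Sketch: report the dirty-buffer count and watermarks that throttle
# writes through the buffer cache. The sysctl names are FreeBSD-specific
# (assumed); elsewhere we simply note they are unavailable.
for knob in vfs.numdirtybuffers vfs.lodirtybuffers vfs.hidirtybuffers; do
    sysctl "$knob" 2>/dev/null || echo "$knob: not available on this system"
done
```

When writers push vfs.numdirtybuffers toward the high watermark, the kernel forces them to wait for flushes, which is the write-side limit described above.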
I/O *READ* caching is limited only by the VM Page cache.
The reason you got slightly better numbers with your patch
has nothing to do with I/O performance, it is simply related to
the cost of the buffer instantiations and teardowns that occur in
the limited buffer_map space to map pages out of the VM page cache.
Since you could have more buffers, there were fewer instantiations
and teardowns. It's that simple.
Unfortunately, this performance gain is *DIRECTLY* tied to the number
of pages wired into the buffer cache. It is precisely the wired pages
portion of the instantiation and teardown that eats the extra cpu.
So the moment you regulate the number of wired pages in the system, you
will blow the performance you are getting.
I can demonstrate the issue with a simple test. Create a large file
with dd, larger than physical memory:
dd if=/dev/zero of=test bs=1m count=4096 # create a 4G file.
Then dd (read) portions of the file and observe the performance.
Do this several times to get stable numbers.
dd if=test of=/dev/null bs=1m count=16 # repeat several times
dd if=test of=/dev/null bs=1m count=32 # etc...
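The sweep above can be scripted. A scaled-down sketch follows; the file here is only 64 MB so it runs quickly, and bs=1048576 is used because the bs=1m spelling above is BSD dd syntax. For a real measurement, make the file larger than physical memory (e.g. count=4096) and grow the read counts accordingly:

```shell
#!/bin/sh
# Sketch of the read-performance sweep: create a test file, then read
# back increasing prefixes several times each and report dd's timing
# line. File path and sizes are placeholders for illustration.
FILE=${FILE:-/tmp/bufcache-test}

# Create the test file (64 MB here; use > physical memory for real runs).
dd if=/dev/zero of="$FILE" bs=1048576 count=64 2>/dev/null

for count in 16 32 64; do
    for pass in 1 2 3; do
        echo "count=$count pass=$pass:"
        # dd reports its summary on stderr; keep the last (timing) line.
        dd if="$FILE" of=/dev/null bs=1048576 count="$count" 2>&1 | tail -1
    done
done
rm -f "$FILE"
```

Repeating each count several times, as suggested above, lets the numbers stabilize so the two drop-off points stand out.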
You will find that read performance will drop in two significant
places: (1) When the data no longer fits in the buffer cache and
the buffer cache is forced to teardown wirings and rewire other
pages from the VM page cache. Still no physical I/O is being done.
(2) When the data no longer fits in the VM page cache and the system
is forced to perform physical I/O.
It's case (1) that you are manipulating with your patch, and as you can
see it is entirely dependent on the number of wired pages that the
system is able to maintain in the buffer cache.
-Matt
