Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 4 Jun 2013 18:14:49 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Alfred Perlstein <bright@mu.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r251282 - head/sys/kern
Message-ID:  <20130604170410.M1018@besplex.bde.org>
In-Reply-To: <20130604052219.GP3047@kib.kiev.ua>
References:  <201306030416.r534GmCA001872@svn.freebsd.org> <51AC1B49.9090001@mu.org> <20130603075539.GK3047@kib.kiev.ua> <51AC60CA.6050105@mu.org> <20130604052219.GP3047@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 4 Jun 2013, Konstantin Belousov wrote:

> On Mon, Jun 03, 2013 at 02:24:26AM -0700, Alfred Perlstein wrote:
>> On 6/3/13 12:55 AM, Konstantin Belousov wrote:
>>> On Sun, Jun 02, 2013 at 09:27:53PM -0700, Alfred Perlstein wrote:
>>>> Hey Konstaintin, shouldn't this be scaled against the actual amount of
>>>> KVA we have instead of an arbitrary limit?
>>> The commit changes the buffer cache to scale according to the available
>>> KVA, making the scaling less dumb.
>>>
>>> I do not understand what exactly do you want to do, please describe the
>>> algorithm you propose to implement instead of my change.
>>
>> Sure, how about deriving the hardcoded "32" from the maxkva a machine
>> can have?
>>
>> Is that possible?
> I do not see why this would be useful. Initially I thought about simply
> capping nbuf at 100000 without referencing any "memory". Then I realized
> that this would somewhat conflict with (unlikely) changes to the value
> of BKVASIZE due to "factor".

The presence of BKVASIZE in 'factor' is a bug.  My version never had this
bug (see below for a patch).  The scaling should be to maximize nbuf,
subject to non-arbitrary limits on physical memory and kva, and now an
arbirary limit of about 100000 / (BKVASIZE / 16384) on nbuf.  Your new
limit is arbitrary so it shouldn't affect nbuf depending on BKVASIZE.

Expanding BKVASIZE should expand kva use, but on i386 this will soon
hit a non-arbitary kva limit so nbuf will not be as high as preferred.
nbuf needs to be very large mainly to support file systems with small
buffers.  Even 100000 only gives 50MB of buffering if the fs block
size is 512.  This would shrink to only 12.5MB if BKVASIZE is expanded
by a factor of 4 and the bug is not fixed.  If 25000 buffers after
expanding BKVASIZE is enough, then that should be the arbitary limit
(independent of BKVASIZE) so as to save physical memory.

On i386 systems with 1GB RAM, nbuf defaults to about 7000.  With an
fs block size of 512, that can buffer 3.5MB.  Expanding BKVASIZE by a
factor of 4 shrinks this to 0.875MB in -current.  That is ridiculously
small.  VMIO limits the lossage from this.

BKVASIZE was originally 8KB.  I forget if nbuf was halved by not modifying
the scale factor when it was expanded to 16KB.  Probably not.  I used to
modify the scale factor to get twice as many as the default nbuf, but
once the default nbuf expanded to a few thousand it became large enough
for most purposes so I no longer do this.

Except on arches with extremely limit kva like i386, KVASIZE should be
MAXBSIZE, and all of the complications for expanding a buffer's kva
beyond BKVASIZE and handling the fragmentation problems caused by this
shouldn't exist.  Limits like vfs.maxbufspace should be scaled by
NOMBSIZE = current BKVASIZE instead of BKVASIZE.  Oops, my NOMBSIZE
has similar logical problems to BKVASIZE.  It needs to be precisely
16KB to preserve the defaults for nbuf and maxbufspace, but the nominal
(ffs) buffer size is now 32KB.  So the heuristic scale factors should
use the historical magic number 16K.  I changed sequential_heuristic()
to use a hard-coded 16K instead of BKVASIZE.  The correct number here
depends on disk technology.

The patch has many style fixes:

@ Index: vfs_bio.c
@ ===================================================================
@ RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
@ retrieving revision 1.436
@ diff -u -2 -r1.436 vfs_bio.c
@ --- vfs_bio.c	17 Jun 2004 17:16:49 -0000	1.436
@ +++ vfs_bio.c	3 Jun 2013 16:04:34 -0000
@ @@ -419,64 +493,54 @@
@ 
@  /*
@ - * Calculating buffer cache scaling values and reserve space for buffer
@ + * Calculate buffer cache scaling values and reserve space for buffer
@   * headers.  This is called during low level kernel initialization and
@   * may be called more then once.  We CANNOT write to the memory area
@   * being reserved at this time.
@   */
@ -caddr_t
@ -kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est)
@ +void *
@ +vfs_bio_alloc(void *va)

The API name was verbose and bogus.  The prefix for this subsystem is
vfs_bio_, not kern_vfs_bio_.  This is a generic allocation routine.
It always allocated swbufs and has expanded to do more allocations.

@  {
@ -	/*
@ -	 * physmem_est is in pages.  Convert it to kilobytes (assumes
@ -	 * PAGE_SIZE is >= 1K)
@ -	 */
@ -	physmem_est = physmem_est * (PAGE_SIZE / 1024);

I use the global physmem.  This change may be too i386-specific.  In
with 8-16 RAM, a significant amount of RAM may be eaten by the kernel
image or perhaps just unavailable to the kernel.  Now memories are
larger and the amount eaten is relatively insignificant (since we are
only using a small fraction of physmem, only the relative error matters).

@ +	int factor, memsize;
@ 
@  	/*
@ -	 * The nominal buffer size (and minimum KVA allocation) is BKVASIZE.
@ -	 * For the first 64MB of ram nominally allocate sufficient buffers to
@ -	 * cover 1/4 of our ram.  Beyond the first 64MB allocate additional
@ -	 * buffers to cover 1/20 of our ram over 64MB.  When auto-sizing
@ -	 * the buffer cache we limit the eventual kva reservation to
@ +	 * The nominal buffer size is NOMBSIZE.
@ +	 * For the first 4MB of RAM, allocate 50 buffers.
@ +	 * For the next 60MB of RAM, nominally allocate sufficient buffers to
@ +	 * cover 1/4 of the RAM.  Beyond the first 64MB allocate additional
@ +	 * buffers to cover 1/20 of the RAM.  When auto-sizing
@ +	 * the buffer cache, limit the eventual kva reservation to
@  	 * maxbcache bytes.

Fix old bitrot in this comment, and update for my changes.
- the first 4MB wasn't mentioned
- the minimum clamp wasn't mentioned
- replace BKVASIZE by NOMBSIZE so that I can change BKVASIZE without
   changing the default nbuf (subject to maxbcache).

The bitrot of the magic number 1/20 is not fixed.  The code uses 1/10.
This is fixed in -current.  I should have noticed this before, since
I used to have to change the code to get 1/10 instead of 1/20.

I hope you updated the comment for recent changes, but I don't like
having magic numbers in comments.

@  	 *
@ -	 * factor represents the 1/4 x ram conversion.
@ +	 * `factor' represents the 1/4 of RAM conversion.
@  	 */
@ +#define	NOMBSIZE	imax(PAGE_SIZE, 16384)	/* XXX */
@  	if (nbuf == 0) {
@ -		int factor = 4 * BKVASIZE / 1024;
@ +		/*
@ +		 * physmem is in pages.  Convert it to a size in kilobytes.
@ +		 * XXX no ptob().
@ +		 */
@ +		memsize = ctob(physmem) / 1024;
@ 
@ +		factor = 4 * NOMBSIZE / 1024;
@  		nbuf = 50;
@ -		if (physmem_est > 4096)
@ -			nbuf += min((physmem_est - 4096) / factor,
@ -			    65536 / factor);
@ -		if (physmem_est > 65536)
@ -			nbuf += (physmem_est - 65536) * 2 / (factor * 5);
@ -
@ -		if (maxbcache && nbuf > maxbcache / BKVASIZE)
@ -			nbuf = maxbcache / BKVASIZE;

The maxbcache limit must be applied to the non-default nbuf too.

@ +		if (memsize > 4096)
@ +			nbuf += imin((memsize - 4096) / factor,
@ +			    (65536 - 4096) / factor);
@ +		if (memsize > 65536)
@ +			nbuf += (memsize - 65536) * 2 / (factor * 5);
@  	}
@ -
@ -#if 0
@ -	/*
@ -	 * Do not allow the buffer_map to be more then 1/2 the size of the
@ -	 * kernel_map.
@ -	 */
@ -	if (nbuf > (kernel_map->max_offset - kernel_map->min_offset) / 
@ -	    (BKVASIZE * 2)) {
@ -		nbuf = (kernel_map->max_offset - kernel_map->min_offset) / 
@ -		    (BKVASIZE * 2);
@ -		printf("Warning: nbufs capped at %d\n", nbuf);
@ -	}
@ -#endif

Already removed in -current.

@ +	if (nbuf > maxbcache / BKVASIZE)
@ +		nbuf = maxbcache / BKVASIZE;
@ 
@  	/*
@  	 * swbufs are used as temporary holders for I/O, such as paging I/O.
@ -	 * We have no less then 16 and no more then 256.
@ +	 * There must be between 16 and 256 of them.
@  	 */
@ -	nswbuf = max(min(nbuf/4, 256), 16);
@ +	nswbuf = imax(imin(nbuf / 4, 256), 16);
@  #ifdef NSWBUF_MIN
@  	if (nswbuf < NSWBUF_MIN)
@  		nswbuf = NSWBUF_MIN;
@  #endif
@ +
@  #ifdef DIRECTIO
@  	ffs_rawread_setup();
@ @@ -484,12 +548,12 @@
@ 
@  	/*
@ -	 * Reserve space for the buffer cache buffers
@ +	 * Reserve space for swbuf headers and buffer cache buffers.
@  	 */
@ -	swbuf = (void *)v;
@ -	v = (caddr_t)(swbuf + nswbuf);
@ -	buf = (void *)v;
@ -	v = (caddr_t)(buf + nbuf);
@ +	swbuf = va;
@ +	va = swbuf + nswbuf;
@ +	buf = va;
@ +	va = buf + nbuf;
@ 
@ -	return(v);
@ +	return (va);
@  }
@

Don't use the caddr_t mistake.  The API was also changed to not use it.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130604170410.M1018>