From owner-svn-src-head@FreeBSD.ORG Wed Jun 5 12:11:36 2013 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 61C38FB9; Wed, 5 Jun 2013 12:11:36 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 40ADC145C; Wed, 5 Jun 2013 12:11:36 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id F29B91A3C20; Wed, 5 Jun 2013 05:11:25 -0700 (PDT) Message-ID: <51AF2AE8.4080308@mu.org> Date: Wed, 05 Jun 2013 05:11:20 -0700 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 MIME-Version: 1.0 To: Bruce Evans Subject: Re: svn commit: r251282 - head/sys/kern References: <201306030416.r534GmCA001872@svn.freebsd.org> <51AC1B49.9090001@mu.org> <20130603075539.GK3047@kib.kiev.ua> <51AC60CA.6050105@mu.org> <20130604052219.GP3047@kib.kiev.ua> <20130604170410.M1018@besplex.bde.org> In-Reply-To: <20130604170410.M1018@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Konstantin Belousov , svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Jun 2013 12:11:36 -0000 Konstantin, is there any way to take some of Bruce's feedback into account to get rid of the hard coded max? -Alfred On 6/4/13 1:14 AM, Bruce Evans wrote: > On Tue, 4 Jun 2013, Konstantin Belousov wrote: > >> On Mon, Jun 03, 2013 at 02:24:26AM -0700, Alfred Perlstein wrote: >>> On 6/3/13 12:55 AM, Konstantin Belousov wrote: >>>> On Sun, Jun 02, 2013 at 09:27:53PM -0700, Alfred Perlstein wrote: >>>>> Hey Konstaintin, shouldn't this be scaled against the actual >>>>> amount of >>>>> KVA we have instead of an arbitrary limit? >>>> The commit changes the buffer cache to scale according to the >>>> available >>>> KVA, making the scaling less dumb. >>>> >>>> I do not understand what exactly do you want to do, please describe >>>> the >>>> algorithm you propose to implement instead of my change. >>> >>> Sure, how about deriving the hardcoded "32" from the maxkva a machine >>> can have? >>> >>> Is that possible? >> I do not see why this would be useful. Initially I thought about simply >> capping nbuf at 100000 without referencing any "memory". Then I realized >> that this would somewhat conflict with (unlikely) changes to the value >> of BKVASIZE due to "factor". > > The presence of BKVASIZE in 'factor' is a bug. My version never had this > bug (see below for a patch). The scaling should be to maximize nbuf, > subject to non-arbitrary limits on physical memory and kva, and now an > arbirary limit of about 100000 / (BKVASIZE / 16384) on nbuf. Your new > limit is arbitrary so it shouldn't affect nbuf depending on BKVASIZE. > > Expanding BKVASIZE should expand kva use, but on i386 this will soon > hit a non-arbitary kva limit so nbuf will not be as high as preferred. > nbuf needs to be very large mainly to support file systems with small > buffers. Even 100000 only gives 50MB of buffering if the fs block > size is 512. This would shrink to only 12.5MB if BKVASIZE is expanded > by a factor of 4 and the bug is not fixed. If 25000 buffers after > expanding BKVASIZE is enough, then that should be the arbitary limit > (independent of BKVASIZE) so as to save physical memory. > > On i386 systems with 1GB RAM, nbuf defaults to about 7000. With an > fs block size of 512, that can buffer 3.5MB. Expanding BKVASIZE by a > factor of 4 shrinks this to 0.875MB in -current. That is ridiculously > small. VMIO limits the lossage from this. > > BKVASIZE was originally 8KB. I forget if nbuf was halved by not > modifying > the scale factor when it was expanded to 16KB. Probably not. I used to > modify the scale factor to get twice as many as the default nbuf, but > once the default nbuf expanded to a few thousand it became large enough > for most purposes so I no longer do this. > > Except on arches with extremely limit kva like i386, KVASIZE should be > MAXBSIZE, and all of the complications for expanding a buffer's kva > beyond BKVASIZE and handling the fragmentation problems caused by this > shouldn't exist. Limits like vfs.maxbufspace should be scaled by > NOMBSIZE = current BKVASIZE instead of BKVASIZE. Oops, my NOMBSIZE > has similar logical problems to BKVASIZE. It needs to be precisely > 16KB to preserve the defaults for nbuf and maxbufspace, but the nominal > (ffs) buffer size is now 32KB. So the heuristic scale factors should > use the historical magic number 16K. I changed sequential_heuristic() > to use a hard-coded 16K instead of BKVASIZE. The correct number here > depends on disk technology. > > The patch has many style fixes: > > @ Index: vfs_bio.c > @ =================================================================== > @ RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v > @ retrieving revision 1.436 > @ diff -u -2 -r1.436 vfs_bio.c > @ --- vfs_bio.c 17 Jun 2004 17:16:49 -0000 1.436 > @ +++ vfs_bio.c 3 Jun 2013 16:04:34 -0000 > @ @@ -419,64 +493,54 @@ > @ @ /* > @ - * Calculating buffer cache scaling values and reserve space for > buffer > @ + * Calculate buffer cache scaling values and reserve space for buffer > @ * headers. This is called during low level kernel initialization and > @ * may be called more then once. We CANNOT write to the memory area > @ * being reserved at this time. > @ */ > @ -caddr_t > @ -kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est) > @ +void * > @ +vfs_bio_alloc(void *va) > > The API name was verbose and bogus. The prefix for this subsystem is > vfs_bio_, not kern_vfs_bio_. This is a generic allocation routine. > It always allocated swbufs and has expanded to do more allocations. > > @ { > @ - /* > @ - * physmem_est is in pages. Convert it to kilobytes (assumes > @ - * PAGE_SIZE is >= 1K) > @ - */ > @ - physmem_est = physmem_est * (PAGE_SIZE / 1024); > > I use the global physmem. This change may be too i386-specific. In > with 8-16 RAM, a significant amount of RAM may be eaten by the kernel > image or perhaps just unavailable to the kernel. Now memories are > larger and the amount eaten is relatively insignificant (since we are > only using a small fraction of physmem, only the relative error matters). > > @ + int factor, memsize; > @ @ /* > @ - * The nominal buffer size (and minimum KVA allocation) is > BKVASIZE. > @ - * For the first 64MB of ram nominally allocate sufficient > buffers to > @ - * cover 1/4 of our ram. Beyond the first 64MB allocate > additional > @ - * buffers to cover 1/20 of our ram over 64MB. When auto-sizing > @ - * the buffer cache we limit the eventual kva reservation to > @ + * The nominal buffer size is NOMBSIZE. > @ + * For the first 4MB of RAM, allocate 50 buffers. > @ + * For the next 60MB of RAM, nominally allocate sufficient > buffers to > @ + * cover 1/4 of the RAM. Beyond the first 64MB allocate > additional > @ + * buffers to cover 1/20 of the RAM. When auto-sizing > @ + * the buffer cache, limit the eventual kva reservation to > @ * maxbcache bytes. > > Fix old bitrot in this comment, and update for my changes. > - the first 4MB wasn't mentioned > - the minimum clamp wasn't mentioned > - replace BKVASIZE by NOMBSIZE so that I can change BKVASIZE without > changing the default nbuf (subject to maxbcache). > > The bitrot of the magic number 1/20 is not fixed. The code uses 1/10. > This is fixed in -current. I should have noticed this before, since > I used to have to change the code to get 1/10 instead of 1/20. > > I hope you updated the comment for recent changes, but I don't like > having magic numbers in comments. > > @ * > @ - * factor represents the 1/4 x ram conversion. > @ + * `factor' represents the 1/4 of RAM conversion. > @ */ > @ +#define NOMBSIZE imax(PAGE_SIZE, 16384) /* XXX */ > @ if (nbuf == 0) { > @ - int factor = 4 * BKVASIZE / 1024; > @ + /* > @ + * physmem is in pages. Convert it to a size in kilobytes. > @ + * XXX no ptob(). > @ + */ > @ + memsize = ctob(physmem) / 1024; > @ @ + factor = 4 * NOMBSIZE / 1024; > @ nbuf = 50; > @ - if (physmem_est > 4096) > @ - nbuf += min((physmem_est - 4096) / factor, > @ - 65536 / factor); > @ - if (physmem_est > 65536) > @ - nbuf += (physmem_est - 65536) * 2 / (factor * 5); > @ - > @ - if (maxbcache && nbuf > maxbcache / BKVASIZE) > @ - nbuf = maxbcache / BKVASIZE; > > The maxbcache limit must be applied to the non-default nbuf too. > > @ + if (memsize > 4096) > @ + nbuf += imin((memsize - 4096) / factor, > @ + (65536 - 4096) / factor); > @ + if (memsize > 65536) > @ + nbuf += (memsize - 65536) * 2 / (factor * 5); > @ } > @ - > @ -#if 0 > @ - /* > @ - * Do not allow the buffer_map to be more then 1/2 the size of the > @ - * kernel_map. > @ - */ > @ - if (nbuf > (kernel_map->max_offset - kernel_map->min_offset) / > @ - (BKVASIZE * 2)) { > @ - nbuf = (kernel_map->max_offset - kernel_map->min_offset) / > @ - (BKVASIZE * 2); > @ - printf("Warning: nbufs capped at %d\n", nbuf); > @ - } > @ -#endif > > Already removed in -current. > > @ + if (nbuf > maxbcache / BKVASIZE) > @ + nbuf = maxbcache / BKVASIZE; > @ @ /* > @ * swbufs are used as temporary holders for I/O, such as paging > I/O. > @ - * We have no less then 16 and no more then 256. > @ + * There must be between 16 and 256 of them. > @ */ > @ - nswbuf = max(min(nbuf/4, 256), 16); > @ + nswbuf = imax(imin(nbuf / 4, 256), 16); > @ #ifdef NSWBUF_MIN > @ if (nswbuf < NSWBUF_MIN) > @ nswbuf = NSWBUF_MIN; > @ #endif > @ + > @ #ifdef DIRECTIO > @ ffs_rawread_setup(); > @ @@ -484,12 +548,12 @@ > @ @ /* > @ - * Reserve space for the buffer cache buffers > @ + * Reserve space for swbuf headers and buffer cache buffers. > @ */ > @ - swbuf = (void *)v; > @ - v = (caddr_t)(swbuf + nswbuf); > @ - buf = (void *)v; > @ - v = (caddr_t)(buf + nbuf); > @ + swbuf = va; > @ + va = swbuf + nswbuf; > @ + buf = va; > @ + va = buf + nbuf; > @ @ - return(v); > @ + return (va); > @ } > @ > > Don't use the caddr_t mistake. The API was also changed to not use it. > > Bruce >