From: Alfred Perlstein <bright@mu.org>
Date: Fri, 07 Jun 2013 13:01:49 -0700
To: Adrian Chadd
Cc: Konstantin Belousov, svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org, Bruce Evans
Subject: Re: svn commit: r251282 - head/sys/kern
Message-ID: <51B23C2D.7000203@mu.org>
References: <201306030416.r534GmCA001872@svn.freebsd.org> <51AC1B49.9090001@mu.org>
 <20130603075539.GK3047@kib.kiev.ua> <51AC60CA.6050105@mu.org>
 <20130604052219.GP3047@kib.kiev.ua> <20130604170410.M1018@besplex.bde.org>
 <51AF2AE8.4080308@mu.org>

On 6/5/13 10:55 AM, Adrian Chadd wrote:
> ... can people please boot an i386 kernel w/ this stuff in it, with
> the RAM constrained to something small (say, 128MB), and verify that
> you can at least start a buildworld (minus clang, of course) without
> panicking?
> There's a recent PR
> (http://www.freebsd.org/cgi/query-pr.cgi?pr=179112) which shows
> regressions on i386 + small-RAM platforms.
>
> I know it's a different area, but I'd really appreciate it if people
> tested this stuff out better before they make i386 (and by extension,
> any platform with limited KVA and small amounts of RAM) unusable
> through carelessness.

Adrian, I'm pretty sure that kib's patch actually limits memory. The
concern I have is that it is a hard limit that appears to be based not
on KVA, but rather on 32GB. I could be wrong, but Bruce's feedback
looks like it backs up my concern.

-Alfred

> Thanks,
>
>
> Adrian
>
> On 5 June 2013 05:11, Alfred Perlstein wrote:
>> Konstantin, is there any way to take some of Bruce's feedback into
>> account to get rid of the hard-coded max?
>>
>> -Alfred
>>
>>
>> On 6/4/13 1:14 AM, Bruce Evans wrote:
>>> On Tue, 4 Jun 2013, Konstantin Belousov wrote:
>>>
>>>> On Mon, Jun 03, 2013 at 02:24:26AM -0700, Alfred Perlstein wrote:
>>>>> On 6/3/13 12:55 AM, Konstantin Belousov wrote:
>>>>>> On Sun, Jun 02, 2013 at 09:27:53PM -0700, Alfred Perlstein wrote:
>>>>>>> Hey Konstantin, shouldn't this be scaled against the actual
>>>>>>> amount of KVA we have instead of an arbitrary limit?
>>>>>> The commit changes the buffer cache to scale according to the
>>>>>> available KVA, making the scaling less dumb.
>>>>>>
>>>>>> I do not understand what exactly you want to do; please describe
>>>>>> the algorithm you propose to implement instead of my change.
>>>>>
>>>>> Sure, how about deriving the hardcoded "32" from the maximum KVA a
>>>>> machine can have?
>>>>>
>>>>> Is that possible?
>>>> I do not see why this would be useful. Initially I thought about
>>>> simply capping nbuf at 100000 without referencing any "memory".
>>>> Then I realized that this would somewhat conflict with (unlikely)
>>>> changes to the value of BKVASIZE due to "factor".
>>>
>>> The presence of BKVASIZE in 'factor' is a bug.
>>> My version never had this bug (see below for a patch). The scaling
>>> should be to maximize nbuf, subject to non-arbitrary limits on
>>> physical memory and kva, and now an arbitrary limit of about
>>> 100000 / (BKVASIZE / 16384) on nbuf. Your new limit is arbitrary,
>>> so it shouldn't affect nbuf depending on BKVASIZE.
>>>
>>> Expanding BKVASIZE should expand kva use, but on i386 this will soon
>>> hit a non-arbitrary kva limit, so nbuf will not be as high as
>>> preferred. nbuf needs to be very large mainly to support file
>>> systems with small buffers. Even 100000 buffers give only 50MB of
>>> buffering if the fs block size is 512. This would shrink to only
>>> 12.5MB if BKVASIZE were expanded by a factor of 4 and the bug not
>>> fixed. If 25000 buffers after expanding BKVASIZE are enough, then
>>> that should be the arbitrary limit (independent of BKVASIZE) so as
>>> to save physical memory.
>>>
>>> On i386 systems with 1GB RAM, nbuf defaults to about 7000. With an
>>> fs block size of 512, that can buffer 3.5MB. Expanding BKVASIZE by a
>>> factor of 4 shrinks this to 0.875MB in -current. That is
>>> ridiculously small. VMIO limits the lossage from this.
>>>
>>> BKVASIZE was originally 8KB. I forget whether nbuf was halved by not
>>> modifying the scale factor when it was expanded to 16KB. Probably
>>> not. I used to modify the scale factor to get twice the default
>>> nbuf, but once the default nbuf expanded to a few thousand it became
>>> large enough for most purposes, so I no longer do this.
>>>
>>> Except on arches with extremely limited kva like i386, BKVASIZE
>>> should be MAXBSIZE, and all of the complications for expanding a
>>> buffer's kva beyond BKVASIZE and handling the fragmentation problems
>>> caused by this shouldn't exist. Limits like vfs.maxbufspace should
>>> be scaled by NOMBSIZE = current BKVASIZE instead of BKVASIZE. Oops,
>>> my NOMBSIZE has similar logical problems to BKVASIZE.
>>> It needs to be precisely 16KB to preserve the defaults for nbuf and
>>> maxbufspace, but the nominal (ffs) buffer size is now 32KB. So the
>>> heuristic scale factors should use the historical magic number 16K.
>>> I changed sequential_heuristic() to use a hard-coded 16K instead of
>>> BKVASIZE. The correct number here depends on disk technology.
>>>
>>> The patch has many style fixes:
>>>
>>> @ Index: vfs_bio.c
>>> @ ===================================================================
>>> @ RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
>>> @ retrieving revision 1.436
>>> @ diff -u -2 -r1.436 vfs_bio.c
>>> @ --- vfs_bio.c	17 Jun 2004 17:16:49 -0000	1.436
>>> @ +++ vfs_bio.c	3 Jun 2013 16:04:34 -0000
>>> @ @@ -419,64 +493,54 @@
>>> @
>>> @  /*
>>> @ - * Calculating buffer cache scaling values and reserve space for buffer
>>> @ + * Calculate buffer cache scaling values and reserve space for buffer
>>> @   * headers.  This is called during low level kernel initialization and
>>> @   * may be called more then once.  We CANNOT write to the memory area
>>> @   * being reserved at this time.
>>> @   */
>>> @ -caddr_t
>>> @ -kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est)
>>> @ +void *
>>> @ +vfs_bio_alloc(void *va)
>>>
>>> The API name was verbose and bogus. The prefix for this subsystem is
>>> vfs_bio_, not kern_vfs_bio_. This is a generic allocation routine.
>>> It always allocated swbufs and has expanded to do more allocations.
>>>
>>> @  {
>>> @ -	/*
>>> @ -	 * physmem_est is in pages.  Convert it to kilobytes (assumes
>>> @ -	 * PAGE_SIZE is >= 1K)
>>> @ -	 */
>>> @ -	physmem_est = physmem_est * (PAGE_SIZE / 1024);
>>>
>>> I use the global physmem. This change may be too i386-specific. On
>>> old systems with 8-16MB of RAM, a significant amount of RAM may be
>>> eaten by the kernel image or perhaps just unavailable to the kernel.
>>> Now memories are
Now memories are >>> larger and the amount eaten is relatively insignificant (since we are >>> only using a small fraction of physmem, only the relative error matters). >>> >>> @ + int factor, memsize; >>> @ @ /* >>> @ - * The nominal buffer size (and minimum KVA allocation) is >>> BKVASIZE. >>> @ - * For the first 64MB of ram nominally allocate sufficient buffers >>> to >>> @ - * cover 1/4 of our ram. Beyond the first 64MB allocate additional >>> @ - * buffers to cover 1/20 of our ram over 64MB. When auto-sizing >>> @ - * the buffer cache we limit the eventual kva reservation to >>> @ + * The nominal buffer size is NOMBSIZE. >>> @ + * For the first 4MB of RAM, allocate 50 buffers. >>> @ + * For the next 60MB of RAM, nominally allocate sufficient buffers >>> to >>> @ + * cover 1/4 of the RAM. Beyond the first 64MB allocate additional >>> @ + * buffers to cover 1/20 of the RAM. When auto-sizing >>> @ + * the buffer cache, limit the eventual kva reservation to >>> @ * maxbcache bytes. >>> >>> Fix old bitrot in this comment, and update for my changes. >>> - the first 4MB wasn't mentioned >>> - the minimum clamp wasn't mentioned >>> - replace BKVASIZE by NOMBSIZE so that I can change BKVASIZE without >>> changing the default nbuf (subject to maxbcache). >>> >>> The bitrot of the magic number 1/20 is not fixed. The code uses 1/10. >>> This is fixed in -current. I should have noticed this before, since >>> I used to have to change the code to get 1/10 instead of 1/20. >>> >>> I hope you updated the comment for recent changes, but I don't like >>> having magic numbers in comments. >>> >>> @ * >>> @ - * factor represents the 1/4 x ram conversion. >>> @ + * `factor' represents the 1/4 of RAM conversion. >>> @ */ >>> @ +#define NOMBSIZE imax(PAGE_SIZE, 16384) /* XXX */ >>> @ if (nbuf == 0) { >>> @ - int factor = 4 * BKVASIZE / 1024; >>> @ + /* >>> @ + * physmem is in pages. Convert it to a size in kilobytes. >>> @ + * XXX no ptob(). 
>>> @ +		 */
>>> @ +		memsize = ctob(physmem) / 1024;
>>> @
>>> @ +		factor = 4 * NOMBSIZE / 1024;
>>> @  		nbuf = 50;
>>> @ -		if (physmem_est > 4096)
>>> @ -			nbuf += min((physmem_est - 4096) / factor,
>>> @ -			    65536 / factor);
>>> @ -		if (physmem_est > 65536)
>>> @ -			nbuf += (physmem_est - 65536) * 2 / (factor * 5);
>>> @ -
>>> @ -	if (maxbcache && nbuf > maxbcache / BKVASIZE)
>>> @ -		nbuf = maxbcache / BKVASIZE;
>>>
>>> The maxbcache limit must be applied to the non-default nbuf too.
>>>
>>> @ +		if (memsize > 4096)
>>> @ +			nbuf += imin((memsize - 4096) / factor,
>>> @ +			    (65536 - 4096) / factor);
>>> @ +		if (memsize > 65536)
>>> @ +			nbuf += (memsize - 65536) * 2 / (factor * 5);
>>> @  	}
>>> @ -
>>> @ -#if 0
>>> @ -	/*
>>> @ -	 * Do not allow the buffer_map to be more then 1/2 the size of the
>>> @ -	 * kernel_map.
>>> @ -	 */
>>> @ -	if (nbuf > (kernel_map->max_offset - kernel_map->min_offset) /
>>> @ -	    (BKVASIZE * 2)) {
>>> @ -		nbuf = (kernel_map->max_offset - kernel_map->min_offset) /
>>> @ -		    (BKVASIZE * 2);
>>> @ -		printf("Warning: nbufs capped at %d\n", nbuf);
>>> @ -	}
>>> @ -#endif
>>>
>>> Already removed in -current.
>>>
>>> @ +	if (nbuf > maxbcache / BKVASIZE)
>>> @ +		nbuf = maxbcache / BKVASIZE;
>>> @
>>> @  	/*
>>> @  	 * swbufs are used as temporary holders for I/O, such as paging I/O.
>>> @ -	 * We have no less then 16 and no more then 256.
>>> @ +	 * There must be between 16 and 256 of them.
>>> @  	 */
>>> @ -	nswbuf = max(min(nbuf/4, 256), 16);
>>> @ +	nswbuf = imax(imin(nbuf / 4, 256), 16);
>>> @  #ifdef NSWBUF_MIN
>>> @  	if (nswbuf < NSWBUF_MIN)
>>> @  		nswbuf = NSWBUF_MIN;
>>> @  #endif
>>> @ +
>>> @  #ifdef DIRECTIO
>>> @  	ffs_rawread_setup();
>>> @ @@ -484,12 +548,12 @@
>>> @
>>> @  	/*
>>> @ -	 * Reserve space for the buffer cache buffers
>>> @ +	 * Reserve space for swbuf headers and buffer cache buffers.
>>> @  	 */
>>> @ -	swbuf = (void *)v;
>>> @ -	v = (caddr_t)(swbuf + nswbuf);
>>> @ -	buf = (void *)v;
>>> @ -	v = (caddr_t)(buf + nbuf);
>>> @ +	swbuf = va;
>>> @ +	va = swbuf + nswbuf;
>>> @ +	buf = va;
>>> @ +	va = buf + nbuf;
>>> @
>>> @ -	return(v);
>>> @ +	return (va);
>>> @  }
>>> @
>>>
>>> Don't use the caddr_t mistake. The API was also changed to not use
>>> it.
>>>
>>> Bruce