Date: Wed, 5 Jun 2013 10:55:00 -0700 From: Adrian Chadd <adrian@freebsd.org> To: Alfred Perlstein <bright@mu.org> Cc: Konstantin Belousov <kostikbel@gmail.com>, svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au> Subject: Re: svn commit: r251282 - head/sys/kern Message-ID: <CAJ-VmonMN-HS-VorKqmxN1zfyh5NBvVZiZXb3ZuKbz7h=qwA1A@mail.gmail.com> In-Reply-To: <51AF2AE8.4080308@mu.org> References: <201306030416.r534GmCA001872@svn.freebsd.org> <51AC1B49.9090001@mu.org> <20130603075539.GK3047@kib.kiev.ua> <51AC60CA.6050105@mu.org> <20130604052219.GP3047@kib.kiev.ua> <20130604170410.M1018@besplex.bde.org> <51AF2AE8.4080308@mu.org>
next in thread | previous in thread | raw e-mail | index | archive | help
... can people please boot an i386 kernel w/ this stuff in it, with the RAM constrained to something small (say, 128mb), and verify that you can at least start a buildworld (minus clang, of course) without panicing? There's a recent PR (http://www.freebsd.org/cgi/query-pr.cgi?pr=179112) which shows regressions on i386 + small RAM platforms. I know it's a different area, but I'd really appreciate it if people better tested this stuff out before they make i386 (and by extension, any platform with limited KVA and small amounts of RAM) unusable through carelessness. Thanks, Adrian On 5 June 2013 05:11, Alfred Perlstein <bright@mu.org> wrote: > > Konstantin, is there any way to take some of Bruce's feedback into account > to get rid of the hard coded max? > > -Alfred > > > On 6/4/13 1:14 AM, Bruce Evans wrote: >> >> On Tue, 4 Jun 2013, Konstantin Belousov wrote: >> >>> On Mon, Jun 03, 2013 at 02:24:26AM -0700, Alfred Perlstein wrote: >>>> >>>> On 6/3/13 12:55 AM, Konstantin Belousov wrote: >>>>> >>>>> On Sun, Jun 02, 2013 at 09:27:53PM -0700, Alfred Perlstein wrote: >>>>>> >>>>>> Hey Konstaintin, shouldn't this be scaled against the actual amount of >>>>>> KVA we have instead of an arbitrary limit? >>>>> >>>>> The commit changes the buffer cache to scale according to the available >>>>> KVA, making the scaling less dumb. >>>>> >>>>> I do not understand what exactly do you want to do, please describe the >>>>> algorithm you propose to implement instead of my change. >>>> >>>> >>>> Sure, how about deriving the hardcoded "32" from the maxkva a machine >>>> can have? >>>> >>>> Is that possible? >>> >>> I do not see why this would be useful. Initially I thought about simply >>> capping nbuf at 100000 without referencing any "memory". Then I realized >>> that this would somewhat conflict with (unlikely) changes to the value >>> of BKVASIZE due to "factor". >> >> >> The presence of BKVASIZE in 'factor' is a bug. My version never had this >> bug (see below for a patch). The scaling should be to maximize nbuf, >> subject to non-arbitrary limits on physical memory and kva, and now an >> arbirary limit of about 100000 / (BKVASIZE / 16384) on nbuf. Your new >> limit is arbitrary so it shouldn't affect nbuf depending on BKVASIZE. >> >> Expanding BKVASIZE should expand kva use, but on i386 this will soon >> hit a non-arbitary kva limit so nbuf will not be as high as preferred. >> nbuf needs to be very large mainly to support file systems with small >> buffers. Even 100000 only gives 50MB of buffering if the fs block >> size is 512. This would shrink to only 12.5MB if BKVASIZE is expanded >> by a factor of 4 and the bug is not fixed. If 25000 buffers after >> expanding BKVASIZE is enough, then that should be the arbitary limit >> (independent of BKVASIZE) so as to save physical memory. >> >> On i386 systems with 1GB RAM, nbuf defaults to about 7000. With an >> fs block size of 512, that can buffer 3.5MB. Expanding BKVASIZE by a >> factor of 4 shrinks this to 0.875MB in -current. That is ridiculously >> small. VMIO limits the lossage from this. >> >> BKVASIZE was originally 8KB. I forget if nbuf was halved by not modifying >> the scale factor when it was expanded to 16KB. Probably not. I used to >> modify the scale factor to get twice as many as the default nbuf, but >> once the default nbuf expanded to a few thousand it became large enough >> for most purposes so I no longer do this. >> >> Except on arches with extremely limit kva like i386, KVASIZE should be >> MAXBSIZE, and all of the complications for expanding a buffer's kva >> beyond BKVASIZE and handling the fragmentation problems caused by this >> shouldn't exist. Limits like vfs.maxbufspace should be scaled by >> NOMBSIZE = current BKVASIZE instead of BKVASIZE. Oops, my NOMBSIZE >> has similar logical problems to BKVASIZE. It needs to be precisely >> 16KB to preserve the defaults for nbuf and maxbufspace, but the nominal >> (ffs) buffer size is now 32KB. So the heuristic scale factors should >> use the historical magic number 16K. I changed sequential_heuristic() >> to use a hard-coded 16K instead of BKVASIZE. The correct number here >> depends on disk technology. >> >> The patch has many style fixes: >> >> @ Index: vfs_bio.c >> @ =================================================================== >> @ RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v >> @ retrieving revision 1.436 >> @ diff -u -2 -r1.436 vfs_bio.c >> @ --- vfs_bio.c 17 Jun 2004 17:16:49 -0000 1.436 >> @ +++ vfs_bio.c 3 Jun 2013 16:04:34 -0000 >> @ @@ -419,64 +493,54 @@ >> @ @ /* >> @ - * Calculating buffer cache scaling values and reserve space for buffer >> @ + * Calculate buffer cache scaling values and reserve space for buffer >> @ * headers. This is called during low level kernel initialization and >> @ * may be called more then once. We CANNOT write to the memory area >> @ * being reserved at this time. >> @ */ >> @ -caddr_t >> @ -kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est) >> @ +void * >> @ +vfs_bio_alloc(void *va) >> >> The API name was verbose and bogus. The prefix for this subsystem is >> vfs_bio_, not kern_vfs_bio_. This is a generic allocation routine. >> It always allocated swbufs and has expanded to do more allocations. >> >> @ { >> @ - /* >> @ - * physmem_est is in pages. Convert it to kilobytes (assumes >> @ - * PAGE_SIZE is >= 1K) >> @ - */ >> @ - physmem_est = physmem_est * (PAGE_SIZE / 1024); >> >> I use the global physmem. This change may be too i386-specific. In >> with 8-16 RAM, a significant amount of RAM may be eaten by the kernel >> image or perhaps just unavailable to the kernel. Now memories are >> larger and the amount eaten is relatively insignificant (since we are >> only using a small fraction of physmem, only the relative error matters). >> >> @ + int factor, memsize; >> @ @ /* >> @ - * The nominal buffer size (and minimum KVA allocation) is >> BKVASIZE. >> @ - * For the first 64MB of ram nominally allocate sufficient buffers >> to >> @ - * cover 1/4 of our ram. Beyond the first 64MB allocate additional >> @ - * buffers to cover 1/20 of our ram over 64MB. When auto-sizing >> @ - * the buffer cache we limit the eventual kva reservation to >> @ + * The nominal buffer size is NOMBSIZE. >> @ + * For the first 4MB of RAM, allocate 50 buffers. >> @ + * For the next 60MB of RAM, nominally allocate sufficient buffers >> to >> @ + * cover 1/4 of the RAM. Beyond the first 64MB allocate additional >> @ + * buffers to cover 1/20 of the RAM. When auto-sizing >> @ + * the buffer cache, limit the eventual kva reservation to >> @ * maxbcache bytes. >> >> Fix old bitrot in this comment, and update for my changes. >> - the first 4MB wasn't mentioned >> - the minimum clamp wasn't mentioned >> - replace BKVASIZE by NOMBSIZE so that I can change BKVASIZE without >> changing the default nbuf (subject to maxbcache). >> >> The bitrot of the magic number 1/20 is not fixed. The code uses 1/10. >> This is fixed in -current. I should have noticed this before, since >> I used to have to change the code to get 1/10 instead of 1/20. >> >> I hope you updated the comment for recent changes, but I don't like >> having magic numbers in comments. >> >> @ * >> @ - * factor represents the 1/4 x ram conversion. >> @ + * `factor' represents the 1/4 of RAM conversion. >> @ */ >> @ +#define NOMBSIZE imax(PAGE_SIZE, 16384) /* XXX */ >> @ if (nbuf == 0) { >> @ - int factor = 4 * BKVASIZE / 1024; >> @ + /* >> @ + * physmem is in pages. Convert it to a size in kilobytes. >> @ + * XXX no ptob(). >> @ + */ >> @ + memsize = ctob(physmem) / 1024; >> @ @ + factor = 4 * NOMBSIZE / 1024; >> @ nbuf = 50; >> @ - if (physmem_est > 4096) >> @ - nbuf += min((physmem_est - 4096) / factor, >> @ - 65536 / factor); >> @ - if (physmem_est > 65536) >> @ - nbuf += (physmem_est - 65536) * 2 / (factor * 5); >> @ - >> @ - if (maxbcache && nbuf > maxbcache / BKVASIZE) >> @ - nbuf = maxbcache / BKVASIZE; >> >> The maxbcache limit must be applied to the non-default nbuf too. >> >> @ + if (memsize > 4096) >> @ + nbuf += imin((memsize - 4096) / factor, >> @ + (65536 - 4096) / factor); >> @ + if (memsize > 65536) >> @ + nbuf += (memsize - 65536) * 2 / (factor * 5); >> @ } >> @ - >> @ -#if 0 >> @ - /* >> @ - * Do not allow the buffer_map to be more then 1/2 the size of the >> @ - * kernel_map. >> @ - */ >> @ - if (nbuf > (kernel_map->max_offset - kernel_map->min_offset) / @ - >> (BKVASIZE * 2)) { >> @ - nbuf = (kernel_map->max_offset - kernel_map->min_offset) / @ - >> (BKVASIZE * 2); >> @ - printf("Warning: nbufs capped at %d\n", nbuf); >> @ - } >> @ -#endif >> >> Already removed in -current. >> >> @ + if (nbuf > maxbcache / BKVASIZE) >> @ + nbuf = maxbcache / BKVASIZE; >> @ @ /* >> @ * swbufs are used as temporary holders for I/O, such as paging >> I/O. >> @ - * We have no less then 16 and no more then 256. >> @ + * There must be between 16 and 256 of them. >> @ */ >> @ - nswbuf = max(min(nbuf/4, 256), 16); >> @ + nswbuf = imax(imin(nbuf / 4, 256), 16); >> @ #ifdef NSWBUF_MIN >> @ if (nswbuf < NSWBUF_MIN) >> @ nswbuf = NSWBUF_MIN; >> @ #endif >> @ + >> @ #ifdef DIRECTIO >> @ ffs_rawread_setup(); >> @ @@ -484,12 +548,12 @@ >> @ @ /* >> @ - * Reserve space for the buffer cache buffers >> @ + * Reserve space for swbuf headers and buffer cache buffers. >> @ */ >> @ - swbuf = (void *)v; >> @ - v = (caddr_t)(swbuf + nswbuf); >> @ - buf = (void *)v; >> @ - v = (caddr_t)(buf + nbuf); >> @ + swbuf = va; >> @ + va = swbuf + nswbuf; >> @ + buf = va; >> @ + va = buf + nbuf; >> @ @ - return(v); >> @ + return (va); >> @ } >> @ >> >> Don't use the caddr_t mistake. The API was also changed to not use it. >> >> Bruce >> >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-VmonMN-HS-VorKqmxN1zfyh5NBvVZiZXb3ZuKbz7h=qwA1A>