From owner-svn-src-head@FreeBSD.ORG  Wed Jun  5 12:11:36 2013
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 61C38FB9;
 Wed,  5 Jun 2013 12:11:36 +0000 (UTC) (envelope-from bright@mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
 by mx1.freebsd.org (Postfix) with ESMTP id 40ADC145C;
 Wed,  5 Jun 2013 12:11:36 +0000 (UTC)
Received: from Alfreds-MacBook-Pro-9.local
 (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218])
 by elvis.mu.org (Postfix) with ESMTPSA id F29B91A3C20;
 Wed,  5 Jun 2013 05:11:25 -0700 (PDT)
Message-ID: <51AF2AE8.4080308@mu.org>
Date: Wed, 05 Jun 2013 05:11:20 -0700
From: Alfred Perlstein <bright@mu.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
 rv:17.0) Gecko/20130509 Thunderbird/17.0.6
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: svn commit: r251282 - head/sys/kern
References: <201306030416.r534GmCA001872@svn.freebsd.org>
 <51AC1B49.9090001@mu.org> <20130603075539.GK3047@kib.kiev.ua>
 <51AC60CA.6050105@mu.org> <20130604052219.GP3047@kib.kiev.ua>
 <20130604170410.M1018@besplex.bde.org>
In-Reply-To: <20130604170410.M1018@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Konstantin Belousov <kostikbel@gmail.com>, svn-src-head@freebsd.org,
 svn-src-all@freebsd.org, src-committers@freebsd.org
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
 <svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 05 Jun 2013 12:11:36 -0000


Konstantin, is there any way to take some of Bruce's feedback into 
account to get rid of the hard coded max?

-Alfred

On 6/4/13 1:14 AM, Bruce Evans wrote:
> On Tue, 4 Jun 2013, Konstantin Belousov wrote:
>
>> On Mon, Jun 03, 2013 at 02:24:26AM -0700, Alfred Perlstein wrote:
>>> On 6/3/13 12:55 AM, Konstantin Belousov wrote:
>>>> On Sun, Jun 02, 2013 at 09:27:53PM -0700, Alfred Perlstein wrote:
>>>>> Hey Konstaintin, shouldn't this be scaled against the actual 
>>>>> amount of
>>>>> KVA we have instead of an arbitrary limit?
>>>> The commit changes the buffer cache to scale according to the 
>>>> available
>>>> KVA, making the scaling less dumb.
>>>>
>>>> I do not understand what exactly do you want to do, please describe 
>>>> the
>>>> algorithm you propose to implement instead of my change.
>>>
>>> Sure, how about deriving the hardcoded "32" from the maxkva a machine
>>> can have?
>>>
>>> Is that possible?
>> I do not see why this would be useful. Initially I thought about simply
>> capping nbuf at 100000 without referencing any "memory". Then I realized
>> that this would somewhat conflict with (unlikely) changes to the value
>> of BKVASIZE due to "factor".
>
> The presence of BKVASIZE in 'factor' is a bug.  My version never had this
> bug (see below for a patch).  The scaling should be to maximize nbuf,
> subject to non-arbitrary limits on physical memory and kva, and now an
> arbirary limit of about 100000 / (BKVASIZE / 16384) on nbuf.  Your new
> limit is arbitrary so it shouldn't affect nbuf depending on BKVASIZE.
>
> Expanding BKVASIZE should expand kva use, but on i386 this will soon
> hit a non-arbitary kva limit so nbuf will not be as high as preferred.
> nbuf needs to be very large mainly to support file systems with small
> buffers.  Even 100000 only gives 50MB of buffering if the fs block
> size is 512.  This would shrink to only 12.5MB if BKVASIZE is expanded
> by a factor of 4 and the bug is not fixed.  If 25000 buffers after
> expanding BKVASIZE is enough, then that should be the arbitary limit
> (independent of BKVASIZE) so as to save physical memory.
>
> On i386 systems with 1GB RAM, nbuf defaults to about 7000.  With an
> fs block size of 512, that can buffer 3.5MB.  Expanding BKVASIZE by a
> factor of 4 shrinks this to 0.875MB in -current.  That is ridiculously
> small.  VMIO limits the lossage from this.
>
> BKVASIZE was originally 8KB.  I forget if nbuf was halved by not 
> modifying
> the scale factor when it was expanded to 16KB.  Probably not.  I used to
> modify the scale factor to get twice as many as the default nbuf, but
> once the default nbuf expanded to a few thousand it became large enough
> for most purposes so I no longer do this.
>
> Except on arches with extremely limit kva like i386, KVASIZE should be
> MAXBSIZE, and all of the complications for expanding a buffer's kva
> beyond BKVASIZE and handling the fragmentation problems caused by this
> shouldn't exist.  Limits like vfs.maxbufspace should be scaled by
> NOMBSIZE = current BKVASIZE instead of BKVASIZE.  Oops, my NOMBSIZE
> has similar logical problems to BKVASIZE.  It needs to be precisely
> 16KB to preserve the defaults for nbuf and maxbufspace, but the nominal
> (ffs) buffer size is now 32KB.  So the heuristic scale factors should
> use the historical magic number 16K.  I changed sequential_heuristic()
> to use a hard-coded 16K instead of BKVASIZE.  The correct number here
> depends on disk technology.
>
> The patch has many style fixes:
>
> @ Index: vfs_bio.c
> @ ===================================================================
> @ RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
> @ retrieving revision 1.436
> @ diff -u -2 -r1.436 vfs_bio.c
> @ --- vfs_bio.c    17 Jun 2004 17:16:49 -0000    1.436
> @ +++ vfs_bio.c    3 Jun 2013 16:04:34 -0000
> @ @@ -419,64 +493,54 @@
> @ @  /*
> @ - * Calculating buffer cache scaling values and reserve space for 
> buffer
> @ + * Calculate buffer cache scaling values and reserve space for buffer
> @   * headers.  This is called during low level kernel initialization and
> @   * may be called more then once.  We CANNOT write to the memory area
> @   * being reserved at this time.
> @   */
> @ -caddr_t
> @ -kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est)
> @ +void *
> @ +vfs_bio_alloc(void *va)
>
> The API name was verbose and bogus.  The prefix for this subsystem is
> vfs_bio_, not kern_vfs_bio_.  This is a generic allocation routine.
> It always allocated swbufs and has expanded to do more allocations.
>
> @  {
> @ -    /*
> @ -     * physmem_est is in pages.  Convert it to kilobytes (assumes
> @ -     * PAGE_SIZE is >= 1K)
> @ -     */
> @ -    physmem_est = physmem_est * (PAGE_SIZE / 1024);
>
> I use the global physmem.  This change may be too i386-specific. In
> with 8-16 RAM, a significant amount of RAM may be eaten by the kernel
> image or perhaps just unavailable to the kernel.  Now memories are
> larger and the amount eaten is relatively insignificant (since we are
> only using a small fraction of physmem, only the relative error matters).
>
> @ +    int factor, memsize;
> @ @      /*
> @ -     * The nominal buffer size (and minimum KVA allocation) is 
> BKVASIZE.
> @ -     * For the first 64MB of ram nominally allocate sufficient 
> buffers to
> @ -     * cover 1/4 of our ram.  Beyond the first 64MB allocate 
> additional
> @ -     * buffers to cover 1/20 of our ram over 64MB.  When auto-sizing
> @ -     * the buffer cache we limit the eventual kva reservation to
> @ +     * The nominal buffer size is NOMBSIZE.
> @ +     * For the first 4MB of RAM, allocate 50 buffers.
> @ +     * For the next 60MB of RAM, nominally allocate sufficient 
> buffers to
> @ +     * cover 1/4 of the RAM.  Beyond the first 64MB allocate 
> additional
> @ +     * buffers to cover 1/20 of the RAM.  When auto-sizing
> @ +     * the buffer cache, limit the eventual kva reservation to
> @       * maxbcache bytes.
>
> Fix old bitrot in this comment, and update for my changes.
> - the first 4MB wasn't mentioned
> - the minimum clamp wasn't mentioned
> - replace BKVASIZE by NOMBSIZE so that I can change BKVASIZE without
>   changing the default nbuf (subject to maxbcache).
>
> The bitrot of the magic number 1/20 is not fixed.  The code uses 1/10.
> This is fixed in -current.  I should have noticed this before, since
> I used to have to change the code to get 1/10 instead of 1/20.
>
> I hope you updated the comment for recent changes, but I don't like
> having magic numbers in comments.
>
> @       *
> @ -     * factor represents the 1/4 x ram conversion.
> @ +     * `factor' represents the 1/4 of RAM conversion.
> @       */
> @ +#define    NOMBSIZE    imax(PAGE_SIZE, 16384)    /* XXX */
> @      if (nbuf == 0) {
> @ -        int factor = 4 * BKVASIZE / 1024;
> @ +        /*
> @ +         * physmem is in pages.  Convert it to a size in kilobytes.
> @ +         * XXX no ptob().
> @ +         */
> @ +        memsize = ctob(physmem) / 1024;
> @ @ +        factor = 4 * NOMBSIZE / 1024;
> @          nbuf = 50;
> @ -        if (physmem_est > 4096)
> @ -            nbuf += min((physmem_est - 4096) / factor,
> @ -                65536 / factor);
> @ -        if (physmem_est > 65536)
> @ -            nbuf += (physmem_est - 65536) * 2 / (factor * 5);
> @ -
> @ -        if (maxbcache && nbuf > maxbcache / BKVASIZE)
> @ -            nbuf = maxbcache / BKVASIZE;
>
> The maxbcache limit must be applied to the non-default nbuf too.
>
> @ +        if (memsize > 4096)
> @ +            nbuf += imin((memsize - 4096) / factor,
> @ +                (65536 - 4096) / factor);
> @ +        if (memsize > 65536)
> @ +            nbuf += (memsize - 65536) * 2 / (factor * 5);
> @      }
> @ -
> @ -#if 0
> @ -    /*
> @ -     * Do not allow the buffer_map to be more then 1/2 the size of the
> @ -     * kernel_map.
> @ -     */
> @ -    if (nbuf > (kernel_map->max_offset - kernel_map->min_offset) / 
> @ -        (BKVASIZE * 2)) {
> @ -        nbuf = (kernel_map->max_offset - kernel_map->min_offset) / 
> @ -            (BKVASIZE * 2);
> @ -        printf("Warning: nbufs capped at %d\n", nbuf);
> @ -    }
> @ -#endif
>
> Already removed in -current.
>
> @ +    if (nbuf > maxbcache / BKVASIZE)
> @ +        nbuf = maxbcache / BKVASIZE;
> @ @      /*
> @       * swbufs are used as temporary holders for I/O, such as paging 
> I/O.
> @ -     * We have no less then 16 and no more then 256.
> @ +     * There must be between 16 and 256 of them.
> @       */
> @ -    nswbuf = max(min(nbuf/4, 256), 16);
> @ +    nswbuf = imax(imin(nbuf / 4, 256), 16);
> @  #ifdef NSWBUF_MIN
> @      if (nswbuf < NSWBUF_MIN)
> @          nswbuf = NSWBUF_MIN;
> @  #endif
> @ +
> @  #ifdef DIRECTIO
> @      ffs_rawread_setup();
> @ @@ -484,12 +548,12 @@
> @ @      /*
> @ -     * Reserve space for the buffer cache buffers
> @ +     * Reserve space for swbuf headers and buffer cache buffers.
> @       */
> @ -    swbuf = (void *)v;
> @ -    v = (caddr_t)(swbuf + nswbuf);
> @ -    buf = (void *)v;
> @ -    v = (caddr_t)(buf + nbuf);
> @ +    swbuf = va;
> @ +    va = swbuf + nswbuf;
> @ +    buf = va;
> @ +    va = buf + nbuf;
> @ @ -    return(v);
> @ +    return (va);
> @  }
> @
>
> Don't use the caddr_t mistake.  The API was also changed to not use it.
>
> Bruce
>