Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Nov 2020 20:37:05 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Alexander Motin <mav@freebsd.org>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: MAXPHYS bump for FreeBSD 13
Message-ID:  <X7Aj0a6ZtIQfBscC@kib.kiev.ua>
In-Reply-To: <aac44f21-3a6d-c697-bb52-1a669b11aa3b@FreeBSD.org>
References:  <aac44f21-3a6d-c697-bb52-1a669b11aa3b@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> > To put the specific numbers, for struct buf it means increase by 1792
> > bytes. For bio it does not, because it does not embed vm_page_t[] into
> > the structure.
> >
> > Worse, typical struct buf addend for excess vm_page pointers is going
> > to be unused, because normal size of the UFS block is 32K.  It is
> > going to be only used by clusters and physbufs.
> >
> > So I object against bumping this value without reworking buffers
> > handling of b_pages[].  Most straightforward approach is stop using
> > MAXPHYS to size this array, and use external array for clusters.
> > Pbufs can embed large array.
> 
> I am not very familiar with struct buf usage, so I'd appreciate some
> help there.
> 
> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> array of any size in pbuf_init, that should easily satisfy all of pbuf
> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
> also.  Vnode pager I guess may only need replacement for
> nitems(bp->b_pages) in few places.
I planned to look at making MAXPHYS a tunable.

You are right, we would need:
1. move b_pages to the end of struct buf and declaring it as flexible.
This would make KBI worse because struct buf depends on some debugging
options, and than b_pages offset depends on config.

Another option could be to change b_pages to pointer, if we are fine with
one more indirection.  But in my plan, real array is always allocated past
struct buf, so flexible array is more correct even.

2. Preallocating both normal bufs and pbufs together with the arrays.

3. I considered adding B_SMALLPAGES flag to b_flags and use it to indicate
that buffer has 'small' b_pages.  All buffers rotated through getnewbuf()/
buf_alloc() should have it set.

4. There could be some places which either malloc() or allocate struct buf
on stack (I tend to believe that I converted all later places to formed).
They would need to get special handling.

md(4) uses pbufs.

4. My larger concern is, in fact, cam and drivers.

> 
> Could you or somebody help with vfs/ffs code, where I suppose the
> smaller page lists are used?
Do you plan to work on this ?  I can help, sure.

Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as
DFLPHYS), a tunable, in the scope of this work.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?X7Aj0a6ZtIQfBscC>