Date: Sat, 14 Nov 2020 21:43:00 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Scott Long <scottl@samsco.org> Cc: Alexander Motin <mav@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org> Subject: Re: MAXPHYS bump for FreeBSD 13 Message-ID: <X7AzRMZ5FSRpzXqo@kib.kiev.ua> In-Reply-To: <634E35E9-9E2B-40CE-9C70-BB130BD9D614@samsco.org> References: <aac44f21-3a6d-c697-bb52-1a669b11aa3b@FreeBSD.org> <X7Aj0a6ZtIQfBscC@kib.kiev.ua> <634E35E9-9E2B-40CE-9C70-BB130BD9D614@samsco.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote: > > > > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov <kostikbel@gmail.com> wrote: > > > > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote: > >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote: > >>> To put the specific numbers, for struct buf it means increase by 1792 > >>> bytes. For bio it does not, because it does not embed vm_page_t[] into > >>> the structure. > >>> > >>> Worse, typical struct buf addend for excess vm_page pointers is going > >>> to be unused, because normal size of the UFS block is 32K. It is > >>> going to be only used by clusters and physbufs. > >>> > >>> So I object against bumping this value without reworking buffers > >>> handling of b_pages[]. Most straightforward approach is stop using > >>> MAXPHYS to size this array, and use external array for clusters. > >>> Pbufs can embed large array. > >> > >> I am not very familiar with struct buf usage, so I'd appreciate some > >> help there. > >> > >> Quickly looking on pbuf, it seems trivial to allocate external b_pages > >> array of any size in pbuf_init, that should easily satisfy all of pbuf > >> descendants. Cluster and vnode/swap pagers code are pbuf descendants > >> also. Vnode pager I guess may only need replacement for > >> nitems(bp->b_pages) in few places. > > I planned to look at making MAXPHYS a tunable. > > > > You are right, we would need: > > 1. move b_pages to the end of struct buf and declaring it as flexible. > > This would make KBI worse because struct buf depends on some debugging > > options, and than b_pages offset depends on config. > > > > Another option could be to change b_pages to pointer, if we are fine with > > one more indirection. But in my plan, real array is always allocated past > > struct buf, so flexible array is more correct even. > > > > I like this, and I was in the middle of writing up an email that described it. > There could be multiple malloc types or UMA zones of different sizes, > depending on the intended i/o size, or just a runtime change to the size of > a single allocation size. I do not think we need new/many zones. Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated from pbuf_zone. That should be fixed alloc size, with small b_pages[] for buf_zone, and large (MAXPHYS) for pbuf. Everything else, if any, would need to pre-calculate malloc size. > > > 2. Preallocating both normal bufs and pbufs together with the arrays. > > > > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to indicate > > that buffer has 'small' b_pages. All buffers rotated through getnewbuf()/ > > buf_alloc() should have it set. > > > > This would work nicely with a variable sized allocator, yes. > > > 4. There could be some places which either malloc() or allocate struct buf > > on stack (I tend to believe that I converted all later places to formed). > > They would need to get special handling. > > > > I couldn’t find any places that allocated a buf on the stack or embedded it > into another structure. As I said, I did a pass to eliminate stack allocations for bufs. As result, for instance flushbufqueues() mallocs struct buf, but it does not use b_pages[] of the allocated sentinel. > > > md(4) uses pbufs. > > > > 4. My larger concern is, in fact, cam and drivers. > > > > Can you describe your concern? My opinion is that during this work all uses of MAXPHYS should be reviewed, and there are a lot of drivers that reference the constant. From the past experience, I expect some evil ingenious (ab)use. Same for bufs, but apart from application of pbufs in cam_periph.c, I do not think drivers have much use of it. > > >> > >> Could you or somebody help with vfs/ffs code, where I suppose the > >> smaller page lists are used? > > Do you plan to work on this ? I can help, sure. > > > > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as > > DFLPHYS), a tunable, in the scope of this work. > > Sounds great, thank you for looking at it. > > Scott >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?X7AzRMZ5FSRpzXqo>