Date: Fri, 13 Nov 2020 21:09:37 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Warner Losh <imp@bsdimp.com>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: MAXPHYS bump for FreeBSD 13
Message-ID: <X67Z8at0taFmy8nf@kib.kiev.ua>
In-Reply-To: <CANCZdfrG7_F28GfGq05qdA8RG=7X0v+Hr-dNuJCYX7zgkPDfNQ@mail.gmail.com>
References: <CANCZdfrG7_F28GfGq05qdA8RG=7X0v+Hr-dNuJCYX7zgkPDfNQ@mail.gmail.com>
On Fri, Nov 13, 2020 at 11:33:30AM -0700, Warner Losh wrote:
> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
> that we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
>
> 128k was good back in the 90s/2000s when memory was smaller, drives did
> smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> devices can easily do 1MB or more, and there are performance benefits
> from scheduling larger I/Os.
>
> Bumping this will mean larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size
> for 1MB is modest enough.

To put specific numbers on it: for struct buf, this means an increase of
1792 bytes. struct bio does not grow, because it does not embed a
vm_page_t[] array in the structure. Worse, the additional vm_page
pointers in struct buf would typically go unused, because the normal UFS
block size is 32K; only clusters and pbufs would use them.

So I object to bumping this value without reworking the buffers'
handling of b_pages[]. The most straightforward approach is to stop
using MAXPHYS to size this array, and to use an external array for
clusters. Pbufs can embed a large array.

> The NVMe driver is currently limited to 1MB transfers due to
> limitations in the NVMe scatter-gather lists and a desire to
> preallocate as much as possible up front. Most NVMe drives have
> maximum transfer sizes between 128k and 1MB, with larger being the
> trend.
>
> The mp[rs] drivers can use a larger MAXPHYS, though resource
> limitations on some cards hamper bumping it beyond about 2MB.
>
> The AHCI driver is happy with 1MB and larger sizes.
> Netflix has run a MAXPHYS of 8MB for years, though that's likely 2x too
> large even for our needs, due to limiting factors in the upper layers
> that make it hard to schedule I/Os larger than 3-4MB reliably.
>
> So this should be relatively low risk and high benefit.
>
> I don't think other kernel tunables need to change, but I always run
> into trouble with runningbufs :)
>
> Comments? Anything I forgot?
>
> Warner