Date: Fri, 13 Nov 2020 18:23:29 -0700
From: Scott Long <scottl@samsco.org>
To: Warner Losh <imp@bsdimp.com>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: MAXPHYS bump for FreeBSD 13
Message-ID: <926C3A98-03BF-46FD-9B22-9EFBDC0F44A4@samsco.org>
In-Reply-To: <CANCZdfrG7_F28GfGq05qdA8RG=7X0v%2BHr-dNuJCYX7zgkPDfNQ@mail.gmail.com>
References: <CANCZdfrG7_F28GfGq05qdA8RG=7X0v%2BHr-dNuJCYX7zgkPDfNQ@mail.gmail.com>

I have mixed feelings on this. The Netflix workload isn’t typical, and this
change represents a fairly substantial increase in memory usage for bufs.
It’s also a config tunable, so it’s not like this represents a meaningful
diff reduction for Netflix.

The upside is that it will likely help benchmarks out of the box. Is that
enough of an upside for the downsides of memory pressure on small memory
and high iops systems? I’m not convinced. I really would like to see the
years of talk about fixing this correctly put into action.

Scott

> On Nov 13, 2020, at 11:33 AM, Warner Losh <imp@bsdimp.com> wrote:
>
> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
>
> 128k was good back in the 90s/2000s when memory was smaller, drives did
> smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> devices can easily do 1MB or more and there's performance benefits from
> scheduling larger I/Os.
>
> Bumping this will mean larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this be a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size for
> 1MB is modest enough.
>
> The NVMe driver currently is limited to 1MB transfers due to limitations in
> the NVMe scatter gather lists and a desire to preallocate as much as
> possible up front. Most NVMe drivers have maximum transfer sizes between
> 128k and 1MB, with larger being the trend.
>
> The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> some cards hamper bumping it beyond about 2MB.
>
> The AHCI driver is happy with 1MB and larger sizes.
>
> Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large
> even for our needs due to limiting factors in the upper layers making it
> hard to schedule I/Os larger than 3-4MB reliably.
>
> So this should be a relatively low risk, and high benefit.
>
> I don't think other kernel tunables need to change, but I always run into
> trouble with runningbufs :)
>
> Comments? Anything I forgot?
>
> Warner
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
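
For a rough sense of the memory trade-off discussed above, the arithmetic can be sketched as follows. This is back-of-the-envelope Python, not kernel code. It assumes the growth comes from a per-buffer array holding one page pointer per page of a MAXPHYS-sized I/O, with 4 KiB pages and 8-byte pointers; the thread itself only says that struct buf and struct bio get larger. The function names and the buffer count used in the printout are hypothetical.

```python
# Back-of-the-envelope arithmetic only; none of this is FreeBSD code.
PAGE_SIZE = 4096     # assumed page size in bytes
POINTER_SIZE = 8     # assumed pointer size on a 64-bit platform


def page_array_bytes(maxphys, page_size=PAGE_SIZE, pointer_size=POINTER_SIZE):
    """Bytes for one page pointer per page of a maxphys-sized I/O."""
    pages = -(-maxphys // page_size)  # ceiling division
    return pages * pointer_size


def extra_bytes(old_maxphys, new_maxphys, nbufs):
    """Additional bytes across nbufs buffers when MAXPHYS is raised."""
    return (page_array_bytes(new_maxphys) - page_array_bytes(old_maxphys)) * nbufs


if __name__ == "__main__":
    nbufs = 100_000  # hypothetical buffer count, chosen only for illustration
    for label, size in (("128k", 128 * 1024),
                        ("1MB", 1024 * 1024),
                        ("8MB", 8 * 1024 * 1024)):
        print(label, page_array_bytes(size), "bytes per buffer")
    print("extra bytes for 128k -> 1MB:",
          extra_bytes(128 * 1024, 1024 * 1024, nbufs))
```

Under these assumptions the per-buffer cost grows linearly with MAXPHYS, so whether the bump is "modest" or "fairly substantial" depends mostly on how many buffers a given system keeps allocated.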