Date:      Fri, 13 Nov 2020 19:16:30 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Scott Long <scottl@samsco.org>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: MAXPHYS bump for FreeBSD 13
Message-ID:  <CANCZdfo23vbCOhMJrekw0GNntcyf54rh8V_jxHKrfjEycrYApw@mail.gmail.com>
In-Reply-To: <926C3A98-03BF-46FD-9B22-9EFBDC0F44A4@samsco.org>
References:  <CANCZdfrG7_F28GfGq05qdA8RG=7X0v%2BHr-dNuJCYX7zgkPDfNQ@mail.gmail.com> <926C3A98-03BF-46FD-9B22-9EFBDC0F44A4@samsco.org>

On Fri, Nov 13, 2020 at 6:23 PM Scott Long <scottl@samsco.org> wrote:

> I have mixed feelings on this.  The Netflix workload isn't typical, and
> this change represents a fairly substantial increase in memory usage for
> bufs.  It's also a config tunable, so it's not like this represents a
> meaningful diff reduction for Netflix.
>

This isn't motivated by Netflix's workload, nor by any need to minimize
diffs. In fact, Netflix had nothing to do with the proposal apart from me
writing it up.

This is motivated more by the broader need to do I/Os larger than 128k,
though maybe 1MB is too large. Alexander Motin proposed it today during the
Vendor Summit and I wrote up the idea for arch@.

> The upside is that it will likely help benchmarks out of the box.  Is that
> enough of an upside for the downsides of memory pressure on small memory
> and high iops systems?  I'm not convinced.  I really would like to see the
> years of talk about fixing this correctly put into action.
>

I'd love years of inaction to end too. I'd also like FreeBSD to perform a
bit better out of the box. Would your calculation have changed had the size
been 256k or 512k? Both those options use/waste substantially fewer bytes
per I/O than 1MB.

Warner
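
[A back-of-the-envelope sketch for the 256k/512k/1MB question above, added
for illustration and not part of the original mail. It assumes the dominant
per-buf growth is the b_pages[] page-pointer array in struct buf, sized as
btoc(MAXPHYS) entries, with 4 KiB pages and 8-byte pointers.]

/*
 * Editor's sketch, not from the original mail: rough growth of the
 * b_pages[] array in struct buf for a few candidate MAXPHYS values.
 * Assumes 4 KiB pages, 8-byte vm_page_t pointers, and b_pages[] sized as
 * btoc(MAXPHYS) entries; the real numbers depend on the architecture and
 * on any other MAXPHYS-sized allocations (pbufs, etc.).
 */
#include <stdio.h>

#define PAGE_SIZE	4096u
#define PTRSZ		8u		/* sizeof(vm_page_t) on LP64 */
#define BTOC(x)		(((x) + PAGE_SIZE - 1) / PAGE_SIZE)

int
main(void)
{
	const unsigned sizes[] = { 128u << 10, 256u << 10, 512u << 10, 1u << 20 };
	const unsigned base = BTOC(128u << 10) * PTRSZ;	/* today's 128k case */
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		unsigned bytes = BTOC(sizes[i]) * PTRSZ;

		printf("MAXPHYS %4u KiB: b_pages[] = %4u bytes (+%4u per buf)\n",
		    sizes[i] >> 10, bytes, bytes - base);
	}
	return (0);
}

[Under those assumptions, 1MB adds roughly 1792 bytes per buf over today's
128k, versus 256 bytes for 256k and 768 bytes for 512k; multiplied across
the tens of thousands of bufs a typical system allocates, that is the
memory-pressure tradeoff being weighed above.]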


> Scott
>
>
> > On Nov 13, 2020, at 11:33 AM, Warner Losh <imp@bsdimp.com> wrote:
> >
> > Greetings,
> >
> > We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> > we normally use (though there are exceptions).
> >
> > I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> > DFLTPHYS to 1MB.
> >
> > 128k was good back in the 90s/2000s when memory was smaller, drives did
> > smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> > devices can easily do 1MB or more and there's performance benefits from
> > scheduling larger I/Os.
> >
> > Bumping this will mean larger struct buf and struct bio. Without some
> > concerted effort, it's hard to make this be a sysctl tunable. While that's
> > desirable, perhaps, it shouldn't gate this bump. The increase in size for
> > 1MB is modest enough.
> >
> > The NVMe driver currently is limited to 1MB transfers due to limitations in
> > the NVMe scatter gather lists and a desire to preallocate as much as
> > possible up front. Most NVMe drivers have maximum transfer sizes between
> > 128k and 1MB, with larger being the trend.
> >
> > The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> > some cards hamper bumping it beyond about 2MB.
> >
> > The AHCI driver is happy with 1MB and larger sizes.
> >
> > Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large
> > even for our needs due to limiting factors in the upper layers making it
> > hard to schedule I/Os larger than 3-4MB reliably.
> >
> > So this should be a relatively low risk, and high benefit.
> >
> > I don't think other kernel tunables need to change, but I always run into
> > trouble with runningbufs :)
> >
> > Comments? Anything I forgot?
> >
> > Warner
> > _______________________________________________
> > freebsd-arch@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>
>
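
[For reference, the two constants in the quoted proposal live in
sys/sys/param.h. The sketch below, added for illustration, shows what the
bump would amount to there; the guards and comments are approximate rather
than a verbatim patch, and today's defaults are 128k for MAXPHYS and 64k
for DFLTPHYS.]

/*
 * Editor's sketch of the proposed defaults, not a verbatim patch.  The
 * #ifndef guards are what let a kernel build override these values at
 * compile time.
 */
#ifndef MAXPHYS
#define	MAXPHYS		(1024 * 1024)	/* max raw I/O transfer size; was (128 * 1024) */
#endif
#ifndef DFLTPHYS
#define	DFLTPHYS	(1024 * 1024)	/* default max raw I/O transfer size; was (64 * 1024) */
#endif

[The 256k/512k alternatives floated earlier in the thread would only change
the values plugged into these two defines.]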


