Date:      Fri, 13 Nov 2020 18:23:29 -0700
From:      Scott Long <scottl@samsco.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: MAXPHYS bump for FreeBSD 13
Message-ID:  <926C3A98-03BF-46FD-9B22-9EFBDC0F44A4@samsco.org>
In-Reply-To: <CANCZdfrG7_F28GfGq05qdA8RG=7X0v%2BHr-dNuJCYX7zgkPDfNQ@mail.gmail.com>
References:  <CANCZdfrG7_F28GfGq05qdA8RG=7X0v%2BHr-dNuJCYX7zgkPDfNQ@mail.gmail.com>

I have mixed feelings on this.  The Netflix workload isn’t typical, and this
change represents a fairly substantial increase in memory usage for
bufs.  It’s also a config tunable, so it’s not like this represents a meaningful
diff reduction for Netflix.
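
For a rough sense of scale, the growth can be estimated with a short sketch.
It assumes the per-buffer cost is dominated by a page-pointer array with one
entry per page of MAXPHYS, 4 KiB pages, and 8-byte pointers. The function
names and the buffer count are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope estimate only. PAGE_SIZE, POINTER_SIZE, and the
# buffer count used below are assumptions, not measured values.
PAGE_SIZE = 4096      # assumed 4 KiB pages
POINTER_SIZE = 8      # assumed 64-bit pointers


def page_slots(maxphys):
    """Page pointers needed to describe one I/O of maxphys bytes."""
    return -(-maxphys // PAGE_SIZE)  # ceiling division


def page_array_bytes(maxphys):
    """Size in bytes of a page-pointer array covering maxphys bytes."""
    return page_slots(maxphys) * POINTER_SIZE


def growth_bytes(old_maxphys, new_maxphys, nbuf):
    """Extra memory across nbuf buffers when MAXPHYS is raised."""
    per_buf = page_array_bytes(new_maxphys) - page_array_bytes(old_maxphys)
    return per_buf * nbuf


if __name__ == "__main__":
    old, new = 128 * 1024, 1024 * 1024
    nbuf = 100_000  # hypothetical buffer count
    extra = growth_bytes(old, new, nbuf)
    print(f"per-buf array: {page_array_bytes(old)} -> {page_array_bytes(new)} bytes")
    print(f"extra for {nbuf} bufs: {extra / 2**20:.1f} MiB")
```

Under these assumptions each buffer grows by 1792 bytes, so the total scales
linearly with however many buffers a given system allocates.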

The upside is that it will likely help benchmarks out of the box.  Is that
enough of an upside for the downsides of memory pressure on small-memory
and high-IOPS systems?  I’m not convinced.  I really would like to see the
years of talk about fixing this correctly put into action.

Scott


> On Nov 13, 2020, at 11:33 AM, Warner Losh <imp@bsdimp.com> wrote:
>
> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
>
> 128k was good back in the 90s/2000s when memory was smaller, drives did
> smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> devices can easily do 1MB or more and there are performance benefits from
> scheduling larger I/Os.
>
> Bumping this will mean larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this be a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size for
> 1MB is modest enough.
>
> The NVMe driver currently is limited to 1MB transfers due to limitations in
> the NVMe scatter gather lists and a desire to preallocate as much as
> possible up front. Most NVMe drivers have maximum transfer sizes between
> 128k and 1MB, with larger being the trend.
>
> The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> some cards hamper bumping it beyond about 2MB.
>
> The AHCI driver is happy with 1MB and larger sizes.
>
> Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large
> even for our needs due to limiting factors in the upper layers making it
> hard to schedule I/Os larger than 3-4MB reliably.
>
> So this should be a relatively low risk, and high benefit.
>
> I don't think other kernel tunables need to change, but I always run into
> trouble with runningbufs :)
>
> Comments? Anything I forgot?
>
> Warner
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
