Date:      Fri, 2 Feb 2024 18:05:08 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        Drew Gallatin <gallatin@freebsd.org>
Cc:        Richard Scheffenegger <rscheff@freebsd.org>,  "freebsd-net@FreeBSD.org" <freebsd-net@freebsd.org>, FreeBSD Transport <freebsd-transport@freebsd.org>,  rmacklem@freebsd.org, kp@freebsd.org
Subject:   Re: Increasing TCP TSO size support
Message-ID:  <CAM5tNy7pSDGQK-JzceB1S-nX1xy8dz5j5m_jwXt5uwr7WN-q0w@mail.gmail.com>
In-Reply-To: <e5df5725-ac9c-4e88-ade5-b0a561bfacd6@app.fastmail.com>
References:  <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org> <CAM5tNy6TbvXqrRRD=XpDBRGk81rzW5k38AzXeKFKLDL01fOYQQ@mail.gmail.com> <e5df5725-ac9c-4e88-ade5-b0a561bfacd6@app.fastmail.com>

On Fri, Feb 2, 2024 at 4:48 PM Drew Gallatin <gallatin@freebsd.org> wrote:
>
>
>
> On Fri, Feb 2, 2024, at 6:13 PM, Rick Macklem wrote:
>
> A factor here is the if_hw_tsomaxsegcount limit. For example, a 1 Mbyte NFS write request
> or read reply will result in a 514-element mbuf chain. Each of these mbufs (mostly 2K mbuf
> clusters) is a non-contiguous data segment. (I suspect most NICs do not handle this many
> segments well, if at all.)
>
>
> Excellent point
>
>
> The NFS code does know how to use M_EXTPG mbufs (for NFS over TLS, for the ktls), but I do not
> know what it would take to make these work for non-KTLS TSO?
>
>
>
> Sendfile already uses M_EXTPG mbufs... When I was initially doing M_EXTPG stuff for kTLS, I added support for using M_EXTPG mbufs in sendfile regardless of whether or not kTLS was in use.  That reduced CPU use marginally on 64-bit platforms (due to reducing socket buffer mbuf chain lengths, and hence reducing pointer chasing), and quite a bit more on 32-bit platforms (due to also not needing to map memory into the kernel map, and by reducing pointer chasing even more, as more pages fit into an M_EXTPG mbuf when a paddr_t is 32 bits).
>
>
> I do not know how the TSO loop in tcp_output handles M_EXTPG mbufs.
> Does it assume each M_EXTPG mbuf is one contiguous data segment?
>
>
> No, it's fully aware of how to handle M_EXTPG mbufs.  Look at tcp_m_copy().  We added code in the segment counting part of that function to count the hdr/trailer parts of an M_EXTPG mbuf, and to deal with the start/end page being misaligned.
>
> I do see that ip_output() will call mb_unmapped_to_ext() when the NIC does not have IFCAP_MEXTPG set.
> (If IFCAP_MEXTPG is set, do the pages need to be contiguous so that it can become
> a single contiguous data segment for TSO or ???)
>
>
> No, it just means that a NIC driver has been verified not to call mtod() on an M_EXTPG mbuf and dereference the resulting data pointer (which would make it go "boom").
>
> But the page size is only 4K on most platforms.  So while an M_EXTPG mbuf can hold 5 pages (from memory, too lazy to do the math right now) and reduces socket buffer mbuf chain lengths by a factor of 10 or so (2K vs 20K per mbuf), the S/G list that a NIC will need to consume would likely decrease only by a factor of 2.  And even then only if the busdma code that maps mbufs for DMA is not already coalescing adjacent mbufs.  I know busdma does some coalescing, but I can't recall if it coalesces physically adjacent mbufs.

I'm guessing the factor of 2 comes from the fact that each page is a
contiguous segment?

The NFS code could easily use 5 contiguous pages, so maybe it would be
worthwhile to try and make some NIC drivers capable of handling contiguous
pages as one segment for TSO output? (It means that tcp_output() would need
to know this case was possible. Maybe a new if_hw_tsoXX that covers the max
number of segments if pages are contiguous?)

However, given your previous post, it might not matter much, since the
larger TSO segment might not make much difference?

>
> If TSO and the code beneath it (NIC and maybe mb_unmapped_to_ext() being called) were to
> all work ok for M_EXTPG mbufs, it would be easy to enable that for NFS (non-TLS case).
>
>
>
> It does.  You should enable it for at least TCP.
Good work!!

I will try it someday relatively soon. Even if it only reduces the use of
mbuf clusters, that sounds like it would be worthwhile.

rick
>
> Drew


