Date:      Fri, 02 Feb 2024 21:19:41 -0500
From:      "Drew Gallatin" <gallatin@freebsd.org>
To:        "Rick Macklem" <rick.macklem@gmail.com>
Cc:        "Richard Scheffenegger" <rscheff@freebsd.org>, "freebsd-net@FreeBSD.org" <freebsd-net@freebsd.org>, "FreeBSD Transport" <freebsd-transport@freebsd.org>, rmacklem@freebsd.org, kp@freebsd.org
Subject:   Re: Increasing TCP TSO size support
Message-ID:  <2fac0ac3-ba3a-4bca-b0d4-fafb0c5b75fd@app.fastmail.com>
In-Reply-To:  <CAM5tNy7pSDGQK-JzceB1S-nX1xy8dz5j5m_jwXt5uwr7WN-q0w@mail.gmail.com>
References:  <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org> <CAM5tNy6TbvXqrRRD=XpDBRGk81rzW5k38AzXeKFKLDL01fOYQQ@mail.gmail.com> <e5df5725-ac9c-4e88-ade5-b0a561bfacd6@app.fastmail.com> <CAM5tNy7pSDGQK-JzceB1S-nX1xy8dz5j5m_jwXt5uwr7WN-q0w@mail.gmail.com>

On Fri, Feb 2, 2024, at 9:05 PM, Rick Macklem wrote:
> > But the page size is only 4K on most platforms.  So while an M_EXTPGS mbuf can hold 5 pages (from memory, too lazy to do the math right now) and reduces socket buffer mbuf chain lengths by a factor of 10 or so (2k vs 20k per mbuf), the S/G list that a NIC will need to consume would likely decrease only by a factor of 2.  And even then only if the busdma code to map mbufs for DMA is not coalescing adjacent mbufs.  I know busdma does some coalescing, but I can't recall if it coalesces physically adjacent mbufs.
> 
> I'm guessing the factor of 2 comes from the fact that each page is a
> contiguous segment?

Actually, no, I'm being dumb.  I was thinking that pages would be split up, but that's wrong.  Without M_EXTPGS, each mbuf generated by sendfile (or nfs) would be an M_EXT with a wrapper around a single 4K page.  So the scatter/gather list would be exactly the same.
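
As a rough illustration of that point (a toy model, not kernel code): the number of mbufs in the chain shrinks with M_EXTPGS, but the number of per-page S/G entries does not. The 5-pages-per-mbuf figure is the one quoted "from memory" earlier in the thread and is treated here as an assumption; `chain_and_sg` is a hypothetical helper name.

```python
PAGE_SIZE = 4096          # 4K pages, as assumed in the thread
PAGES_PER_EXTPGS = 5      # "from memory" figure from the thread; an assumption


def chain_and_sg(payload_bytes, pages_per_mbuf):
    """Return (mbuf chain length, S/G entries) when no coalescing happens."""
    pages = -(-payload_bytes // PAGE_SIZE)   # ceiling division
    mbufs = -(-pages // pages_per_mbuf)      # one mbuf per group of pages
    return mbufs, pages                      # one S/G entry per page either way


print(chain_and_sg(65536, 1))                 # M_EXT, one page per mbuf
print(chain_and_sg(65536, PAGES_PER_EXTPGS))  # M_EXTPGS, five pages per mbuf
```

For a 64K payload the chain drops from 16 mbufs to 4, while the S/G list stays at 16 entries in both cases.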

The win would be if the pages themselves were contiguous (which they often are), and if the bus_dma mbuf mapping code coalesced those segments, and if the device could handle DMA across a 4K boundary.  That's what would get you shorter s/g lists.
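
The three conditions above can be sketched with a small model (again a toy, not the busdma implementation). `sg_segments` and its `coalesce` and `boundary` parameters are hypothetical names for illustration; `boundary=4096` stands in for a device that cannot DMA across a 4K boundary.

```python
PAGE_SIZE = 4096  # 4K pages, as assumed in the thread


def sg_segments(page_addrs, coalesce=True, boundary=0):
    """Build (start, length) DMA segments from page physical addresses.

    Adjacent pages are merged only if coalescing is enabled and the merged
    segment would not cross a multiple of `boundary` (0 means no limit).
    """
    segs = []
    for pa in page_addrs:
        if coalesce and segs:
            start, length = segs[-1]
            end = pa + PAGE_SIZE - 1
            adjacent = (start + length == pa)
            crosses = boundary and (start // boundary != end // boundary)
            if adjacent and not crosses:
                segs[-1] = (start, length + PAGE_SIZE)
                continue
        segs.append((pa, PAGE_SIZE))
    return segs


# Five pages: three physically contiguous, then two physically contiguous.
pages = [0x10000, 0x11000, 0x12000, 0x40000, 0x41000]
print(len(sg_segments(pages, coalesce=False)))   # no coalescing
print(len(sg_segments(pages)))                   # coalescing, no boundary limit
print(len(sg_segments(pages, boundary=4096)))    # device limited to 4K boundary
```

Only the middle case produces a shorter list (2 segments instead of 5); dropping any one of the three conditions brings it back to one segment per page.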

I think tcp_m_copym() can handle this now, since the driver sets if_hw_tsomaxsegsize to express the longest contiguous segment it can handle.
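
To make the effect of a per-segment size limit concrete, here is a minimal sketch (not the kernel's logic) that counts how many segments result when physically contiguous runs are split at a driver-advertised maximum segment size. `count_hw_segments`, `fits_tso`, and the numeric limits are hypothetical; only the idea of a maximum contiguous segment size comes from the thread.

```python
def count_hw_segments(run_lengths, max_seg_size):
    """Count segments after splitting each contiguous run at max_seg_size."""
    return sum(-(-n // max_seg_size) for n in run_lengths)  # ceiling division


def fits_tso(run_lengths, max_seg_size, max_seg_count):
    """True if the split runs fit within an assumed segment-count limit."""
    return count_hw_segments(run_lengths, max_seg_size) <= max_seg_count


runs = [12288, 8192]  # bytes in each physically contiguous run
print(count_hw_segments(runs, 4096))    # device limited to 4K segments
print(count_hw_segments(runs, 65536))   # device accepts up to 64K segments
print(fits_tso(runs, 4096, 4))          # does it fit an assumed 4-segment limit?
```

With a 4K limit the two runs become 5 segments; with a 64K limit they stay at 2, which is where a larger advertised segment size pays off.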

BTW, I really hate the mixing of busdma restrictions with the hw_tsomax stuff.  It always makes my head explode.

Drew




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2fac0ac3-ba3a-4bca-b0d4-fafb0c5b75fd>