Date: Fri, 2 Feb 2024 15:13:12 -0800
From: Rick Macklem <rick.macklem@gmail.com>
To: "Scheffenegger, Richard" <rscheff@freebsd.org>
Cc: "freebsd-net@FreeBSD.org" <freebsd-net@freebsd.org>, FreeBSD Transport <freebsd-transport@freebsd.org>, rmacklem@freebsd.org, gallatin@freebsd.org, kp@freebsd.org
Subject: Re: Increasing TCP TSO size support
Message-ID: <CAM5tNy6TbvXqrRRD=XpDBRGk81rzW5k38AzXeKFKLDL01fOYQQ@mail.gmail.com>
In-Reply-To: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>
References: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>

On Fri, Feb 2, 2024 at 1:21 AM Scheffenegger, Richard <rscheff@freebsd.org> wrote:
>
> Hi,
>
> We have run a test for an RPC workload with 1MB IO sizes, and collected the
> tcp_default_output() len(gth) during the first pass in the output loop.
>
> In such a scenario, where the application frequently introduces small
> pauses between sending additional data (since the next large IO is only
> sent after the corresponding request from the client has been received
> and processed), the current TSO limit of 64kB (45*1448 in effect)
> requires multiple passes in the output routine to send all the
> allowable (cwnd limited) data.
>
> I'll try to get a data collection with better granularity above 90 000
> bytes - but even here the average strongly indicates that a majority of
> transmission opportunities are in the 512 kB area - probably also having to
> do with LRO and ACK thinning effects by the client.
>
> In other words, the tcp output has to run about 9 times with TSO to
> transmit all eligible data - increasing the FreeBSD supported maximum TSO
> size to what current hardware could handle (256kB..1MB) would reduce the
> CPU burden here.
>
> Is increasing the software supported TSO size to allow for what the NICs
> could nowadays do something anyone apart from us would be interested in (in
> particular, those who work with the drivers)?
>

Reposted after joining freebsd-net@...

A factor here is the if_hw_tsomaxsegcount limit. For example, a 1Mbyte NFS
write request or read reply will result in a 514 element mbuf chain. Each of
these (mostly 2K mbuf clusters) is a non-contiguous data segment. (I suspect
most NICs do not handle this many segments well, if at all.)

The NFS code does know how to use M_EXTPG mbufs (for NFS over TLS, for the
ktls), but I do not know what it would take to make these work for non-KTLS
TSO.
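
To make the segment-count concern concrete, here is a minimal sketch of the arithmetic. It assumes every 2K cluster is completely full and counts payload only; the message's figure of 514 is two higher, presumably because of mbufs carrying header data, but that is a guess. The per-send segment limits in the loop are hypothetical values for illustration, not if_hw_tsomaxsegcount settings of any particular driver.

```python
MCLBYTES = 2048          # standard FreeBSD mbuf cluster size, in bytes
IO_BYTES = 1024 * 1024   # the 1 Mbyte NFS read/write from the message


def ceil_div(a, b):
    """Integer ceiling division without going through floats."""
    return (a + b - 1) // b


def data_clusters(io_bytes, cluster_bytes=MCLBYTES):
    """Clusters needed for the payload alone, assuming each one is full."""
    return ceil_div(io_bytes, cluster_bytes)


def min_sends(segments, max_segs_per_send):
    """Lower bound on sends if each send may reference at most
    max_segs_per_send non-contiguous segments."""
    return ceil_div(segments, max_segs_per_send)


clusters = data_clusters(IO_BYTES)
print("payload clusters:", clusters)

# Hypothetical per-send segment limits, for illustration only.
for limit in (32, 128, 512):
    print(limit, "segments/send ->", min_sends(clusters, limit), "sends")
```

The point of the sketch is only that a larger byte limit for TSO does not help unless the segment limit grows with it, or the data arrives in fewer, larger segments.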

I do not know how the TSO loop in tcp_output handles M_EXTPG mbufs.
Does it assume each M_EXTPG mbuf is one contiguous data segment?
I do see that ip_output() will call mb_unmapped_to_ext() when the NIC does
not have IFCAP_MEXTPG set. (If IFCAP_MEXTPG is set, do the pages need to be
contiguous so that it can become a single contiguous data segment for TSO
or ???)

If TSO and the code beneath it (NIC and maybe mb_unmapped_to_ext() being
called) were to all work ok for M_EXTPG mbufs, it would be easy to enable
that for NFS (non-TLS case).

I do not want to hijack this thread, but do others know how TSO interacts
with M_EXTPG mbufs?

rick

> Best regards,
>
>   Richard
>
> tso size (transmissions < 1448 would not be accounted here at all)
>
> tso size (bytes)    # count
> <1000                     0
> <2000                    23
> <3000                   111
> <4000                    40
> <5000                    30
> <7000                    14
> <8000                   134
> <9000                   442
> <10000                 9396
> <20000                46227
> <30000                25646
> <40000                33060
> <60000                23162
> <70000                24368
> <80000                19772
> <90000                40101
> >=90000            75384169
> Average:          578844.44
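
The "about 9 times" figure and the weight of the last histogram bin can be checked with a short sketch. It assumes a TSO burst is rounded down to a whole number of 1448-byte segments, and it uses 65535 bytes for the current "64kB" limit; both are assumptions that happen to reproduce the 45*1448 quoted above. The 256 KiB and 1 MiB limits are the hypothetical targets from the message, not existing settings.

```python
import math

MSS_PAYLOAD = 1448      # payload bytes per TCP segment, from the message
AVG_LEN = 578844.44     # reported average len on the first output pass

# Counts for the bins below 90000 bytes, in the order posted.
COUNTS_BELOW_90000 = [0, 23, 111, 40, 30, 14, 134, 442, 9396, 46227,
                      25646, 33060, 23162, 24368, 19772, 40101]
COUNT_AT_LEAST_90000 = 75384169


def tso_burst_bytes(tso_limit, mss_payload=MSS_PAYLOAD):
    """Largest whole-segment burst that fits under tso_limit bytes."""
    return (tso_limit // mss_payload) * mss_payload


def passes_needed(length, tso_limit):
    """Output passes needed to send length bytes, one burst per pass."""
    return math.ceil(length / tso_burst_bytes(tso_limit))


# Fraction of recorded first-pass lengths that fell in the ">=90000" bin.
share = COUNT_AT_LEAST_90000 / (sum(COUNTS_BELOW_90000) + COUNT_AT_LEAST_90000)

for limit in (65535, 256 * 1024, 1024 * 1024):
    print(limit, tso_burst_bytes(limit), passes_needed(AVG_LEN, limit))
print("share of samples >= 90000 bytes: %.4f" % share)
```

Under these assumptions the average first-pass length needs 9 passes at the current limit, 3 at 256 KiB and 1 at 1 MiB, and more than 99% of the samples land in the open-ended top bin, which is why finer granularity above 90 000 bytes would be useful.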
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAM5tNy6TbvXqrRRD=XpDBRGk81rzW5k38AzXeKFKLDL01fOYQQ>