Date: Fri, 02 Feb 2024 17:14:15 -0500
From: "Drew Gallatin" <gallatin@freebsd.org>
To: "Richard Scheffenegger" <rscheff@freebsd.org>, "freebsd-net@FreeBSD.org" <freebsd-net@freebsd.org>, "FreeBSD Transport" <freebsd-transport@freebsd.org>, rmacklem@freebsd.org, kp@FreeBSD.org
Subject: Re: Increasing TCP TSO size support
Message-ID: <95e76a2c-44c8-4fbb-ab45-8bcffe80d4a3@app.fastmail.com>
In-Reply-To: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>
References: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>
What is the link speed that you're working with?

A long time ago, when I worked for a now-defunct 10GbE NIC vendor, I experimented with the benefits of TSO as we varied the max TSO size. I cannot recall the platform (it could have been OSX, Solaris, FreeBSD or Linux). At the time (~2006?) the CPU-saving benefit of increasing the max TSO size from 8k to 64k was fairly minimal. In fact, I seem to recall that there was almost no benefit to TSO sizes larger than 16K.

I was wondering if you see any difference in your benchmark if you cap max TSO size to 8k, 16k, 32k, and the default of 64k. Any change in CPU use, or in your benchmark's performance, would be interesting to hear about.

Naively, I'd expect the benchmark performance to remain unchanged until you'd reduced the TSO size so much as to make the host slower than the wire, thereby inserting gaps between TSOs. That would be reflected in the CPU use as well.

Drew

On Fri, Feb 2, 2024, at 4:21 AM, Scheffenegger, Richard wrote:
> Hi,
>
> We have run a test for an RPC workload with 1MB IO sizes, and collected the tcp_default_output() len(gth) during the first pass in the output loop.
>
> In such a scenario, where the application frequently introduces small pauses between sending additional data (since the next large IO is only sent after the corresponding request from the client has been received and processed), the current TSO maximum of 64kB (45*1448 in effect) requires multiple passes in the output routine to send all the allowable (cwnd limited) data.
>
> I'll try to get a data collection with better granularity above 90 000 bytes - but even here the average strongly indicates that a majority of transmission opportunities are in the 512 kB area - probably also having to do with LRO and ACK thinning effects by the client.
>
> In other words, the tcp output has to run about 9 times with TSO to transmit all eligible data - increasing the FreeBSD supported maximum TSO size to what current hardware could handle (256kB..1MB) would reduce the CPU burden here.
>
> Is increasing the software supported TSO size to allow for what the NICs could nowadays do something anyone apart from us would be interested in (in particular, those who work with the drivers)?
>
> Best regards,
>   Richard
>
> tso size (transmissions < 1448 would not be accounted here at all)
>
>   size (bytes)     # count
>   <1000                  0
>   <2000                 23
>   <3000                111
>   <4000                 40
>   <5000                 30
>   <7000                 14
>   <8000                134
>   <9000                442
>   <10000              9396
>   <20000             46227
>   <30000             25646
>   <40000             33060
>   <60000             23162
>   <70000             24368
>   <80000             19772
>   <90000             40101
>   >=90000         75384169
>   Average:       578844.44
>
> Attachments:
>   - OpenPGP_0x17BE5899E0B1439B.asc
>   - OpenPGP_signature.asc
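For anyone wanting to check the "about 9 times" figure in the quoted message, or to estimate what the 8k/16k/32k/64k caps would mean for the same workload, here is a rough back-of-the-envelope sketch. It is not taken from the FreeBSD stack: the function names are made up for illustration, the MSS of 1448 comes from the "45*1448" in the quoted text, 64kB is taken as 64*1024 bytes, and the model assumes each pass through the output loop sends one burst of whole segments with nothing else (cwnd, socket buffer, headers) getting in the way.

```python
MSS = 1448  # payload bytes per segment, from the "45*1448" in the quoted message


def effective_tso_bytes(tso_limit: int, mss: int = MSS) -> int:
    """Largest whole-segment payload that fits under tso_limit (hypothetical helper)."""
    return (tso_limit // mss) * mss


def output_passes(pending: int, tso_limit: int, mss: int = MSS) -> int:
    """Passes needed to send `pending` bytes, one TSO burst per pass (simplified model)."""
    burst = effective_tso_bytes(tso_limit, mss)
    if burst == 0:
        raise ValueError("tso_limit is smaller than one segment")
    # Integer ceiling division: a partial final burst still costs a pass.
    return -(-pending // burst)


if __name__ == "__main__":
    average_len = 578844  # rounded "Average" from the quoted histogram
    for kib in (8, 16, 32, 64, 256, 1024):
        limit = kib * 1024
        print(f"{kib:>5}k cap: burst {effective_tso_bytes(limit):>8} bytes, "
              f"{output_passes(average_len, limit):>3} passes")
```

Under these assumptions a 64k cap gives a 65160-byte burst (45 segments) and 9 passes for the average length, which lines up with the quoted estimate. The same arithmetic gives 80, 37 and 19 passes for 8k, 16k and 32k caps. Whether the extra passes show up as measurable CPU cost is exactly the open question in this message, and the sketch says nothing about that.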
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?95e76a2c-44c8-4fbb-ab45-8bcffe80d4a3>