Date:      Fri, 2 Feb 2024 15:13:12 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        "Scheffenegger, Richard" <rscheff@freebsd.org>
Cc:        "freebsd-net@FreeBSD.org" <freebsd-net@freebsd.org>,  FreeBSD Transport <freebsd-transport@freebsd.org>, rmacklem@freebsd.org, gallatin@freebsd.org,  kp@freebsd.org
Subject:   Re: Increasing TCP TSO size support
Message-ID:  <CAM5tNy6TbvXqrRRD=XpDBRGk81rzW5k38AzXeKFKLDL01fOYQQ@mail.gmail.com>
In-Reply-To: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>
References:  <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>

On Fri, Feb 2, 2024 at 1:21 AM Scheffenegger, Richard <rscheff@freebsd.org> wrote:

>
> Hi,
>
> We ran a test with an RPC workload using 1 MB I/O sizes and collected the
> len (length) value computed in tcp_default_output() during the first pass
> through the output loop.
>
> In this scenario the application frequently pauses between sends, because
> the next large I/O is only sent after the corresponding request from the
> client has been received and processed. The current 64 kB TSO maximum
> (45*1448 = 65160 bytes in effect) therefore requires multiple passes through
> the output routine to send all of the allowable (cwnd-limited) data.
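The 45*1448 figure can be reproduced with simple arithmetic. This is a minimal sketch, assuming a 1500-byte MTU, IPv4 and TCP headers with the 12-byte timestamp option (which gives the 1448-byte MSS), a 65535-byte TSO cap, and 16 bytes reserved for the link-layer header. The helper tso_payload_limit() is hypothetical and only approximates what tcp_output() does; with no link-header reservation at all the result is still 45 segments.

```python
# Sketch: largest TSO payload that is a whole number of MSS-sized segments.
# Assumptions (not taken from the FreeBSD source): 1500-byte MTU, IPv4 + TCP
# with the 12-byte timestamp option, a 65535-byte TSO cap, and 16 bytes
# reserved for the link-layer header.

MTU = 1500
IP_HDR = 20
TCP_HDR = 20 + 12             # base TCP header + timestamp option
MSS = MTU - IP_HDR - TCP_HDR  # 1448 bytes of payload per segment


def tso_payload_limit(tso_max: int, hdr_len: int, link_hdr: int, mss: int) -> int:
    """Hypothetical helper: room left under tso_max, rounded down to whole MSS."""
    room = tso_max - hdr_len - link_hdr
    if room <= 0:
        return 0
    return room - room % mss


limit = tso_payload_limit(65535, IP_HDR + TCP_HDR, 16, MSS)
print(MSS, limit, limit // MSS)  # 1448 65160 45
```

Under the same assumptions, a 256 kB (262144-byte) cap would allow 180 segments (260640 bytes) per burst.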
>
> I'll try to collect data with finer granularity above 90000 bytes, but even
> here the average strongly indicates that a majority of transmission
> opportunities are in the 512 kB range, probably also due to LRO and ACK
> thinning on the client.
>
> In other words, the TCP output routine has to run about 9 times with TSO to
> transmit all eligible data (roughly the 578844-byte average divided by 65160
> bytes per TSO burst). Increasing the maximum TSO size supported by FreeBSD to
> what current hardware can handle (256 kB to 1 MB) would reduce the CPU burden
> here.
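The "about 9 times" estimate follows from dividing the average first-pass length by the payload of one TSO burst. A rough sketch, assuming the 1448-byte MSS and 68 bytes of header room (IP + TCP with timestamps plus a 16-byte link header, both assumptions rather than measurements) and that each pass sends one full burst; segs_per_pass() and passes_needed() are hypothetical names:

```python
import math

MSS = 1448            # assumed: 1500-byte MTU, IPv4, TCP timestamps
HDR_ROOM = 52 + 16    # assumed: IP + TCP headers plus link-header room
AVG_LEN = 578844.44   # average first-pass len from the histogram (bytes)


def segs_per_pass(tso_max: int) -> int:
    """Whole MSS-sized segments that fit under a TSO cap (hypothetical model)."""
    return (tso_max - HDR_ROOM) // MSS


def passes_needed(length: float, tso_max: int) -> int:
    """Output-loop passes to send `length` bytes, one full TSO burst per pass."""
    return math.ceil(length / (segs_per_pass(tso_max) * MSS))


for name, cap in (("64 kB", 65535), ("256 kB", 256 * 1024), ("1 MB", 1024 * 1024)):
    print(name, segs_per_pass(cap), passes_needed(AVG_LEN, cap))
# 64 kB 45 9
# 256 kB 180 3
# 1 MB 724 1
```

This ignores cwnd changes and ACK timing between passes, so it only shows the order of magnitude of the savings.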
>
>
> Would anyone apart from us (in particular, those who work on the drivers) be
> interested in increasing the software-supported TSO size to match what NICs
> can handle nowadays?
>
Reposted after joining freebsd-net@...

One factor here is the if_hw_tsomaxsegcount limit. For example, a 1 Mbyte NFS
write request or read reply results in a 514-element mbuf chain. Each of these
mbufs (mostly 2K mbuf clusters) is a separate, non-contiguous data segment.
(I suspect most NICs do not handle this many segments well, if at all.)
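To illustrate why if_hw_tsomaxsegcount matters, here is a rough model of such a chain. It assumes 2 small header mbufs in front of the data (chosen to match the 514 figure) and one DMA segment per 2K cluster (MCLBYTES = 2048); the segment limits in the loop are hypothetical values, not taken from any particular driver.

```python
MCLBYTES = 2048        # FreeBSD mbuf cluster size
IO_SIZE = 1024 * 1024  # 1 Mbyte NFS write request / read reply
HEADER_MBUFS = 2       # assumed: small mbufs carrying RPC/NFS headers


def chain_segments(io_size: int, cluster: int = MCLBYTES,
                   headers: int = HEADER_MBUFS) -> int:
    """Non-contiguous segments in a chain of clusters, one segment per mbuf."""
    return headers + -(-io_size // cluster)  # ceiling division for data clusters


def max_burst_bytes(seg_limit: int, cluster: int = MCLBYTES) -> int:
    """Upper bound on payload per TSO burst when each segment is one cluster."""
    return seg_limit * cluster


print(chain_segments(IO_SIZE))  # 514
for limit in (32, 64, 128):     # hypothetical if_hw_tsomaxsegcount values
    print(limit, max_burst_bytes(limit))
```

Under this model even a limit of 128 segments caps a burst of 2K clusters at 256 KiB, so larger TSO sizes would need either much higher segment limits or larger contiguous buffers.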

The NFS code already knows how to use M_EXTPG mbufs (for NFS over TLS, with
KTLS), but I do not know what it would take to make them work for non-KTLS
TSO. I also do not know how the TSO loop in tcp_output handles M_EXTPG mbufs:
does it assume each M_EXTPG mbuf is one contiguous data segment? I do see that
ip_output() calls mb_unmapped_to_ext() when the NIC does not have IFCAP_MEXTPG
set. (If IFCAP_MEXTPG is set, do the pages need to be contiguous so that the
mbuf can become a single contiguous data segment for TSO, or not?)
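The contiguity question matters because it decides how many DMA segments an M_EXTPG mbuf turns into. The sketch below is a hypothetical model only (4096-byte pages and made-up physical addresses), not a description of what tcp_output(), busdma, or any driver actually does: it counts segments for 1 Mbyte of pages when physically adjacent pages are merged versus when every page is its own segment.

```python
PAGE_SIZE = 4096  # assumed page size


def dma_segments(page_addrs: list[int], merge_contiguous: bool) -> int:
    """Count DMA segments for an M_EXTPG-style list of page addresses.

    Hypothetical model: if merge_contiguous is True, physically adjacent
    pages collapse into one segment; otherwise every page is its own segment.
    """
    if not page_addrs:
        return 0
    if not merge_contiguous:
        return len(page_addrs)
    segs = 1
    for prev, cur in zip(page_addrs, page_addrs[1:]):
        if cur != prev + PAGE_SIZE:  # gap: a new segment starts here
            segs += 1
    return segs


# 1 Mbyte of data = 256 pages.
contiguous = [0x100000 + i * PAGE_SIZE for i in range(256)]
scattered = [0x100000 + i * 2 * PAGE_SIZE for i in range(256)]
print(dma_segments(contiguous, True),
      dma_segments(scattered, True),
      dma_segments(contiguous, False))  # 1 256 256
```

Real hardware would also be bounded by a per-segment size limit (if_hw_tsomaxsegsize), which this model leaves out.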

If TSO and the code beneath it (the NIC, and possibly the mb_unmapped_to_ext()
call) all worked correctly for M_EXTPG mbufs, it would be easy to enable them
for NFS in the non-TLS case.

I do not want to hijack this thread, but does anyone know how TSO interacts
with M_EXTPG mbufs?

rick


> Best regards,
>
>   Richard
>
>
>
>
> TSO size in bytes (transmissions smaller than 1448 bytes are not counted)
>
> Size (bytes)       Count
> <1000                  0
> <2000                 23
> <3000                111
> <4000                 40
> <5000                 30
> <7000                 14
> <8000                134
> <9000                442
> <10000              9396
> <20000             46227
> <30000             25646
> <40000             33060
> <60000             23162
> <70000             24368
> <80000             19772
> <90000             40101
> >=90000         75384169
> Average:        578844.44
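A quick tally shows how dominant the top bucket of the histogram is. This sketch only re-sums the counts listed in the histogram; nothing else is assumed.

```python
# Bucket counts from the histogram (bucket label -> count).
HIST = {
    "<1000": 0, "<2000": 23, "<3000": 111, "<4000": 40, "<5000": 30,
    "<7000": 14, "<8000": 134, "<9000": 442, "<10000": 9396,
    "<20000": 46227, "<30000": 25646, "<40000": 33060, "<60000": 23162,
    "<70000": 24368, "<80000": 19772, "<90000": 40101, ">=90000": 75384169,
}

total = sum(HIST.values())
large_share = HIST[">=90000"] / total  # fraction of first passes >= 90000 bytes
print(total, f"{large_share:.2%}")  # 75606695 99.71%
```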
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAM5tNy6TbvXqrRRD=XpDBRGk81rzW5k38AzXeKFKLDL01fOYQQ>