Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 21 Dec 2023 14:29:17 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>,  "<dev-commits-src-all@freebsd.org>" <dev-commits-src-all@freebsd.org>,  "<dev-commits-src-main@freebsd.org>" <dev-commits-src-main@freebsd.org>
Subject:   Re: git: 9e6d11ce9a51 - main - vtnet: Adjust rx buffer so IP header 32-bit aligned
Message-ID:  <CANCZdfpDox1-=fMtc-6Jyow3=_ty6sm6%2BesOtfiyC76Tx-zAug@mail.gmail.com>
In-Reply-To: <CANCZdfrF6OSDqo37xM3hiEpZrRzzfh3kJPA00kBdVj3RExuG9Q@mail.gmail.com>
References:  <202312210421.3BL4Lb9T036317@gitrepo.freebsd.org> <ZYRWH4YEb6eO5hGD@nuc> <CANCZdfqtGww8-6ObNo_rwTvcRJWH8UT%2B-f21tXodM54sqaWLpw@mail.gmail.com> <ZYRflMqw11nYhG7l@nuc> <CANCZdfrF6OSDqo37xM3hiEpZrRzzfh3kJPA00kBdVj3RExuG9Q@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000700aaa060d0bcb6f
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 21, 2023 at 9:53=E2=80=AFAM Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston <markj@freebsd.org> w=
rote:
>
>> On Thu, Dec 21, 2023 at 08:31:36AM -0700, Warner Losh wrote:
>> > On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston <markj@freebsd.o=
rg>
>> wrote:
>> >
>> > > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote:
>> > > > The branch main has been updated by imp:
>> > > >
>> > > > URL:
>> > >
>> https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4=
e9188a2ba4
>> > > >
>> > > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4
>> > > > Author:     Warner Losh <imp@FreeBSD.org>
>> > > > AuthorDate: 2023-12-20 19:09:09 +0000
>> > > > Commit:     Warner Losh <imp@FreeBSD.org>
>> > > > CommitDate: 2023-12-21 04:16:45 +0000
>> > > >
>> > > >     vtnet: Adjust rx buffer so IP header 32-bit aligned
>> > > >
>> > > >     Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating
>> them.
>> > > >     This improves performance everywhere, and allows armv7 to work
>> at
>> > > all.
>> > > >
>> > > >     PR:                     271288 (PR had a different fix than I
>> wound
>> > > up with)
>> > > >     MFC After:              3 days
>> > > >     Sponsored by:           Netflix
>> > > >     Differential Revision:  https://reviews.freebsd.org/D43136
>> > > > ---
>> > > >  sys/dev/virtio/network/if_vtnet.c | 2 +-
>> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/sys/dev/virtio/network/if_vtnet.c
>> > > b/sys/dev/virtio/network/if_vtnet.c
>> > > > index 287af7751066..360176e4f845 100644
>> > > > --- a/sys/dev/virtio/network/if_vtnet.c
>> > > > +++ b/sys/dev/virtio/network/if_vtnet.c
>> > > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc *sc, i=
nt
>> > > nbufs, struct mbuf **m_tailp)
>> > > >                       m_freem(m_head);
>> > > >                       return (NULL);
>> > > >               }
>> > > > -
>> > > >               m->m_len =3D size;
>> > > > +             m_adj(m, ETHER_ALIGN);
>> > >
>> > > The driver is expecting to get a cluster of size
>> sc->vtnet_rx_clustersz,
>> > > but now it's getting one of size sc->vtnet_rx_clustersz - 2.  I don'=
t
>> > > see how this change can be sufficient on its own: what prevents virt=
io
>> > > from writing sc->vtnet_rx_clustersz bytes and thereby overwriting th=
e
>> > > two bytes following the cluster?
>> > >
>> >
>> > The only trouble that I saw was here:
>> >
>> >                 /*
>> >                  * Every mbuf should have the expected cluster size
>> since
>> > that
>> >                  * is also used to allocate the replacements.
>> >                  */
>> >                 KASSERT(m->m_len =3D=3D clustersz,
>> >                     ("%s: mbuf size %d not expected cluster size %d",
>> > __func__,
>> >                     m->m_len, clustersz));
>> >
>> > and even that's fine: We can adjust that assert. The next lines I thin=
k
>> > make it fine:
>> >                 m->m_len =3D MIN(m->m_len, len);
>> >
>> > so we only use what we say is there to land bytes into the mbuf.  I'm
>> now
>> > not sure though what to do about a few lines later:
>> >
>> >        m_new =3D vtnet_rx_alloc_buf(sc, nreplace, &m_tail);
>> >         if (m_new =3D=3D NULL) {
>> >                 m_prev->m_len =3D clustersz;
>> >                 return (ENOBUFS);
>> >         }
>> >
>> > which likely needs to be adjusted (or the old saved size).
>>
>> I'm not certain about what it should be doing instead, but yes this code
>> looks wrong now.
>>
>
> Yea. We need to do this at time of use not time of alloc.
>
> > I likely didn't see asserts in my testing because I didn't have LRO slo=
w
>> > path packets.
>> >
>> > Do you see other places that would need adjusting?
>>
>> Presumably vtnet_rx_cluster_size() needs to be adjusted too?
>>
>> I also don't understand why we do this adjustment unconditionally
>> instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all
>> other NIC drivers do.
>>
>
> None of the ones I looked at had this... maybe I got unlucky. I'd think
> we'd have better performance if things are aligned all the way... is ther=
e
> a reason not to?
>

Yea, I see a lot of other drivers now...  I'll follow the trend. I think it
gives the ability to receive a full jumbo frame if I don't adjust.

https://reviews.freebsd.org/D43161

has the updated patches, including backing out this change and moving it
from alloc time to use time which seems better since we're chaining and
should work for all the cases we're talking about here.

Warner


> Warner
>
>>
>> > All the other places
>> > seem to use m_len correctly on the rx path, but I'm not a nic driver
>> guy,
>> > and I might be missing something.
>>
>

--000000000000700aaa060d0bcb6f
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Thu, Dec 21, 2023 at 9:53=E2=80=AF=
AM Warner Losh &lt;<a href=3D"mailto:imp@bsdimp.com">imp@bsdimp.com</a>&gt;=
 wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px =
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir=
=3D"auto"><div><br><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D=
"gmail_attr">On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston &lt;<a hre=
f=3D"mailto:markj@freebsd.org" target=3D"_blank">markj@freebsd.org</a>&gt; =
wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0=
px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, D=
ec 21, 2023 at 08:31:36AM -0700, Warner Losh wrote:<br>
&gt; On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston &lt;<a href=3D"m=
ailto:markj@freebsd.org" rel=3D"noreferrer" target=3D"_blank">markj@freebsd=
.org</a>&gt; wrote:<br>
&gt; <br>
&gt; &gt; On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote:<br>
&gt; &gt; &gt; The branch main has been updated by imp:<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; URL:<br>
&gt; &gt; <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51=
d75ed6a94e180f2fb4e9188a2ba4" rel=3D"noreferrer noreferrer" target=3D"_blan=
k">https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4=
e9188a2ba4</a><br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4<br>
&gt; &gt; &gt; Author:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&g=
t;<br>
&gt; &gt; &gt; AuthorDate: 2023-12-20 19:09:09 +0000<br>
&gt; &gt; &gt; Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&g=
t;<br>
&gt; &gt; &gt; CommitDate: 2023-12-21 04:16:45 +0000<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0vtnet: Adjust rx buffer so IP header 32-b=
it aligned<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0Call madj(m, ETHER_ALIGN) to offset rx bu=
ffers when allocating them.<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0This improves performance everywhere, and=
 allows armv7 to work at<br>
&gt; &gt; all.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0PR:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0271288 (PR had a different fix than I=
 wound<br>
&gt; &gt; up with)<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0MFC After:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 3 days<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0Netflix<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0Differential Revision:=C2=A0 <a href=3D"h=
ttps://reviews.freebsd.org/D43136" rel=3D"noreferrer noreferrer" target=3D"=
_blank">https://reviews.freebsd.org/D43136</a><br>;
&gt; &gt; &gt; ---<br>
&gt; &gt; &gt;=C2=A0 sys/dev/virtio/network/if_vtnet.c | 2 +-<br>
&gt; &gt; &gt;=C2=A0 1 file changed, 1 insertion(+), 1 deletion(-)<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; diff --git a/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; b/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; &gt; index 287af7751066..360176e4f845 100644<br>
&gt; &gt; &gt; --- a/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; &gt; +++ b/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; &gt; @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc =
*sc, int<br>
&gt; &gt; nbufs, struct mbuf **m_tailp)<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0m_freem(m_head);<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0return (NULL);<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
&gt; &gt; &gt; -<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m-&gt;=
m_len =3D size;<br>
&gt; &gt; &gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_adj(m, ET=
HER_ALIGN);<br>
&gt; &gt;<br>
&gt; &gt; The driver is expecting to get a cluster of size sc-&gt;vtnet_rx_=
clustersz,<br>
&gt; &gt; but now it&#39;s getting one of size sc-&gt;vtnet_rx_clustersz - =
2.=C2=A0 I don&#39;t<br>
&gt; &gt; see how this change can be sufficient on its own: what prevents v=
irtio<br>
&gt; &gt; from writing sc-&gt;vtnet_rx_clustersz bytes and thereby overwrit=
ing the<br>
&gt; &gt; two bytes following the cluster?<br>
&gt; &gt;<br>
&gt; <br>
&gt; The only trouble that I saw was here:<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/*<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Every =
mbuf should have the expected cluster size since<br>
&gt; that<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * is als=
o used to allocate the replacements.<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0KASSERT(m=
-&gt;m_len =3D=3D clustersz,<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0(&quot;%s: mbuf size %d not expected cluster size %d&quot;,<br>
&gt; __func__,<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0m-&gt;m_len, clustersz));<br>
&gt; <br>
&gt; and even that&#39;s fine: We can adjust that assert. The next lines I =
think<br>
&gt; make it fine:<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m-&gt;m_l=
en =3D MIN(m-&gt;m_len, len);<br>
&gt; <br>
&gt; so we only use what we say is there to land bytes into the mbuf.=C2=A0=
 I&#39;m now<br>
&gt; not sure though what to do about a few lines later:<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 m_new =3D vtnet_rx_alloc_buf(sc, nreplace, =
&amp;m_tail);<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (m_new =3D=3D NULL) {<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_prev-&g=
t;m_len =3D clustersz;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (E=
NOBUFS);<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
&gt; <br>
&gt; which likely needs to be adjusted (or the old saved size).<br>
<br>
I&#39;m not certain about what it should be doing instead, but yes this cod=
e<br>
looks wrong now.<br></blockquote></div></div><div dir=3D"auto"><br></div><d=
iv dir=3D"auto">Yea. We need to do this at time of use not time of alloc.</=
div><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=3D"gmail_quote=
"><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;borde=
r-left:1px solid rgb(204,204,204);padding-left:1ex">
&gt; I likely didn&#39;t see asserts in my testing because I didn&#39;t hav=
e LRO slow<br>
&gt; path packets.<br>
&gt; <br>
&gt; Do you see other places that would need adjusting?<br>
<br>
Presumably vtnet_rx_cluster_size() needs to be adjusted too?<br>
<br>
I also don&#39;t understand why we do this adjustment unconditionally<br>
instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all<br=
>
other NIC drivers do.<br></blockquote></div></div><div dir=3D"auto"><br></d=
iv><div dir=3D"auto">None of the ones I looked at had this... maybe I got u=
nlucky. I&#39;d think we&#39;d have better performance if things are aligne=
d all the way... is there a reason not to?</div></div></blockquote><div><br=
></div><div>Yea, I see a lot of other drivers now...=C2=A0 I&#39;ll follow =
the trend. I think it gives the ability to receive a full jumbo frame if I =
don&#39;t adjust.</div><div><br></div><div><a href=3D"https://reviews.freeb=
sd.org/D43161">https://reviews.freebsd.org/D43161</a><br></div><div><br></d=
iv><div>has the updated patches, including backing out this change and movi=
ng it from alloc time to use time which seems better since we&#39;re chaini=
ng and should work for all the cases we&#39;re talking about here.</div><di=
v><br></div><div>Warner</div><div>=C2=A0</div><blockquote class=3D"gmail_qu=
ote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,20=
4);padding-left:1ex"><div dir=3D"auto"><div dir=3D"auto">Warner</div><div d=
ir=3D"auto"><div class=3D"gmail_quote"><blockquote class=3D"gmail_quote" st=
yle=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padd=
ing-left:1ex">
<br>
&gt; All the other places<br>
&gt; seem to use m_len correctly on the rx path, but I&#39;m not a nic driv=
er guy,<br>
&gt; and I might be missing something.<br>
</blockquote></div></div></div>
</blockquote></div></div>

--000000000000700aaa060d0bcb6f--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfpDox1-=fMtc-6Jyow3=_ty6sm6%2BesOtfiyC76Tx-zAug>