Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 21 Dec 2023 09:53:05 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>,  "<dev-commits-src-all@freebsd.org>" <dev-commits-src-all@freebsd.org>,  "<dev-commits-src-main@freebsd.org>" <dev-commits-src-main@freebsd.org>
Subject:   Re: git: 9e6d11ce9a51 - main - vtnet: Adjust rx buffer so IP header 32-bit aligned
Message-ID:  <CANCZdfrF6OSDqo37xM3hiEpZrRzzfh3kJPA00kBdVj3RExuG9Q@mail.gmail.com>
In-Reply-To: <ZYRflMqw11nYhG7l@nuc>
References:  <202312210421.3BL4Lb9T036317@gitrepo.freebsd.org> <ZYRWH4YEb6eO5hGD@nuc> <CANCZdfqtGww8-6ObNo_rwTvcRJWH8UT%2B-f21tXodM54sqaWLpw@mail.gmail.com> <ZYRflMqw11nYhG7l@nuc>

next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000b137bf060d07efbc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston <markj@freebsd.org> wro=
te:

> On Thu, Dec 21, 2023 at 08:31:36AM -0700, Warner Losh wrote:
> > On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston <markj@freebsd.or=
g> wrote:
> >
> > > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote:
> > > > The branch main has been updated by imp:
> > > >
> > > > URL:
> > >
> https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4e=
9188a2ba4
> > > >
> > > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4
> > > > Author:     Warner Losh <imp@FreeBSD.org>
> > > > AuthorDate: 2023-12-20 19:09:09 +0000
> > > > Commit:     Warner Losh <imp@FreeBSD.org>
> > > > CommitDate: 2023-12-21 04:16:45 +0000
> > > >
> > > >     vtnet: Adjust rx buffer so IP header 32-bit aligned
> > > >
> > > >     Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating
> them.
> > > >     This improves performance everywhere, and allows armv7 to work =
at
> > > all.
> > > >
> > > >     PR:                     271288 (PR had a different fix than I
> wound
> > > up with)
> > > >     MFC After:              3 days
> > > >     Sponsored by:           Netflix
> > > >     Differential Revision:  https://reviews.freebsd.org/D43136
> > > > ---
> > > >  sys/dev/virtio/network/if_vtnet.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/sys/dev/virtio/network/if_vtnet.c
> > > b/sys/dev/virtio/network/if_vtnet.c
> > > > index 287af7751066..360176e4f845 100644
> > > > --- a/sys/dev/virtio/network/if_vtnet.c
> > > > +++ b/sys/dev/virtio/network/if_vtnet.c
> > > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc *sc, in=
t
> > > nbufs, struct mbuf **m_tailp)
> > > >                       m_freem(m_head);
> > > >                       return (NULL);
> > > >               }
> > > > -
> > > >               m->m_len =3D size;
> > > > +             m_adj(m, ETHER_ALIGN);
> > >
> > > The driver is expecting to get a cluster of size
> sc->vtnet_rx_clustersz,
> > > but now it's getting one of size sc->vtnet_rx_clustersz - 2.  I don't
> > > see how this change can be sufficient on its own: what prevents virti=
o
> > > from writing sc->vtnet_rx_clustersz bytes and thereby overwriting the
> > > two bytes following the cluster?
> > >
> >
> > The only trouble that I saw was here:
> >
> >                 /*
> >                  * Every mbuf should have the expected cluster size sin=
ce
> > that
> >                  * is also used to allocate the replacements.
> >                  */
> >                 KASSERT(m->m_len =3D=3D clustersz,
> >                     ("%s: mbuf size %d not expected cluster size %d",
> > __func__,
> >                     m->m_len, clustersz));
> >
> > and even that's fine: We can adjust that assert. The next lines I think
> > make it fine:
> >                 m->m_len =3D MIN(m->m_len, len);
> >
> > so we only use what we say is there to land bytes into the mbuf.  I'm n=
ow
> > not sure though what to do about a few lines later:
> >
> >        m_new =3D vtnet_rx_alloc_buf(sc, nreplace, &m_tail);
> >         if (m_new =3D=3D NULL) {
> >                 m_prev->m_len =3D clustersz;
> >                 return (ENOBUFS);
> >         }
> >
> > which likely needs to be adjusted (or the old saved size).
>
> I'm not certain about what it should be doing instead, but yes this code
> looks wrong now.
>

Yea. We need to do this at time of use not time of alloc.

> I likely didn't see asserts in my testing because I didn't have LRO slow
> > path packets.
> >
> > Do you see other places that would need adjusting?
>
> Presumably vtnet_rx_cluster_size() needs to be adjusted too?
>
> I also don't understand why we do this adjustment unconditionally
> instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all
> other NIC drivers do.
>

None of the ones I looked at had this... maybe I got unlucky. I'd think
we'd have better performance if things are aligned all the way... is there
a reason not to?

Warner

>
> > All the other places
> > seem to use m_len correctly on the rx path, but I'm not a nic driver gu=
y,
> > and I might be missing something.
>

--000000000000b137bf060d07efbc
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><br><br><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston &l=
t;<a href=3D"mailto:markj@freebsd.org">markj@freebsd.org</a>&gt; wrote:<br>=
</div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-l=
eft:1px #ccc solid;padding-left:1ex">On Thu, Dec 21, 2023 at 08:31:36AM -07=
00, Warner Losh wrote:<br>
&gt; On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston &lt;<a href=3D"m=
ailto:markj@freebsd.org" target=3D"_blank" rel=3D"noreferrer">markj@freebsd=
.org</a>&gt; wrote:<br>
&gt; <br>
&gt; &gt; On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote:<br>
&gt; &gt; &gt; The branch main has been updated by imp:<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; URL:<br>
&gt; &gt; <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51=
d75ed6a94e180f2fb4e9188a2ba4" rel=3D"noreferrer noreferrer" target=3D"_blan=
k">https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4=
e9188a2ba4</a><br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4<br>
&gt; &gt; &gt; Author:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&g=
t;<br>
&gt; &gt; &gt; AuthorDate: 2023-12-20 19:09:09 +0000<br>
&gt; &gt; &gt; Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&g=
t;<br>
&gt; &gt; &gt; CommitDate: 2023-12-21 04:16:45 +0000<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0vtnet: Adjust rx buffer so IP header 32-b=
it aligned<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0Call madj(m, ETHER_ALIGN) to offset rx bu=
ffers when allocating them.<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0This improves performance everywhere, and=
 allows armv7 to work at<br>
&gt; &gt; all.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0PR:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0271288 (PR had a different fix than I=
 wound<br>
&gt; &gt; up with)<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0MFC After:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 3 days<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0Netflix<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0Differential Revision:=C2=A0 <a href=3D"h=
ttps://reviews.freebsd.org/D43136" rel=3D"noreferrer noreferrer" target=3D"=
_blank">https://reviews.freebsd.org/D43136</a><br>;
&gt; &gt; &gt; ---<br>
&gt; &gt; &gt;=C2=A0 sys/dev/virtio/network/if_vtnet.c | 2 +-<br>
&gt; &gt; &gt;=C2=A0 1 file changed, 1 insertion(+), 1 deletion(-)<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; diff --git a/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; b/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; &gt; index 287af7751066..360176e4f845 100644<br>
&gt; &gt; &gt; --- a/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; &gt; +++ b/sys/dev/virtio/network/if_vtnet.c<br>
&gt; &gt; &gt; @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc =
*sc, int<br>
&gt; &gt; nbufs, struct mbuf **m_tailp)<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0m_freem(m_head);<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0return (NULL);<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
&gt; &gt; &gt; -<br>
&gt; &gt; &gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m-&gt;=
m_len =3D size;<br>
&gt; &gt; &gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_adj(m, ET=
HER_ALIGN);<br>
&gt; &gt;<br>
&gt; &gt; The driver is expecting to get a cluster of size sc-&gt;vtnet_rx_=
clustersz,<br>
&gt; &gt; but now it&#39;s getting one of size sc-&gt;vtnet_rx_clustersz - =
2.=C2=A0 I don&#39;t<br>
&gt; &gt; see how this change can be sufficient on its own: what prevents v=
irtio<br>
&gt; &gt; from writing sc-&gt;vtnet_rx_clustersz bytes and thereby overwrit=
ing the<br>
&gt; &gt; two bytes following the cluster?<br>
&gt; &gt;<br>
&gt; <br>
&gt; The only trouble that I saw was here:<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/*<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Every =
mbuf should have the expected cluster size since<br>
&gt; that<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * is als=
o used to allocate the replacements.<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0KASSERT(m=
-&gt;m_len =3D=3D clustersz,<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0(&quot;%s: mbuf size %d not expected cluster size %d&quot;,<br>
&gt; __func__,<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0m-&gt;m_len, clustersz));<br>
&gt; <br>
&gt; and even that&#39;s fine: We can adjust that assert. The next lines I =
think<br>
&gt; make it fine:<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m-&gt;m_l=
en =3D MIN(m-&gt;m_len, len);<br>
&gt; <br>
&gt; so we only use what we say is there to land bytes into the mbuf.=C2=A0=
 I&#39;m now<br>
&gt; not sure though what to do about a few lines later:<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 m_new =3D vtnet_rx_alloc_buf(sc, nreplace, =
&amp;m_tail);<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (m_new =3D=3D NULL) {<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_prev-&g=
t;m_len =3D clustersz;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (E=
NOBUFS);<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>
&gt; <br>
&gt; which likely needs to be adjusted (or the old saved size).<br>
<br>
I&#39;m not certain about what it should be doing instead, but yes this cod=
e<br>
looks wrong now.<br></blockquote></div></div><div dir=3D"auto"><br></div><d=
iv dir=3D"auto">Yea. We need to do this at time of use not time of alloc.</=
div><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=3D"gmail_quote=
"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:=
1px #ccc solid;padding-left:1ex">
&gt; I likely didn&#39;t see asserts in my testing because I didn&#39;t hav=
e LRO slow<br>
&gt; path packets.<br>
&gt; <br>
&gt; Do you see other places that would need adjusting?<br>
<br>
Presumably vtnet_rx_cluster_size() needs to be adjusted too?<br>
<br>
I also don&#39;t understand why we do this adjustment unconditionally<br>
instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all<br=
>
other NIC drivers do.<br></blockquote></div></div><div dir=3D"auto"><br></d=
iv><div dir=3D"auto">None of the ones I looked at had this... maybe I got u=
nlucky. I&#39;d think we&#39;d have better performance if things are aligne=
d all the way... is there a reason not to?</div><div dir=3D"auto"><br></div=
><div dir=3D"auto">Warner</div><div dir=3D"auto"><div class=3D"gmail_quote"=
><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1=
px #ccc solid;padding-left:1ex">
<br>
&gt; All the other places<br>
&gt; seem to use m_len correctly on the rx path, but I&#39;m not a nic driv=
er guy,<br>
&gt; and I might be missing something.<br>
</blockquote></div></div></div>

--000000000000b137bf060d07efbc--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrF6OSDqo37xM3hiEpZrRzzfh3kJPA00kBdVj3RExuG9Q>