Date: Thu, 21 Dec 2023 09:53:05 -0700 From: Warner Losh <imp@bsdimp.com> To: Mark Johnston <markj@freebsd.org> Cc: Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>, "<dev-commits-src-all@freebsd.org>" <dev-commits-src-all@freebsd.org>, "<dev-commits-src-main@freebsd.org>" <dev-commits-src-main@freebsd.org> Subject: Re: git: 9e6d11ce9a51 - main - vtnet: Adjust rx buffer so IP header 32-bit aligned Message-ID: <CANCZdfrF6OSDqo37xM3hiEpZrRzzfh3kJPA00kBdVj3RExuG9Q@mail.gmail.com> In-Reply-To: <ZYRflMqw11nYhG7l@nuc> References: <202312210421.3BL4Lb9T036317@gitrepo.freebsd.org> <ZYRWH4YEb6eO5hGD@nuc> <CANCZdfqtGww8-6ObNo_rwTvcRJWH8UT%2B-f21tXodM54sqaWLpw@mail.gmail.com> <ZYRflMqw11nYhG7l@nuc>
next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000b137bf060d07efbc Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston <markj@freebsd.org> wro= te: > On Thu, Dec 21, 2023 at 08:31:36AM -0700, Warner Losh wrote: > > On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston <markj@freebsd.or= g> wrote: > > > > > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote: > > > > The branch main has been updated by imp: > > > > > > > > URL: > > > > https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4e= 9188a2ba4 > > > > > > > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4 > > > > Author: Warner Losh <imp@FreeBSD.org> > > > > AuthorDate: 2023-12-20 19:09:09 +0000 > > > > Commit: Warner Losh <imp@FreeBSD.org> > > > > CommitDate: 2023-12-21 04:16:45 +0000 > > > > > > > > vtnet: Adjust rx buffer so IP header 32-bit aligned > > > > > > > > Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating > them. > > > > This improves performance everywhere, and allows armv7 to work = at > > > all. > > > > > > > > PR: 271288 (PR had a different fix than I > wound > > > up with) > > > > MFC After: 3 days > > > > Sponsored by: Netflix > > > > Differential Revision: https://reviews.freebsd.org/D43136 > > > > --- > > > > sys/dev/virtio/network/if_vtnet.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/sys/dev/virtio/network/if_vtnet.c > > > b/sys/dev/virtio/network/if_vtnet.c > > > > index 287af7751066..360176e4f845 100644 > > > > --- a/sys/dev/virtio/network/if_vtnet.c > > > > +++ b/sys/dev/virtio/network/if_vtnet.c > > > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc *sc, in= t > > > nbufs, struct mbuf **m_tailp) > > > > m_freem(m_head); > > > > return (NULL); > > > > } > > > > - > > > > m->m_len =3D size; > > > > + m_adj(m, ETHER_ALIGN); > > > > > > The driver is expecting to get a cluster of size > sc->vtnet_rx_clustersz, > > > but now it's getting one of size sc->vtnet_rx_clustersz - 2. I don't > > > see how this change can be sufficient on its own: what prevents virti= o > > > from writing sc->vtnet_rx_clustersz bytes and thereby overwriting the > > > two bytes following the cluster? > > > > > > > The only trouble that I saw was here: > > > > /* > > * Every mbuf should have the expected cluster size sin= ce > > that > > * is also used to allocate the replacements. > > */ > > KASSERT(m->m_len =3D=3D clustersz, > > ("%s: mbuf size %d not expected cluster size %d", > > __func__, > > m->m_len, clustersz)); > > > > and even that's fine: We can adjust that assert. The next lines I think > > make it fine: > > m->m_len =3D MIN(m->m_len, len); > > > > so we only use what we say is there to land bytes into the mbuf. I'm n= ow > > not sure though what to do about a few lines later: > > > > m_new =3D vtnet_rx_alloc_buf(sc, nreplace, &m_tail); > > if (m_new =3D=3D NULL) { > > m_prev->m_len =3D clustersz; > > return (ENOBUFS); > > } > > > > which likely needs to be adjusted (or the old saved size). > > I'm not certain about what it should be doing instead, but yes this code > looks wrong now. > Yea. We need to do this at time of use not time of alloc. > I likely didn't see asserts in my testing because I didn't have LRO slow > > path packets. > > > > Do you see other places that would need adjusting? > > Presumably vtnet_rx_cluster_size() needs to be adjusted too? > > I also don't understand why we do this adjustment unconditionally > instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all > other NIC drivers do. > None of the ones I looked at had this... maybe I got unlucky. I'd think we'd have better performance if things are aligned all the way... is there a reason not to? Warner > > > All the other places > > seem to use m_len correctly on the rx path, but I'm not a nic driver gu= y, > > and I might be missing something. > --000000000000b137bf060d07efbc Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable <div dir=3D"auto"><div><br><br><div class=3D"gmail_quote"><div dir=3D"ltr" = class=3D"gmail_attr">On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston &l= t;<a href=3D"mailto:markj@freebsd.org">markj@freebsd.org</a>> wrote:<br>= </div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-l= eft:1px #ccc solid;padding-left:1ex">On Thu, Dec 21, 2023 at 08:31:36AM -07= 00, Warner Losh wrote:<br> > On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston <<a href=3D"m= ailto:markj@freebsd.org" target=3D"_blank" rel=3D"noreferrer">markj@freebsd= .org</a>> wrote:<br> > <br> > > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote:<br> > > > The branch main has been updated by imp:<br> > > ><br> > > > URL:<br> > > <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51= d75ed6a94e180f2fb4e9188a2ba4" rel=3D"noreferrer noreferrer" target=3D"_blan= k">https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4= e9188a2ba4</a><br> > > ><br> > > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4<br> > > > Author:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org&g= t;<br> > > > AuthorDate: 2023-12-20 19:09:09 +0000<br> > > > Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org&g= t;<br> > > > CommitDate: 2023-12-21 04:16:45 +0000<br> > > ><br> > > >=C2=A0 =C2=A0 =C2=A0vtnet: Adjust rx buffer so IP header 32-b= it aligned<br> > > ><br> > > >=C2=A0 =C2=A0 =C2=A0Call madj(m, ETHER_ALIGN) to offset rx bu= ffers when allocating them.<br> > > >=C2=A0 =C2=A0 =C2=A0This improves performance everywhere, and= allows armv7 to work at<br> > > all.<br> > > ><br> > > >=C2=A0 =C2=A0 =C2=A0PR:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0271288 (PR had a different fix than I= wound<br> > > up with)<br> > > >=C2=A0 =C2=A0 =C2=A0MFC After:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 3 days<br> > > >=C2=A0 =C2=A0 =C2=A0Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0Netflix<br> > > >=C2=A0 =C2=A0 =C2=A0Differential Revision:=C2=A0 <a href=3D"h= ttps://reviews.freebsd.org/D43136" rel=3D"noreferrer noreferrer" target=3D"= _blank">https://reviews.freebsd.org/D43136</a><br> > > > ---<br> > > >=C2=A0 sys/dev/virtio/network/if_vtnet.c | 2 +-<br> > > >=C2=A0 1 file changed, 1 insertion(+), 1 deletion(-)<br> > > ><br> > > > diff --git a/sys/dev/virtio/network/if_vtnet.c<br> > > b/sys/dev/virtio/network/if_vtnet.c<br> > > > index 287af7751066..360176e4f845 100644<br> > > > --- a/sys/dev/virtio/network/if_vtnet.c<br> > > > +++ b/sys/dev/virtio/network/if_vtnet.c<br> > > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc = *sc, int<br> > > nbufs, struct mbuf **m_tailp)<br> > > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0m_freem(m_head);<br> > > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0return (NULL);<br> > > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br> > > > -<br> > > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m->= m_len =3D size;<br> > > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_adj(m, ET= HER_ALIGN);<br> > ><br> > > The driver is expecting to get a cluster of size sc->vtnet_rx_= clustersz,<br> > > but now it's getting one of size sc->vtnet_rx_clustersz - = 2.=C2=A0 I don't<br> > > see how this change can be sufficient on its own: what prevents v= irtio<br> > > from writing sc->vtnet_rx_clustersz bytes and thereby overwrit= ing the<br> > > two bytes following the cluster?<br> > ><br> > <br> > The only trouble that I saw was here:<br> > <br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/*<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Every = mbuf should have the expected cluster size since<br> > that<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * is als= o used to allocate the replacements.<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0KASSERT(m= ->m_len =3D=3D clustersz,<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0("%s: mbuf size %d not expected cluster size %d",<br> > __func__,<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0m->m_len, clustersz));<br> > <br> > and even that's fine: We can adjust that assert. The next lines I = think<br> > make it fine:<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m->m_l= en =3D MIN(m->m_len, len);<br> > <br> > so we only use what we say is there to land bytes into the mbuf.=C2=A0= I'm now<br> > not sure though what to do about a few lines later:<br> > <br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 m_new =3D vtnet_rx_alloc_buf(sc, nreplace, = &m_tail);<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (m_new =3D=3D NULL) {<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_prev-&g= t;m_len =3D clustersz;<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (E= NOBUFS);<br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br> > <br> > which likely needs to be adjusted (or the old saved size).<br> <br> I'm not certain about what it should be doing instead, but yes this cod= e<br> looks wrong now.<br></blockquote></div></div><div dir=3D"auto"><br></div><d= iv dir=3D"auto">Yea. We need to do this at time of use not time of alloc.</= div><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=3D"gmail_quote= "><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:= 1px #ccc solid;padding-left:1ex"> > I likely didn't see asserts in my testing because I didn't hav= e LRO slow<br> > path packets.<br> > <br> > Do you see other places that would need adjusting?<br> <br> Presumably vtnet_rx_cluster_size() needs to be adjusted too?<br> <br> I also don't understand why we do this adjustment unconditionally<br> instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all<br= > other NIC drivers do.<br></blockquote></div></div><div dir=3D"auto"><br></d= iv><div dir=3D"auto">None of the ones I looked at had this... maybe I got u= nlucky. I'd think we'd have better performance if things are aligne= d all the way... is there a reason not to?</div><div dir=3D"auto"><br></div= ><div dir=3D"auto">Warner</div><div dir=3D"auto"><div class=3D"gmail_quote"= ><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1= px #ccc solid;padding-left:1ex"> <br> > All the other places<br> > seem to use m_len correctly on the rx path, but I'm not a nic driv= er guy,<br> > and I might be missing something.<br> </blockquote></div></div></div> --000000000000b137bf060d07efbc--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrF6OSDqo37xM3hiEpZrRzzfh3kJPA00kBdVj3RExuG9Q>