Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 29 Jan 2024 10:14:41 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Cc:        Warner Losh <imp@freebsd.org>, src-committers@freebsd.org,  dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 3be59adbb5a2 - main - vtnet: Adjust for ethernet alignment.
Message-ID:  <CANCZdfr_PQKH_9hvqDi=wZgR13WC6NwQ07OKdX_1pq7XrGVgfw@mail.gmail.com>
In-Reply-To: <n4s5849r-4q46-3628-qq82-p50q3698172n@yvfgf.mnoonqbm.arg>
References:  <202401290514.40T5Eb1i061789@gitrepo.freebsd.org> <n4s5849r-4q46-3628-qq82-p50q3698172n@yvfgf.mnoonqbm.arg>

next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000ae2099061018c823
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Jan 29, 2024 at 8:26=E2=80=AFAM Bjoern A. Zeeb <
bzeeb-lists@lists.zabbadoz.net> wrote:

> On Mon, 29 Jan 2024, Warner Losh wrote:
>
> > The branch main has been updated by imp:
> >
> > URL:
> https://cgit.FreeBSD.org/src/commit/?id=3D3be59adbb5a2ae7600d46432d3bc822=
86e507e95
> >
> > commit 3be59adbb5a2ae7600d46432d3bc82286e507e95
> > Author:     Warner Losh <imp@FreeBSD.org>
> > AuthorDate: 2024-01-29 05:08:55 +0000
> > Commit:     Warner Losh <imp@FreeBSD.org>
> > CommitDate: 2024-01-29 05:08:55 +0000
> >
> >    vtnet: Adjust for ethernet alignment.
> >
> >    If the header that we add to the packet's size is 0 % 4 and we're
> >    strictly aligning, then we need to adjust where we store the header =
so
> >    the packet that follows will have it's struct ip header properly
> >    aligned.  We do this on allocation (and when we check the length of
> the
> >    mbufs in the lro_nomrg case). We can't just adjust the clustersz in
> the
> >    softc, because it's also used to allocate the mbufs and it needs to =
be
> >    the proper size for that. Since we otherwise use the size of the mbu=
f
> >    (or sometimes the smaller size of the received packet) to compute ho=
w
> >    much we can buffer, this ensures no overflows. The 2 byte adjustment
> >    also does not affect how many packets we can receive in the lro_nomr=
g
> >    case.
>
>
> Doesn't this still include at least these two un-asserted/un-documented
> asumptions:
>
> (a) mbuf space is large enough to hold 2 extra bytes?  Is this always
>      the case?
>

I was sure I puzzled through all the cases correctly.  We adjust the length
of the available buffer by 2 and offset everything by 2. this work because
all the vtnet header types only have 1 or 2 byte data fields. It keeps us
from
writing too much into the buffer.

However, in vtnet_rx_cluster_size, we don't adjust the frame size before
allocating. So if the mtu + vlan_header. So if the mtu + 12 + 18 is 2047 or
2048
or mtu =3D 2017 or 2018 we'll get it wrong (we don't adjust in the case whe=
re
we use vtnet_rx_header which is 14 bytes). But I think in that case, we'll
"drop
the last two bytes off the end" get it wrong (since we adjust the total
length
of the mbuf space) rather than "overflow two bytes" get it wrong. For that
case, we'd need to add two as I indicated in the comments below.

static int
vtnet_rx_cluster_size(struct vtnet_softc *sc, int mtu)
{
        int framesz;

        if (sc->vtnet_flags & VTNET_FLAG_MRG_RXBUFS)
                return (MJUMPAGESIZE);
        else if (sc->vtnet_flags & VTNET_FLAG_LRO_NOMRG)
                return (MCLBYTES);

        /*
         * Try to scale the receive mbuf cluster size from the MTU. We
         * could also use the VQ size to influence the selected size,
         * but that would only matter for very small queues.
         */
        if (vtnet_modern(sc)) {
                MPASS(sc->vtnet_hdr_size =3D=3D sizeof(struct
virtio_net_hdr_v1));
                framesz =3D sizeof(struct virtio_net_hdr_v1);
        } else
                framesz =3D sizeof(struct vtnet_rx_header);
        framesz +=3D sizeof(struct ether_vlan_header) + mtu;
// XXX if framesz % 4 =3D=3D 2 and we're strict alignment we need to add 2
// XXX or equivalently, if vnet_hdr_size % 4 =3D=3D 0 and ...
        if (framesz <=3D MCLBYTES)
                return (MCLBYTES);
        else if (framesz <=3D MJUMPAGESIZE)
                return (MJUMPAGESIZE);
        else if (framesz <=3D MJUM9BYTES)
                return (MJUM9BYTES);

        /* Sane default; avoid 16KB clusters. */
        return (MCLBYTES);
}

Do you agree? Is this what you are worried about? It's the only hole I
could find
this morning (after going over this a dozen other times trying to get it
right for
the review, and bryanv was happy neither noticed). It also explains why my
tests work: I didn't try to have a weird mtu of 2018 bytes.


> (b) the struct sizes assigned to vtnet_hdr_size are not odd numbers of
>      bytes?  Could add comments or CTASSERTs?
>

True, I'll ctassert the sizes and say we rely on things being even sized
in if_vnetvar.h.

Warner


> >    PR:                     271288
> >    Sponsored by:           Netflix
> >    Reviewed by:            bryanv
> >    Differential Revision:  https://reviews.freebsd.org/D43224
>
> --
> Bjoern A. Zeeb                                                     r15:7
>

--000000000000ae2099061018c823
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Mon, Jan 29, 2024 at 8:26=E2=80=AF=
AM Bjoern A. Zeeb &lt;<a href=3D"mailto:bzeeb-lists@lists.zabbadoz.net">bze=
eb-lists@lists.zabbadoz.net</a>&gt; wrote:<br></div><blockquote class=3D"gm=
ail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,=
204,204);padding-left:1ex">On Mon, 29 Jan 2024, Warner Losh wrote:<br>
<br>
&gt; The branch main has been updated by imp:<br>
&gt;<br>
&gt; URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3D3be59adbb5a2=
ae7600d46432d3bc82286e507e95" rel=3D"noreferrer" target=3D"_blank">https://=
cgit.FreeBSD.org/src/commit/?id=3D3be59adbb5a2ae7600d46432d3bc82286e507e95<=
/a><br>
&gt;<br>
&gt; commit 3be59adbb5a2ae7600d46432d3bc82286e507e95<br>
&gt; Author:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt;<br>
&gt; AuthorDate: 2024-01-29 05:08:55 +0000<br>
&gt; Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt;<br>
&gt; CommitDate: 2024-01-29 05:08:55 +0000<br>
&gt;<br>
&gt;=C2=A0 =C2=A0 vtnet: Adjust for ethernet alignment.<br>
&gt;<br>
&gt;=C2=A0 =C2=A0 If the header that we add to the packet&#39;s size is 0 %=
 4 and we&#39;re<br>
&gt;=C2=A0 =C2=A0 strictly aligning, then we need to adjust where we store =
the header so<br>
&gt;=C2=A0 =C2=A0 the packet that follows will have it&#39;s struct ip head=
er properly<br>
&gt;=C2=A0 =C2=A0 aligned.=C2=A0 We do this on allocation (and when we chec=
k the length of the<br>
&gt;=C2=A0 =C2=A0 mbufs in the lro_nomrg case). We can&#39;t just adjust th=
e clustersz in the<br>
&gt;=C2=A0 =C2=A0 softc, because it&#39;s also used to allocate the mbufs a=
nd it needs to be<br>
&gt;=C2=A0 =C2=A0 the proper size for that. Since we otherwise use the size=
 of the mbuf<br>
&gt;=C2=A0 =C2=A0 (or sometimes the smaller size of the received packet) to=
 compute how<br>
&gt;=C2=A0 =C2=A0 much we can buffer, this ensures no overflows. The 2 byte=
 adjustment<br>
&gt;=C2=A0 =C2=A0 also does not affect how many packets we can receive in t=
he lro_nomrg<br>
&gt;=C2=A0 =C2=A0 case.<br>
<br>
<br>
Doesn&#39;t this still include at least these two un-asserted/un-documented=
 asumptions:<br>
<br>
(a) mbuf space is large enough to hold 2 extra bytes?=C2=A0 Is this always<=
br>
=C2=A0 =C2=A0 =C2=A0the case?<br></blockquote><div><br></div><div>I was sur=
e I puzzled through all the cases correctly.=C2=A0 We adjust the length</di=
v><div>of the available buffer by 2 and offset everything by 2. this work b=
ecause</div><div>all the vtnet header types only have 1 or 2 byte data fiel=
ds. It keeps us from</div><div>writing too much into the buffer.</div><div>=
<br></div><div>However, in vtnet_rx_cluster_size, we don&#39;t adjust the f=
rame size before</div><div>allocating. So if the mtu=C2=A0+ vlan_header. So=
 if the mtu + 12 + 18 is 2047 or 2048</div><div>or mtu =3D 2017 or 2018 we&=
#39;ll get it wrong (we don&#39;t adjust in the case where</div><div>we use=
 vtnet_rx_header which is 14 bytes). But I think in that case, we&#39;ll &q=
uot;drop</div><div>the last two bytes off the end&quot; get it wrong (since=
 we adjust the total length</div><div>of the mbuf space) rather than &quot;=
overflow two bytes&quot; get it wrong. For that</div><div>case, we&#39;d ne=
ed to add two as I indicated in the comments below.</div><div><br></div><di=
v>static int<br></div><div>vtnet_rx_cluster_size(struct vtnet_softc *sc, in=
t mtu)<br>{<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 int framesz;<br><br>=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 if (sc-&gt;vtnet_flags &amp; VTNET_FLAG_MRG_RXBUFS)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MJUMPAGESIZ=
E);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 else if (sc-&gt;vtnet_flags &amp; VTNET_=
FLAG_LRO_NOMRG)<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
return (MCLBYTES);<br><br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 /*<br>=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0* Try to scale the receive mbuf cluster size from the M=
TU. We<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* could also use the VQ size to=
 influence the selected size,<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* but th=
at would only matter for very small queues.<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0*/<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (vtnet_modern(sc)) {<br>=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MPASS(sc-&gt;vtnet_hdr_siz=
e =3D=3D sizeof(struct virtio_net_hdr_v1));<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 framesz =3D sizeof(struct virtio_net_hdr_v1);<b=
r>=C2=A0 =C2=A0 =C2=A0 =C2=A0 } else<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 framesz =3D sizeof(struct vtnet_rx_header);=C2=A0<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 framesz +=3D sizeof(struct ether_vlan_header) +=
 mtu;<br>// XXX if framesz % 4 =3D=3D 2 and we&#39;re strict alignment we n=
eed to add 2</div><div>// XXX or equivalently, if vnet_hdr_size % 4 =3D=3D =
0 and ...<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (framesz &lt;=3D MCLBYTES)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MCLBYTES);<=
br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 else if (framesz &lt;=3D MJUMPAGESIZE)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MJUMPAGESIZ=
E);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 else if (framesz &lt;=3D MJUM9BYTES)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MJUM9BYTES)=
;<br><br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 /* Sane default; avoid 16KB clusters. =
*/<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MCLBYTES);<br>}<br></div><div><br=
></div><div>Do you agree? Is this what you are worried about? It&#39;s the =
only hole I could find</div><div>this morning (after going over this a doze=
n other times trying to get it right for</div><div>the review, and bryanv w=
as happy neither noticed). It also explains why my</div><div>tests work: I =
didn&#39;t try to have a weird mtu of 2018 bytes.</div><div>=C2=A0<br></div=
><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border=
-left:1px solid rgb(204,204,204);padding-left:1ex">
(b) the struct sizes assigned to vtnet_hdr_size are not odd numbers of<br>
=C2=A0 =C2=A0 =C2=A0bytes?=C2=A0 Could add comments or CTASSERTs?<br></bloc=
kquote><div><br></div><div>True, I&#39;ll ctassert the sizes and say we rel=
y on things being even sized</div><div>in if_vnetvar.h.</div><div><br></div=
><div>Warner</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=
=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding=
-left:1ex">
&gt;=C2=A0 =C2=A0 PR:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0271288<br>
&gt;=C2=A0 =C2=A0 Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Net=
flix<br>
&gt;=C2=A0 =C2=A0 Reviewed by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bry=
anv<br>
&gt;=C2=A0 =C2=A0 Differential Revision:=C2=A0 <a href=3D"https://reviews.f=
reebsd.org/D43224" rel=3D"noreferrer" target=3D"_blank">https://reviews.fre=
ebsd.org/D43224</a><br>
<br>
-- <br>
Bjoern A. Zeeb=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0r15:7<br>
</blockquote></div></div>

--000000000000ae2099061018c823--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfr_PQKH_9hvqDi=wZgR13WC6NwQ07OKdX_1pq7XrGVgfw>