Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 29 Jan 2024 14:13:25 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
Cc:        Warner Losh <imp@freebsd.org>, src-committers@freebsd.org,  dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 3be59adbb5a2 - main - vtnet: Adjust for ethernet alignment.
Message-ID:  <CANCZdfrMbeCBeCA=xuZBapR5rR-ASRtn3e3BCw%2BE8NU2AerzpA@mail.gmail.com>
In-Reply-To: <CANCZdfr_PQKH_9hvqDi=wZgR13WC6NwQ07OKdX_1pq7XrGVgfw@mail.gmail.com>
References:  <202401290514.40T5Eb1i061789@gitrepo.freebsd.org> <n4s5849r-4q46-3628-qq82-p50q3698172n@yvfgf.mnoonqbm.arg> <CANCZdfr_PQKH_9hvqDi=wZgR13WC6NwQ07OKdX_1pq7XrGVgfw@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
--00000000000079bbe206101c1e42
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Sorry for (a) the top post and (b) replying to myself...

https://reviews.freebsd.org/D43656

is what I think you're worried about. Am I right, or is there some place
else that has you uneasy?

Warner

P.S. I'm also thinking about following that up with
https://reviews.freebsd.org/D43654 since that style seems more in fashion
these days.

On Mon, Jan 29, 2024 at 10:14=E2=80=AFAM Warner Losh <imp@bsdimp.com> wrote=
:

>
>
> On Mon, Jan 29, 2024 at 8:26=E2=80=AFAM Bjoern A. Zeeb <
> bzeeb-lists@lists.zabbadoz.net> wrote:
>
>> On Mon, 29 Jan 2024, Warner Losh wrote:
>>
>> > The branch main has been updated by imp:
>> >
>> > URL:
>> https://cgit.FreeBSD.org/src/commit/?id=3D3be59adbb5a2ae7600d46432d3bc82=
286e507e95
>> >
>> > commit 3be59adbb5a2ae7600d46432d3bc82286e507e95
>> > Author:     Warner Losh <imp@FreeBSD.org>
>> > AuthorDate: 2024-01-29 05:08:55 +0000
>> > Commit:     Warner Losh <imp@FreeBSD.org>
>> > CommitDate: 2024-01-29 05:08:55 +0000
>> >
>> >    vtnet: Adjust for ethernet alignment.
>> >
>> >    If the header that we add to the packet's size is 0 % 4 and we're
>> >    strictly aligning, then we need to adjust where we store the header
>> so
>> >    the packet that follows will have it's struct ip header properly
>> >    aligned.  We do this on allocation (and when we check the length of
>> the
>> >    mbufs in the lro_nomrg case). We can't just adjust the clustersz in
>> the
>> >    softc, because it's also used to allocate the mbufs and it needs to
>> be
>> >    the proper size for that. Since we otherwise use the size of the mb=
uf
>> >    (or sometimes the smaller size of the received packet) to compute h=
ow
>> >    much we can buffer, this ensures no overflows. The 2 byte adjustmen=
t
>> >    also does not affect how many packets we can receive in the lro_nom=
rg
>> >    case.
>>
>>
>> Doesn't this still include at least these two un-asserted/un-documented
>> asumptions:
>>
>> (a) mbuf space is large enough to hold 2 extra bytes?  Is this always
>>      the case?
>>
>
> I was sure I puzzled through all the cases correctly.  We adjust the leng=
th
> of the available buffer by 2 and offset everything by 2. this work becaus=
e
> all the vtnet header types only have 1 or 2 byte data fields. It keeps us
> from
> writing too much into the buffer.
>
> However, in vtnet_rx_cluster_size, we don't adjust the frame size before
> allocating. So if the mtu + vlan_header. So if the mtu + 12 + 18 is 2047
> or 2048
> or mtu =3D 2017 or 2018 we'll get it wrong (we don't adjust in the case w=
here
> we use vtnet_rx_header which is 14 bytes). But I think in that case, we'l=
l
> "drop
> the last two bytes off the end" get it wrong (since we adjust the total
> length
> of the mbuf space) rather than "overflow two bytes" get it wrong. For tha=
t
> case, we'd need to add two as I indicated in the comments below.
>
> static int
> vtnet_rx_cluster_size(struct vtnet_softc *sc, int mtu)
> {
>         int framesz;
>
>         if (sc->vtnet_flags & VTNET_FLAG_MRG_RXBUFS)
>                 return (MJUMPAGESIZE);
>         else if (sc->vtnet_flags & VTNET_FLAG_LRO_NOMRG)
>                 return (MCLBYTES);
>
>         /*
>          * Try to scale the receive mbuf cluster size from the MTU. We
>          * could also use the VQ size to influence the selected size,
>          * but that would only matter for very small queues.
>          */
>         if (vtnet_modern(sc)) {
>                 MPASS(sc->vtnet_hdr_size =3D=3D sizeof(struct
> virtio_net_hdr_v1));
>                 framesz =3D sizeof(struct virtio_net_hdr_v1);
>         } else
>                 framesz =3D sizeof(struct vtnet_rx_header);
>         framesz +=3D sizeof(struct ether_vlan_header) + mtu;
> // XXX if framesz % 4 =3D=3D 2 and we're strict alignment we need to add =
2
> // XXX or equivalently, if vnet_hdr_size % 4 =3D=3D 0 and ...
>         if (framesz <=3D MCLBYTES)
>                 return (MCLBYTES);
>         else if (framesz <=3D MJUMPAGESIZE)
>                 return (MJUMPAGESIZE);
>         else if (framesz <=3D MJUM9BYTES)
>                 return (MJUM9BYTES);
>
>         /* Sane default; avoid 16KB clusters. */
>         return (MCLBYTES);
> }
>
> Do you agree? Is this what you are worried about? It's the only hole I
> could find
> this morning (after going over this a dozen other times trying to get it
> right for
> the review, and bryanv was happy neither noticed). It also explains why m=
y
> tests work: I didn't try to have a weird mtu of 2018 bytes.
>
>
>> (b) the struct sizes assigned to vtnet_hdr_size are not odd numbers of
>>      bytes?  Could add comments or CTASSERTs?
>>
>
> True, I'll ctassert the sizes and say we rely on things being even sized
> in if_vnetvar.h.
>
> Warner
>
>
>> >    PR:                     271288
>> >    Sponsored by:           Netflix
>> >    Reviewed by:            bryanv
>> >    Differential Revision:  https://reviews.freebsd.org/D43224
>>
>> --
>> Bjoern A. Zeeb                                                     r15:7
>>
>

--00000000000079bbe206101c1e42
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Sorry=C2=A0for (a) the top post and (b) replying to myself=
...<div><br></div><div><a href=3D"https://reviews.freebsd.org/D43656">https=
://reviews.freebsd.org/D43656</a><br></div><div><br></div><div>is what I th=
ink you&#39;re worried about. Am I right, or is there some place else that =
has you uneasy?</div><div><br></div><div>Warner</div><div><br></div><div>P.=
S. I&#39;m also thinking about following that up with=C2=A0<a href=3D"https=
://reviews.freebsd.org/D43654">https://reviews.freebsd.org/D43654</a>; since=
 that style seems more in fashion these days.</div></div><br><div class=3D"=
gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, Jan 29, 2024 at =
10:14=E2=80=AFAM Warner Losh &lt;<a href=3D"mailto:imp@bsdimp.com">imp@bsdi=
mp.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"m=
argin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left=
:1ex"><div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_q=
uote"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, Jan 29, 2024 at 8:26=E2=
=80=AFAM Bjoern A. Zeeb &lt;<a href=3D"mailto:bzeeb-lists@lists.zabbadoz.ne=
t" target=3D"_blank">bzeeb-lists@lists.zabbadoz.net</a>&gt; wrote:<br></div=
><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border=
-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 29 Jan 2024, War=
ner Losh wrote:<br>
<br>
&gt; The branch main has been updated by imp:<br>
&gt;<br>
&gt; URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3D3be59adbb5a2=
ae7600d46432d3bc82286e507e95" rel=3D"noreferrer" target=3D"_blank">https://=
cgit.FreeBSD.org/src/commit/?id=3D3be59adbb5a2ae7600d46432d3bc82286e507e95<=
/a><br>
&gt;<br>
&gt; commit 3be59adbb5a2ae7600d46432d3bc82286e507e95<br>
&gt; Author:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt;<br>
&gt; AuthorDate: 2024-01-29 05:08:55 +0000<br>
&gt; Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt;<br>
&gt; CommitDate: 2024-01-29 05:08:55 +0000<br>
&gt;<br>
&gt;=C2=A0 =C2=A0 vtnet: Adjust for ethernet alignment.<br>
&gt;<br>
&gt;=C2=A0 =C2=A0 If the header that we add to the packet&#39;s size is 0 %=
 4 and we&#39;re<br>
&gt;=C2=A0 =C2=A0 strictly aligning, then we need to adjust where we store =
the header so<br>
&gt;=C2=A0 =C2=A0 the packet that follows will have it&#39;s struct ip head=
er properly<br>
&gt;=C2=A0 =C2=A0 aligned.=C2=A0 We do this on allocation (and when we chec=
k the length of the<br>
&gt;=C2=A0 =C2=A0 mbufs in the lro_nomrg case). We can&#39;t just adjust th=
e clustersz in the<br>
&gt;=C2=A0 =C2=A0 softc, because it&#39;s also used to allocate the mbufs a=
nd it needs to be<br>
&gt;=C2=A0 =C2=A0 the proper size for that. Since we otherwise use the size=
 of the mbuf<br>
&gt;=C2=A0 =C2=A0 (or sometimes the smaller size of the received packet) to=
 compute how<br>
&gt;=C2=A0 =C2=A0 much we can buffer, this ensures no overflows. The 2 byte=
 adjustment<br>
&gt;=C2=A0 =C2=A0 also does not affect how many packets we can receive in t=
he lro_nomrg<br>
&gt;=C2=A0 =C2=A0 case.<br>
<br>
<br>
Doesn&#39;t this still include at least these two un-asserted/un-documented=
 asumptions:<br>
<br>
(a) mbuf space is large enough to hold 2 extra bytes?=C2=A0 Is this always<=
br>
=C2=A0 =C2=A0 =C2=A0the case?<br></blockquote><div><br></div><div>I was sur=
e I puzzled through all the cases correctly.=C2=A0 We adjust the length</di=
v><div>of the available buffer by 2 and offset everything by 2. this work b=
ecause</div><div>all the vtnet header types only have 1 or 2 byte data fiel=
ds. It keeps us from</div><div>writing too much into the buffer.</div><div>=
<br></div><div>However, in vtnet_rx_cluster_size, we don&#39;t adjust the f=
rame size before</div><div>allocating. So if the mtu=C2=A0+ vlan_header. So=
 if the mtu + 12 + 18 is 2047 or 2048</div><div>or mtu =3D 2017 or 2018 we&=
#39;ll get it wrong (we don&#39;t adjust in the case where</div><div>we use=
 vtnet_rx_header which is 14 bytes). But I think in that case, we&#39;ll &q=
uot;drop</div><div>the last two bytes off the end&quot; get it wrong (since=
 we adjust the total length</div><div>of the mbuf space) rather than &quot;=
overflow two bytes&quot; get it wrong. For that</div><div>case, we&#39;d ne=
ed to add two as I indicated in the comments below.</div><div><br></div><di=
v>static int<br></div><div>vtnet_rx_cluster_size(struct vtnet_softc *sc, in=
t mtu)<br>{<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 int framesz;<br><br>=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 if (sc-&gt;vtnet_flags &amp; VTNET_FLAG_MRG_RXBUFS)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MJUMPAGESIZ=
E);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 else if (sc-&gt;vtnet_flags &amp; VTNET_=
FLAG_LRO_NOMRG)<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
return (MCLBYTES);<br><br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 /*<br>=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0* Try to scale the receive mbuf cluster size from the M=
TU. We<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* could also use the VQ size to=
 influence the selected size,<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0* but th=
at would only matter for very small queues.<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0*/<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (vtnet_modern(sc)) {<br>=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 MPASS(sc-&gt;vtnet_hdr_siz=
e =3D=3D sizeof(struct virtio_net_hdr_v1));<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 framesz =3D sizeof(struct virtio_net_hdr_v1);<b=
r>=C2=A0 =C2=A0 =C2=A0 =C2=A0 } else<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 framesz =3D sizeof(struct vtnet_rx_header);=C2=A0<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 framesz +=3D sizeof(struct ether_vlan_header) +=
 mtu;<br>// XXX if framesz % 4 =3D=3D 2 and we&#39;re strict alignment we n=
eed to add 2</div><div>// XXX or equivalently, if vnet_hdr_size % 4 =3D=3D =
0 and ...<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (framesz &lt;=3D MCLBYTES)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MCLBYTES);<=
br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 else if (framesz &lt;=3D MJUMPAGESIZE)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MJUMPAGESIZ=
E);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 else if (framesz &lt;=3D MJUM9BYTES)<br>=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MJUM9BYTES)=
;<br><br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 /* Sane default; avoid 16KB clusters. =
*/<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 return (MCLBYTES);<br>}<br></div><div><br=
></div><div>Do you agree? Is this what you are worried about? It&#39;s the =
only hole I could find</div><div>this morning (after going over this a doze=
n other times trying to get it right for</div><div>the review, and bryanv w=
as happy neither noticed). It also explains why my</div><div>tests work: I =
didn&#39;t try to have a weird mtu of 2018 bytes.</div><div>=C2=A0<br></div=
><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border=
-left:1px solid rgb(204,204,204);padding-left:1ex">
(b) the struct sizes assigned to vtnet_hdr_size are not odd numbers of<br>
=C2=A0 =C2=A0 =C2=A0bytes?=C2=A0 Could add comments or CTASSERTs?<br></bloc=
kquote><div><br></div><div>True, I&#39;ll ctassert the sizes and say we rel=
y on things being even sized</div><div>in if_vnetvar.h.</div><div><br></div=
><div>Warner</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=
=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding=
-left:1ex">
&gt;=C2=A0 =C2=A0 PR:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0271288<br>
&gt;=C2=A0 =C2=A0 Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Net=
flix<br>
&gt;=C2=A0 =C2=A0 Reviewed by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bry=
anv<br>
&gt;=C2=A0 =C2=A0 Differential Revision:=C2=A0 <a href=3D"https://reviews.f=
reebsd.org/D43224" rel=3D"noreferrer" target=3D"_blank">https://reviews.fre=
ebsd.org/D43224</a><br>
<br>
-- <br>
Bjoern A. Zeeb=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0r15:7<br>
</blockquote></div></div>
</blockquote></div>

--00000000000079bbe206101c1e42--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrMbeCBeCA=xuZBapR5rR-ASRtn3e3BCw%2BE8NU2AerzpA>