From nobody Thu Dec 21 21:29:17 2023 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Sx3V50c6Jz54Hd9 for ; Thu, 21 Dec 2023 21:29:33 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ej1-x635.google.com (mail-ej1-x635.google.com [IPv6:2a00:1450:4864:20::635]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Sx3V34mWBz4NxT for ; Thu, 21 Dec 2023 21:29:31 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=bsdimp-com.20230601.gappssmtp.com header.s=20230601 header.b=E+sS6nbh; spf=none (mx1.freebsd.org: domain of wlosh@bsdimp.com has no SPF policy when checking 2a00:1450:4864:20::635) smtp.mailfrom=wlosh@bsdimp.com; dmarc=none Received: by mail-ej1-x635.google.com with SMTP id a640c23a62f3a-a1915034144so153395066b.0 for ; Thu, 21 Dec 2023 13:29:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20230601.gappssmtp.com; s=20230601; t=1703194169; x=1703798969; darn=freebsd.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=vJGKCQI4UF9KYkd83N0y+PpcXDNzo9kxgcF6D+lzVbs=; b=E+sS6nbhvDwCAzl0g0lq56qmpFJJbPT5H2rP3JvTs7mFb58j2+Z8I1LDWSNIWsOqie isqnRko/VtklApUB1vJ5H0j8EKn022kOLFiU2PhAt7XUjzC0WWx8fyyp+WDRYBxk9UaQ E6MmCQjbIxAHr8H6lkgWDuQO8oSc/Dc5pQZbyzjlXta0RKgYVywwaF6uo2+MVT9MJ3o5 h7Oxa3xWiIrYfK/mUNrYhT0vyZyAbk8X3L+dfCY1n3s+VSU1XZRAOhQUXaKzKnNXzWwR 12fKz2p9OruxBYuHJSngzwU+Dx1givDZsUIU1bXSN3adeQUCnibveJNNTA3dzUE/OY/y yajg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703194169; x=1703798969; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=vJGKCQI4UF9KYkd83N0y+PpcXDNzo9kxgcF6D+lzVbs=; b=B9K+BdtSKw9D3anFGfEnaE936cKBAdALqvo2/rIH6OuuwmBy7QyhpeEKiSEj4TYA4r ToAy0y6T7neyD+6eaux7kRgRQStxqzB4mb1MnW+ECepLurf3MhVTlYTSUmUCnIIqucyA Y8I3Sff0irFNKtVDqa3i4BFNkRyLfIvBrnYIocbYm9hibLLg0PExm2slFXQHVlbOb8qV Iv61MU5lPjx9ps+2F9bT7piG0fXgxBCPFSZ6Bi4cmqooonuOIGShLTYVBGI9XJWWak0p 6qLc3ZMOfVyjA4GG49offbd48qQ3X07qHEf4NNTsq5bkczMzT0SCg6KEY4ORcTIZE+vl UFag== X-Gm-Message-State: AOJu0YyuTt9IelcZP6lWb1ObKLZZwyrF/gntmQ8eC753E1DbPQSKmbMn l1WCXJYIenTlCOC7ESjlsXjO1d1ni3BjVRkEY2ueaV/b+k9IeQ== X-Google-Smtp-Source: AGHT+IHH9J4Mhc+irflo1f4GXcM9bfq803ecjsiUpSi68sIb/vGVXOveUy2XqMuVKyr7B2hj9zElzW8bZVyL6n+ZiIY= X-Received: by 2002:a17:907:29d5:b0:a23:59e4:bf16 with SMTP id ev21-20020a17090729d500b00a2359e4bf16mr173744ejc.83.1703194169177; Thu, 21 Dec 2023 13:29:29 -0800 (PST) List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 References: <202312210421.3BL4Lb9T036317@gitrepo.freebsd.org> In-Reply-To: From: Warner Losh Date: Thu, 21 Dec 2023 14:29:17 -0700 Message-ID: Subject: Re: git: 9e6d11ce9a51 - main - vtnet: Adjust rx buffer so IP header 32-bit aligned To: Mark Johnston Cc: Warner Losh , src-committers , "" , "" Content-Type: multipart/alternative; boundary="000000000000700aaa060d0bcb6f" X-Spamd-Result: default: False [-3.00 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-0.997]; FORGED_SENDER(0.30)[imp@bsdimp.com,wlosh@bsdimp.com]; R_DKIM_ALLOW(-0.20)[bsdimp-com.20230601.gappssmtp.com:s=20230601]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; RCVD_COUNT_ONE(0.00)[1]; MLMMJ_DEST(0.00)[dev-commits-src-main@freebsd.org]; MIME_TRACE(0.00)[0:+,1:+,2:~]; RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::635:from]; R_SPF_NA(0.00)[no SPF record]; RCVD_TLS_LAST(0.00)[]; ARC_NA(0.00)[]; TO_DN_ALL(0.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; PREVIOUSLY_DELIVERED(0.00)[dev-commits-src-main@freebsd.org]; FROM_HAS_DN(0.00)[]; DKIM_TRACE(0.00)[bsdimp-com.20230601.gappssmtp.com:+]; DMARC_NA(0.00)[bsdimp.com]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCPT_COUNT_FIVE(0.00)[5]; FROM_NEQ_ENVFROM(0.00)[imp@bsdimp.com,wlosh@bsdimp.com] X-Rspamd-Queue-Id: 4Sx3V34mWBz4NxT X-Spamd-Bar: -- --000000000000700aaa060d0bcb6f Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Dec 21, 2023 at 9:53=E2=80=AFAM Warner Losh wrote: > > > On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston w= rote: > >> On Thu, Dec 21, 2023 at 08:31:36AM -0700, Warner Losh wrote: >> > On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston >> wrote: >> > >> > > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote: >> > > > The branch main has been updated by imp: >> > > > >> > > > URL: >> > > >> https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4= e9188a2ba4 >> > > > >> > > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4 >> > > > Author: Warner Losh >> > > > AuthorDate: 2023-12-20 19:09:09 +0000 >> > > > Commit: Warner Losh >> > > > CommitDate: 2023-12-21 04:16:45 +0000 >> > > > >> > > > vtnet: Adjust rx buffer so IP header 32-bit aligned >> > > > >> > > > Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating >> them. >> > > > This improves performance everywhere, and allows armv7 to work >> at >> > > all. >> > > > >> > > > PR: 271288 (PR had a different fix than I >> wound >> > > up with) >> > > > MFC After: 3 days >> > > > Sponsored by: Netflix >> > > > Differential Revision: https://reviews.freebsd.org/D43136 >> > > > --- >> > > > sys/dev/virtio/network/if_vtnet.c | 2 +- >> > > > 1 file changed, 1 insertion(+), 1 deletion(-) >> > > > >> > > > diff --git a/sys/dev/virtio/network/if_vtnet.c >> > > b/sys/dev/virtio/network/if_vtnet.c >> > > > index 287af7751066..360176e4f845 100644 >> > > > --- a/sys/dev/virtio/network/if_vtnet.c >> > > > +++ b/sys/dev/virtio/network/if_vtnet.c >> > > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc *sc, i= nt >> > > nbufs, struct mbuf **m_tailp) >> > > > m_freem(m_head); >> > > > return (NULL); >> > > > } >> > > > - >> > > > m->m_len =3D size; >> > > > + m_adj(m, ETHER_ALIGN); >> > > >> > > The driver is expecting to get a cluster of size >> sc->vtnet_rx_clustersz, >> > > but now it's getting one of size sc->vtnet_rx_clustersz - 2. I don'= t >> > > see how this change can be sufficient on its own: what prevents virt= io >> > > from writing sc->vtnet_rx_clustersz bytes and thereby overwriting th= e >> > > two bytes following the cluster? >> > > >> > >> > The only trouble that I saw was here: >> > >> > /* >> > * Every mbuf should have the expected cluster size >> since >> > that >> > * is also used to allocate the replacements. >> > */ >> > KASSERT(m->m_len =3D=3D clustersz, >> > ("%s: mbuf size %d not expected cluster size %d", >> > __func__, >> > m->m_len, clustersz)); >> > >> > and even that's fine: We can adjust that assert. The next lines I thin= k >> > make it fine: >> > m->m_len =3D MIN(m->m_len, len); >> > >> > so we only use what we say is there to land bytes into the mbuf. I'm >> now >> > not sure though what to do about a few lines later: >> > >> > m_new =3D vtnet_rx_alloc_buf(sc, nreplace, &m_tail); >> > if (m_new =3D=3D NULL) { >> > m_prev->m_len =3D clustersz; >> > return (ENOBUFS); >> > } >> > >> > which likely needs to be adjusted (or the old saved size). >> >> I'm not certain about what it should be doing instead, but yes this code >> looks wrong now. >> > > Yea. We need to do this at time of use not time of alloc. > > > I likely didn't see asserts in my testing because I didn't have LRO slo= w >> > path packets. >> > >> > Do you see other places that would need adjusting? >> >> Presumably vtnet_rx_cluster_size() needs to be adjusted too? >> >> I also don't understand why we do this adjustment unconditionally >> instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all >> other NIC drivers do. >> > > None of the ones I looked at had this... maybe I got unlucky. I'd think > we'd have better performance if things are aligned all the way... is ther= e > a reason not to? > Yea, I see a lot of other drivers now... I'll follow the trend. I think it gives the ability to receive a full jumbo frame if I don't adjust. https://reviews.freebsd.org/D43161 has the updated patches, including backing out this change and moving it from alloc time to use time which seems better since we're chaining and should work for all the cases we're talking about here. Warner > Warner > >> >> > All the other places >> > seem to use m_len correctly on the rx path, but I'm not a nic driver >> guy, >> > and I might be missing something. >> > --000000000000700aaa060d0bcb6f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Thu, Dec 21, 2023 at 9:53=E2=80=AF= AM Warner Losh <imp@bsdimp.com>= wrote:


On Thu, Dec 21, 2023, 8:54=E2=80=AFAM Mark Johnston <markj@freebsd.org> = wrote:
On Thu, D= ec 21, 2023 at 08:31:36AM -0700, Warner Losh wrote:
> On Thu, Dec 21, 2023 at 8:13=E2=80=AFAM Mark Johnston <markj@freebsd= .org> wrote:
>
> > On Thu, Dec 21, 2023 at 04:21:37AM +0000, Warner Losh wrote:
> > > The branch main has been updated by imp:
> > >
> > > URL:
> > https://cgit.FreeBSD.org/src/commit/?id=3D9e6d11ce9a51d75ed6a94e180f2fb4= e9188a2ba4
> > >
> > > commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4
> > > Author:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org&g= t;
> > > AuthorDate: 2023-12-20 19:09:09 +0000
> > > Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh <imp@FreeBSD.org&g= t;
> > > CommitDate: 2023-12-21 04:16:45 +0000
> > >
> > >=C2=A0 =C2=A0 =C2=A0vtnet: Adjust rx buffer so IP header 32-b= it aligned
> > >
> > >=C2=A0 =C2=A0 =C2=A0Call madj(m, ETHER_ALIGN) to offset rx bu= ffers when allocating them.
> > >=C2=A0 =C2=A0 =C2=A0This improves performance everywhere, and= allows armv7 to work at
> > all.
> > >
> > >=C2=A0 =C2=A0 =C2=A0PR:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0271288 (PR had a different fix than I= wound
> > up with)
> > >=C2=A0 =C2=A0 =C2=A0MFC After:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 3 days
> > >=C2=A0 =C2=A0 =C2=A0Sponsored by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0Netflix
> > >=C2=A0 =C2=A0 =C2=A0Differential Revision:=C2=A0 https://reviews.freebsd.org/D43136
> > > ---
> > >=C2=A0 sys/dev/virtio/network/if_vtnet.c | 2 +-
> > >=C2=A0 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/sys/dev/virtio/network/if_vtnet.c
> > b/sys/dev/virtio/network/if_vtnet.c
> > > index 287af7751066..360176e4f845 100644
> > > --- a/sys/dev/virtio/network/if_vtnet.c
> > > +++ b/sys/dev/virtio/network/if_vtnet.c
> > > @@ -1532,8 +1532,8 @@ vtnet_rx_alloc_buf(struct vtnet_softc = *sc, int
> > nbufs, struct mbuf **m_tailp)
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0m_freem(m_head);
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0return (NULL);
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
> > > -
> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m->= m_len =3D size;
> > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_adj(m, ET= HER_ALIGN);
> >
> > The driver is expecting to get a cluster of size sc->vtnet_rx_= clustersz,
> > but now it's getting one of size sc->vtnet_rx_clustersz - = 2.=C2=A0 I don't
> > see how this change can be sufficient on its own: what prevents v= irtio
> > from writing sc->vtnet_rx_clustersz bytes and thereby overwrit= ing the
> > two bytes following the cluster?
> >
>
> The only trouble that I saw was here:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/*
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Every = mbuf should have the expected cluster size since
> that
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * is als= o used to allocate the replacements.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0KASSERT(m= ->m_len =3D=3D clustersz,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0("%s: mbuf size %d not expected cluster size %d",
> __func__,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0m->m_len, clustersz));
>
> and even that's fine: We can adjust that assert. The next lines I = think
> make it fine:
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m->m_l= en =3D MIN(m->m_len, len);
>
> so we only use what we say is there to land bytes into the mbuf.=C2=A0= I'm now
> not sure though what to do about a few lines later:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 m_new =3D vtnet_rx_alloc_buf(sc, nreplace, = &m_tail);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (m_new =3D=3D NULL) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0m_prev-&g= t;m_len =3D clustersz;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (E= NOBUFS);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>
> which likely needs to be adjusted (or the old saved size).

I'm not certain about what it should be doing instead, but yes this cod= e
looks wrong now.

Yea. We need to do this at time of use not time of alloc.

> I likely didn't see asserts in my testing because I didn't hav= e LRO slow
> path packets.
>
> Do you see other places that would need adjusting?

Presumably vtnet_rx_cluster_size() needs to be adjusted too?

I also don't understand why we do this adjustment unconditionally
instead of only when __NO_STRICT_ALIGNMENT is not defined, as almost all other NIC drivers do.

None of the ones I looked at had this... maybe I got u= nlucky. I'd think we'd have better performance if things are aligne= d all the way... is there a reason not to?
Yea, I see a lot of other drivers now...=C2=A0 I'll follow = the trend. I think it gives the ability to receive a full jumbo frame if I = don't adjust.


has the updated patches, including backing out this change and movi= ng it from alloc time to use time which seems better since we're chaini= ng and should work for all the cases we're talking about here.

Warner
=C2=A0
Warner

> All the other places
> seem to use m_len correctly on the rx path, but I'm not a nic driv= er guy,
> and I might be missing something.
--000000000000700aaa060d0bcb6f--