Date:      Sat, 20 Nov 2021 13:37:35 +0100
From:      Vincenzo Maffione <vmaffione@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        "net@FreeBSD.org" <net@freebsd.org>, Mark Johnston <markj@freebsd.org>,  Patrick Kelsey <pkelsey@freebsd.org>
Subject:   Re: vmxnet3: possible bug in vmxnet3_isc_rxd_pkt_get
Message-ID:  <CA%2B_eA9gT0yUFgjhBdZydeKj-p_7UzWT1VQ_7MOQ1b6epV6-tnQ@mail.gmail.com>
In-Reply-To: <65d72f7d-5096-07ec-4e21-c6356be7e06f@FreeBSD.org>
References:  <0dbe63d0-3219-846d-4c58-0bf219f41634@FreeBSD.org> <65d72f7d-5096-07ec-4e21-c6356be7e06f@FreeBSD.org>

+1 for adding the sanity check in vmxnet3_isc_rxd_pkt_get().
This looks like a bug to me...
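
A minimal sketch of what such a bound could look like, in standalone C rather than actual driver code. The structure layouts, the helper name collect_frags(), the error codes, and the MAX_RX_SEGS value of 64 are assumptions made for illustration (the kgdb output quoted below only suggests that ifr_frags[] holds 64 entries). The ring selection (qid 6 maps to ring 0, qid 14 to ring 1 with 8 queue sets) follows the data quoted below.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed capacity of the per-queue fragment array. */
#define MAX_RX_SEGS 64

/* Hypothetical, simplified RX completion descriptor. */
struct comp_desc {
	uint8_t  qid;	/* qid >= nqsets means command ring 1 */
	uint16_t idx;	/* index into the command ring */
	uint16_t len;	/* fragment length, may be zero */
	bool     sop;	/* start of packet */
	bool     eop;	/* end of packet */
};

/* Hypothetical, simplified fragment record. */
struct rx_frag {
	uint8_t  flid;
	uint16_t idx;
	uint16_t len;
};

/*
 * Collect the fragments of one packet starting at descs[0].
 *
 * Returns the fragment count on success, -EFBIG if the packet has more
 * than MAX_RX_SEGS fragments, or -EAGAIN if no end-of-packet descriptor
 * was found.  On overflow the remaining descriptors are still consumed
 * up to EOP, so the caller can drop the packet and stay in sync with
 * the ring, but nothing is ever written past frags[MAX_RX_SEGS - 1].
 */
static int
collect_frags(const struct comp_desc *descs, size_t ndesc, unsigned nqsets,
    struct rx_frag *frags, size_t *consumed)
{
	bool overflow = false;
	int nfrags = 0;
	size_t i;

	for (i = 0; i < ndesc; i++) {
		const struct comp_desc *cd = &descs[i];

		if (nfrags < MAX_RX_SEGS) {
			frags[nfrags].flid = (cd->qid >= nqsets) ? 1 : 0;
			frags[nfrags].idx = cd->idx;
			frags[nfrags].len = cd->len;
			nfrags++;
		} else {
			/* Sanity check: refuse to overrun the array. */
			overflow = true;
		}
		if (cd->eop) {
			*consumed = i + 1;
			return (overflow ? -EFBIG : nfrags);
		}
	}
	*consumed = 0;
	return (-EAGAIN);
}
```

Returning an error and dropping the oversized packet is only one option; whether the real driver should drop, count, or panic is exactly the open question in the quoted message.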

Cheers
  Vincenzo

On Fri, 19 Nov 2021 at 19:46, Andriy Gapon <avg@freebsd.org> wrote:

> On 19/11/2021 20:19, Andriy Gapon wrote:
> > Here is some data to demonstrate the issue:
> > $1 = (iflib_rxq_t) 0xfffffe00ea9f6200
> > (kgdb) p $1->ifr_frags[0]
> > $2 = {irf_flid = 0 '\000', irf_idx = 1799, irf_len = 118}
> >
> > (kgdb) p $1->ifr_frags[1]
> > $3 = {irf_flid = 1 '\001', irf_idx = 674, irf_len = 0}
> > (kgdb) p $1->ifr_frags[2]
> > $4 = {irf_flid = 1 '\001', irf_idx = 675, irf_len = 0}
> >
> > ... elements 3..62 follow the same pattern ...
> >
> > (kgdb) p $1->ifr_frags[63]
> > $6 = {irf_flid = 1 '\001', irf_idx = 736, irf_len = 0}
> >
> > and then...
> >
> > (kgdb) p $1->ifr_frags[64]
> > $7 = {irf_flid = 1 '\001', irf_idx = 737, irf_len = 0}
> > (kgdb) p $1->ifr_frags[65]
> > $8 = {irf_flid = 1 '\001', irf_idx = 738, irf_len = 0}
> > ... the pattern continues ...
> > (kgdb) p $1->ifr_frags[70]
> > $10 = {irf_flid = 1 '\001', irf_idx = 743, irf_len = 0}
> >
> >
> > It seems like a start-of-packet completion descriptor referenced a
> > descriptor in the command ring zero (and apparently it didn't have the
> > end-of-packet bit). And there were another 70 zero-length completions
> > referencing the ring one until the end-of-packet.
> > So, in total 71 fragments were recorded.
> >
> > Or it's possible that those zero-length fragments were from the penultimate
> > pkt_get call and ifr_frags[0] was obtained after that...
>
>
> I think that this was the case and that I was able to find the corresponding
> descriptors in the completion ring.
>
> Please see https://people.freebsd.org/~avg/vmxnet3-fragment-overrun.txt
>
> $54 is the SOP, it has qid of 6.
> It is followed by many fragments with qid 14 (there are 8 queues / queue sets)
> and zero length.
> But not all of them are zero length, some have length of 4096, e.g. $77, $86,
> etc.
> $124 is the last fragment, it has eop = 1 and error = 1.
> So, there are 71 fragments in total.
>
> So, it is clear that VMware produced 71 segments for a single packet before
> giving up on it.
>
> I wonder why it did that.
> Perhaps it's a bug, perhaps it does not count zero-length segments against
> the limit, maybe something else.
>
> In any case, it happens.
>
> Finally, the packet looks interesting: udp = 0, tcp = 0, ipcsum_ok = 0,
> ipv6 = 0, ipv4 = 0.  I wonder what kind of a packet it could be -- being
> rather large and not an IP packet.
>
> > I am not sure how that could happen.
> > I am thinking about adding a sanity check for the number of fragments.
> > Not sure yet what options there are for handling the overflow besides
> > panicking.
> >
> >
> > Also, some data from the vmxnet3's side of things:
> > (kgdb) p $15.vmx_rxq[6]
> > $18 = {vxrxq_sc = 0xfffff80002d9b800, vxrxq_id = 6, vxrxq_intr_idx = 6,
> >   vxrxq_irq = {ii_res = 0xfffff80002f23e00, ii_rid = 7,
> >     ii_tag = 0xfffff80002f23d80},
> >   vxrxq_cmd_ring = {
> >     {vxrxr_rxd = 0xfffffe00ead3c000, vxrxr_ndesc = 2048, vxrxr_gen = 0,
> >      vxrxr_paddr = 57917440, vxrxr_desc_skips = 1114,
> >      vxrxr_refill_start = 1799},
> >     {vxrxr_rxd = 0xfffffe00ead44000, vxrxr_ndesc = 2048, vxrxr_gen = 0,
> >      vxrxr_paddr = 57950208, vxrxr_desc_skips = 121,
> >      vxrxr_refill_start = 743}},
> >   vxrxq_comp_ring = {
> >     vxcr_u = {txcd = 0xfffffe00ead2c000, rxcd = 0xfffffe00ead2c000},
> >     vxcr_next = 0, vxcr_ndesc = 4096, vxcr_gen = 1, vxcr_paddr = 57851904,
> >     vxcr_zero_length = 1044, vxcr_pkt_errors = 128},
> >   vxrxq_rs = 0xfffff80002d78e00, vxrxq_sysctl = 0xfffff80004308080,
> >   vxrxq_name = "vmx0-rx6\000\000\000\000\000\000\000"}
> >
> > vxrxr_refill_start values are consistent with what is seen in ifr_frags[].
> > vxcr_zero_length and vxcr_pkt_errors are both not zero, so maybe something
> > got the driver into a confused state or the emulated hardware became
> > confused.
>
>
> --
> Andriy Gapon
>
>



