Date: Tue, 17 Dec 2013 17:39:45 +0100 From: Michael Tuexen <Michael.Tuexen@lurchi.franken.de> To: Adrian Chadd <adrian@freebsd.org> Cc: Jack F Vogel <jfv@freebsd.org>, freebsd-net <freebsd-net@freebsd.org>, Ryan Stone <rysto32@gmail.com> Subject: Re: buf_ring in HEAD is racy Message-ID: <E5612C2C-C9B9-45D8-A794-9993FE39D9E6@lurchi.franken.de> In-Reply-To: <CAJ-Vmo=iayVJiMNihmPf_Dd18uEZ5W-j9E3g4SeaRhfFJQj4Ow@mail.gmail.com> References: <CAFMmRNyJpvZ0AewWr62w16=qKer%2BFNXUJJy0Qc=EBqMnUV3OyQ@mail.gmail.com> <CAJ-VmonJG-M_f_m36f-z3ArumBKNdt5%2B6muwFJzWABRxRQWJaw@mail.gmail.com> <CAJ-Vmo=OpWZ_OphA=%2BXbmgHw%2BFwuCMwngHq9_2WOnNHMa_RA9A@mail.gmail.com> <CAJ-Vmo=iayVJiMNihmPf_Dd18uEZ5W-j9E3g4SeaRhfFJQj4Ow@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Dec 17, 2013, at 4:43 PM, Adrian Chadd <adrian@freebsd.org> wrote:
> ie:
>=20
> Index: sys/dev/ixgbe/ixgbe.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- sys/dev/ixgbe/ixgbe.c (revision 2995)
> +++ sys/dev/ixgbe/ixgbe.c (working copy)
> @@ -5178,6 +5178,7 @@
> struct ixgbe_hw *hw =3D &adapter->hw;
> u32 missed_rx =3D 0, bprc, lxon, lxoff, total;
> u64 total_missed_rx =3D 0;
> + u64 odrops;
What about=20
+ u64 odrops =3D 0;
since odrops seems to be used uninitialized.
Best regards
Michael
>=20
> adapter->stats.crcerrs +=3D IXGBE_READ_REG(hw, IXGBE_CRCERRS);
> adapter->stats.illerrc +=3D IXGBE_READ_REG(hw, IXGBE_ILLERRC);
> @@ -5308,6 +5309,11 @@
> adapter->stats.fcoedwtc +=3D IXGBE_READ_REG(hw, =
IXGBE_FCOEDWTC);
> }
>=20
> + /* TX drops */
> + for (int i =3D 0; i < adapter->num_queues; i++) {
> + odrops +=3D adapter->tx_rings[i].br->br_drops;
> + }
> +
> /* Fill out the OS statistics structure */
> ifp->if_ipackets =3D adapter->stats.gprc;
> ifp->if_opackets =3D adapter->stats.gptc;
> @@ -5317,6 +5323,9 @@
> ifp->if_omcasts =3D adapter->stats.mptc;
> ifp->if_collisions =3D 0;
>=20
> + /* TX drops are stored in if_snd for now, not the top level =
counters */
> + ifp->if_snd.ifq_drops =3D drops;
> +
> /* Rx Errors */
> ifp->if_iqdrops =3D total_missed_rx;
> ifp->if_ierrors =3D adapter->stats.crcerrs + =
adapter->stats.rlec;
>=20
>=20
>=20
>=20
> -adrian
>=20
> On 16 December 2013 12:19, Adrian Chadd <adrian@freebsd.org> wrote:
>> So I've done some digging.
>>=20
>> The TL;DR - we really should expose the drops count in the ifa_data
>> field. But that would involve an ABI change and .. well, that will be
>> full of drama.
>>=20
>> The non-TL:DR:
>>=20
>> * The send queue drops are in the ifmibdata structure, which includes
>> a few other things. But that requires you use the MIB interface to
>> pull things out, rather than getifaddrs().
>> * getifaddrs() doesn't contain the max sendq length or drops, so we
>> can't report those without adding a MIB fetch for the relevant
>> interface.
>> * bsnmp (which we use at work) correctly populates output discards by
>> checking the MIB for ifmd_snd_drops.
>>=20
>> So I'd rather we fix ixgbe and other drivers by updating the send
>> queue drops counter if the frames are dropped. That way it's
>> consistent with other things.
>>=20
>> We should then do some of these things:
>>=20
>> * add a send queue drops field to the ifdata/ifadata where =
appropriate
>> and make sure it gets correctly populated;
>> * teach netstat to use ifmib instead of getifaddrs();
>> * as part of killing the ifsnd queue stuff, we should also migrate
>> that particular drops counter out from there and to a top-level
>> counter in ifnet, like the rest.
>>=20
>> Comments?
>>=20
>>=20
>> -a
>>=20
>>=20
>> On 14 December 2013 16:06, Adrian Chadd <adrian@freebsd.org> wrote:
>>> oh cool, you just did the output-drops thing I was about to code up.
>>> We're missing those counters at work and the ops guys poked me about
>>> it.
>>>=20
>>> I'll also give that a whirl locally and see about working with jack =
to
>>> get it into -HEAD / MFC'ed to 10.
>>>=20
>>> Thanks!
>>>=20
>>>=20
>>> -a
>>>=20
>>>=20
>>> On 13 December 2013 21:04, Ryan Stone <rysto32@gmail.com> wrote:
>>>> I am seeing spurious output packet drops that appear to be due to
>>>> insufficient memory barriers in buf_ring. I believe that this is =
the
>>>> scenario that I am seeing:
>>>>=20
>>>> 1) The buf_ring is empty, br_prod_head =3D br_cons_head =3D 0
>>>> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring. It =
fetches
>>>> br_prod_head (0) into a local variable called prod_head
>>>> 3) Thread 2 enqueues an mbuf on the buf_ring. The sequence of =
events
>>>> is essentially:
>>>>=20
>>>> Thread 2 claims an index in the ring and atomically sets =
br_prod_head (say to 1)
>>>> Thread 2 sets br_ring[1] =3D mbuf;
>>>> Thread 2 does a full memory barrier
>>>> Thread 2 updates br_prod_tail to 1
>>>>=20
>>>> 4) Thread 2 dequeues the packet from the buf_ring using the
>>>> single-consumer interface. The sequence of events is essentialy:
>>>>=20
>>>> Thread 2 checks whether queue is empty (br_cons_head =3D=3D =
br_prod_tail),
>>>> this is false
>>>> Thread 2 sets br_cons_head to 1
>>>> Thread 2 grabs the mbuf from br_ring[1]
>>>> Thread 2 sets br_cons_tail to 1
>>>>=20
>>>> 5) Thread 1, which is still attempting to enqueue an mbuf on the =
ring.
>>>> fetches br_cons_tail (1) into a local variable called cons_tail. =
It
>>>> sees cons_tail =3D=3D 1 but prod_head =3D=3D 0 and concludes that =
the ring is
>>>> full and drops the packet (incrementing br_drops unatomically, I =
might
>>>> add)
>>>>=20
>>>>=20
>>>> I can reproduce several drops per minute by configuring the ixgbe
>>>> driver to use only 1 queue and then sending traffic from concurrent =
8
>>>> iperf processes. (You will need this hacky patch to even see the
>>>> drops with netstat, though:
>>>> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
>>>>=20
>>>> I am investigating fixing buf_ring by using acquire/release =
semantics
>>>> rather than load/store barriers. However, I note that this will
>>>> apparently be the second attempt to fix buf_ring, and I'm seriously
>>>> questioning whether this is worth the effort compared to the
>>>> simplicity of using a mutex. I'm not even convinced that a correct
>>>> lockless implementation will even be a performance win, given the
>>>> number of memory barriers that will apparently be necessary.
>>>> _______________________________________________
>>>> freebsd-net@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>> To unsubscribe, send any mail to =
"freebsd-net-unsubscribe@freebsd.org"
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>=20
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E5612C2C-C9B9-45D8-A794-9993FE39D9E6>
