Date: Mon, 16 Dec 2013 12:19:40 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: Ryan Stone <rysto32@gmail.com>
Cc: freebsd-net <freebsd-net@freebsd.org>
Subject: Re: buf_ring in HEAD is racy
Message-ID: <CAJ-Vmo=OpWZ_OphA=+XbmgHw+FwuCMwngHq9_2WOnNHMa_RA9A@mail.gmail.com>
In-Reply-To: <CAJ-VmonJG-M_f_m36f-z3ArumBKNdt5+6muwFJzWABRxRQWJaw@mail.gmail.com>
References: <CAFMmRNyJpvZ0AewWr62w16=qKer+FNXUJJy0Qc=EBqMnUV3OyQ@mail.gmail.com>
 <CAJ-VmonJG-M_f_m36f-z3ArumBKNdt5+6muwFJzWABRxRQWJaw@mail.gmail.com>

So I've done some digging.

The TL;DR: we really should expose the drops count in the ifa_data
field. But that would involve an ABI change and .. well, that will be
full of drama.

The non-TL;DR:

* The send queue drops are in the ifmibdata structure, which includes
  a few other things. But that requires you use the MIB interface to
  pull things out, rather than getifaddrs().

* getifaddrs() doesn't contain the max sendq length or drops, so we
  can't report those without adding a MIB fetch for the relevant
  interface.

* bsnmp (which we use at work) correctly populates output discards by
  checking the MIB for ifmd_snd_drops.

So I'd rather we fix ixgbe and other drivers by updating the send
queue drops counter if the frames are dropped. That way it's
consistent with other things.

We should then do some of these things:

* add a send queue drops field to the ifdata/ifadata where appropriate
  and make sure it gets correctly populated;

* teach netstat to use ifmib instead of getifaddrs();

* as part of killing the ifsnd queue stuff, we should also migrate
  that particular drops counter out from there and to a top-level
  counter in ifnet, like the rest.

Comments?


-a


On 14 December 2013 16:06, Adrian Chadd <adrian@freebsd.org> wrote:
> oh cool, you just did the output-drops thing I was about to code up.
> We're missing those counters at work and the ops guys poked me about
> it.
>
> I'll also give that a whirl locally and see about working with jack to
> get it into -HEAD / MFC'ed to 10.
>
> Thanks!
>
>
> -a
>
>
> On 13 December 2013 21:04, Ryan Stone <rysto32@gmail.com> wrote:
>> I am seeing spurious output packet drops that appear to be due to
>> insufficient memory barriers in buf_ring. I believe that this is the
>> scenario that I am seeing:
>>
>> 1) The buf_ring is empty, br_prod_head = br_cons_head = 0
>> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring. It fetches
>>    br_prod_head (0) into a local variable called prod_head
>> 3) Thread 2 enqueues an mbuf on the buf_ring. The sequence of events
>>    is essentially:
>>
>>    Thread 2 claims an index in the ring and atomically sets
>>    br_prod_head (say to 1)
>>    Thread 2 sets br_ring[1] = mbuf;
>>    Thread 2 does a full memory barrier
>>    Thread 2 updates br_prod_tail to 1
>>
>> 4) Thread 2 dequeues the packet from the buf_ring using the
>>    single-consumer interface. The sequence of events is essentially:
>>
>>    Thread 2 checks whether the queue is empty (br_cons_head ==
>>    br_prod_tail); this is false
>>    Thread 2 sets br_cons_head to 1
>>    Thread 2 grabs the mbuf from br_ring[1]
>>    Thread 2 sets br_cons_tail to 1
>>
>> 5) Thread 1, which is still attempting to enqueue an mbuf on the
>>    ring, fetches br_cons_tail (1) into a local variable called
>>    cons_tail. It sees cons_tail == 1 but prod_head == 0 and concludes
>>    that the ring is full and drops the packet (incrementing br_drops
>>    non-atomically, I might add)
>>
>> I can reproduce several drops per minute by configuring the ixgbe
>> driver to use only 1 queue and then sending traffic from 8 concurrent
>> iperf processes. (You will need this hacky patch to even see the
>> drops with netstat, though:
>> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
>>
>> I am investigating fixing buf_ring by using acquire/release semantics
>> rather than load/store barriers. However, I note that this will
>> apparently be the second attempt to fix buf_ring, and I'm seriously
>> questioning whether this is worth the effort compared to the
>> simplicity of using a mutex.
>> I'm not even convinced that a correct lockless implementation will
>> even be a performance win, given the number of memory barriers that
>> will apparently be necessary.
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
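
The producer-side check Ryan describes in steps 1-5 can be sketched
roughly as follows. This is a simplified userland model written with
C11 atomics rather than the kernel's atomic(9) primitives, and it is
not the actual sys/sys/buf_ring.h code; the structure fields and
variable names simply follow the description above.

#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

struct buf_ring_model {
	_Atomic uint32_t br_prod_head;
	_Atomic uint32_t br_prod_tail;
	_Atomic uint32_t br_cons_head;
	_Atomic uint32_t br_cons_tail;
	uint32_t	 br_prod_mask;	/* ring size - 1, size is a power of two */
	uint64_t	 br_drops;	/* bumped without atomics */
	void		*br_ring[];
};

static int
enqueue_full_check(struct buf_ring_model *br)
{
	uint32_t prod_head, prod_next, cons_tail;

	/* Step 2: thread 1 loads br_prod_head (0). */
	prod_head = atomic_load_explicit(&br->br_prod_head,
	    memory_order_acquire);
	prod_next = (prod_head + 1) & br->br_prod_mask;

	/*
	 * Step 5: by the time thread 1 loads br_cons_tail, thread 2 has
	 * enqueued and dequeued a packet, so cons_tail is already 1 while
	 * prod_head above is still the stale 0.  The two loads were taken
	 * at different moments, yet the test below treats them as one
	 * consistent snapshot of the ring.
	 */
	cons_tail = atomic_load_explicit(&br->br_cons_tail,
	    memory_order_acquire);

	if (prod_next == cons_tail) {
		/* Spurious "full" verdict on a ring that is actually empty. */
		br->br_drops++;		/* the non-atomic increment noted above */
		return (ENOBUFS);
	}
	return (0);
}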
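Separately, the send queue drop counter Adrian refers to at the top of
the message can already be read from userland through the if_mib(4)
sysctl tree; that is the path bsnmp takes, and what netstat would use
if taught to prefer ifmib over getifaddrs(). A minimal sketch, using
only what if_mib(4) documents (not how bsnmp or netstat are actually
structured):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>

#include <net/if.h>
#include <net/if_mib.h>

#include <err.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	struct ifmibdata ifmd;
	size_t len = sizeof(ifmd);
	u_int ifindex;
	int name[6];

	if (argc != 2)
		errx(1, "usage: %s ifname", argv[0]);

	ifindex = if_nametoindex(argv[1]);
	if (ifindex == 0)
		errx(1, "unknown interface %s", argv[1]);

	/* net.link.generic.ifdata.<ifindex>.general */
	name[0] = CTL_NET;
	name[1] = PF_LINK;
	name[2] = NETLINK_GENERIC;
	name[3] = IFMIB_IFDATA;
	name[4] = (int)ifindex;
	name[5] = IFDATA_GENERAL;

	if (sysctl(name, 6, &ifmd, &len, NULL, 0) == -1)
		err(1, "sysctl");

	/* ifmd_snd_drops is what bsnmp reports as output discards. */
	printf("%s: sendq drops %d, sendq maxlen %d\n",
	    ifmd.ifmd_name, ifmd.ifmd_snd_drops, ifmd.ifmd_snd_maxlen);
	return (0);
}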