From owner-freebsd-net@FreeBSD.ORG Tue Dec 17 16:39:50 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9322B6AF; Tue, 17 Dec 2013 16:39:50 +0000 (UTC) Received: from mail-n.franken.de (drew.ipv6.franken.de [IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id A11FE152A; Tue, 17 Dec 2013 16:39:49 +0000 (UTC) Received: from [192.168.1.200] (p508F1D30.dip0.t-ipconnect.de [80.143.29.48]) (Authenticated sender: macmic) by mail-n.franken.de (Postfix) with ESMTP id BDAB31C0C0692; Tue, 17 Dec 2013 17:39:45 +0100 (CET) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\)) Subject: Re: buf_ring in HEAD is racy From: Michael Tuexen In-Reply-To: Date: Tue, 17 Dec 2013 17:39:45 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Adrian Chadd X-Mailer: Apple Mail (2.1510) Cc: Jack F Vogel , freebsd-net , Ryan Stone X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 Dec 2013 16:39:50 -0000 On Dec 17, 2013, at 4:43 PM, Adrian Chadd wrote: > ie: >=20 > Index: sys/dev/ixgbe/ixgbe.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/dev/ixgbe/ixgbe.c (revision 2995) > +++ sys/dev/ixgbe/ixgbe.c (working copy) > @@ -5178,6 +5178,7 @@ > struct ixgbe_hw *hw =3D &adapter->hw; > u32 missed_rx =3D 0, bprc, lxon, lxoff, total; > u64 total_missed_rx =3D 0; > + u64 odrops; What about=20 + u64 odrops =3D 0; since odrops seems to be used uninitialized. Best regards Michael >=20 > adapter->stats.crcerrs +=3D IXGBE_READ_REG(hw, IXGBE_CRCERRS); > adapter->stats.illerrc +=3D IXGBE_READ_REG(hw, IXGBE_ILLERRC); > @@ -5308,6 +5309,11 @@ > adapter->stats.fcoedwtc +=3D IXGBE_READ_REG(hw, = IXGBE_FCOEDWTC); > } >=20 > + /* TX drops */ > + for (int i =3D 0; i < adapter->num_queues; i++) { > + odrops +=3D adapter->tx_rings[i].br->br_drops; > + } > + > /* Fill out the OS statistics structure */ > ifp->if_ipackets =3D adapter->stats.gprc; > ifp->if_opackets =3D adapter->stats.gptc; > @@ -5317,6 +5323,9 @@ > ifp->if_omcasts =3D adapter->stats.mptc; > ifp->if_collisions =3D 0; >=20 > + /* TX drops are stored in if_snd for now, not the top level = counters */ > + ifp->if_snd.ifq_drops =3D drops; > + > /* Rx Errors */ > ifp->if_iqdrops =3D total_missed_rx; > ifp->if_ierrors =3D adapter->stats.crcerrs + = adapter->stats.rlec; >=20 >=20 >=20 >=20 > -adrian >=20 > On 16 December 2013 12:19, Adrian Chadd wrote: >> So I've done some digging. >>=20 >> The TL;DR - we really should expose the drops count in the ifa_data >> field. But that would involve an ABI change and .. well, that will be >> full of drama. >>=20 >> The non-TL:DR: >>=20 >> * The send queue drops are in the ifmibdata structure, which includes >> a few other things. But that requires you use the MIB interface to >> pull things out, rather than getifaddrs(). >> * getifaddrs() doesn't contain the max sendq length or drops, so we >> can't report those without adding a MIB fetch for the relevant >> interface. >> * bsnmp (which we use at work) correctly populates output discards by >> checking the MIB for ifmd_snd_drops. >>=20 >> So I'd rather we fix ixgbe and other drivers by updating the send >> queue drops counter if the frames are dropped. That way it's >> consistent with other things. >>=20 >> We should then do some of these things: >>=20 >> * add a send queue drops field to the ifdata/ifadata where = appropriate >> and make sure it gets correctly populated; >> * teach netstat to use ifmib instead of getifaddrs(); >> * as part of killing the ifsnd queue stuff, we should also migrate >> that particular drops counter out from there and to a top-level >> counter in ifnet, like the rest. >>=20 >> Comments? >>=20 >>=20 >> -a >>=20 >>=20 >> On 14 December 2013 16:06, Adrian Chadd wrote: >>> oh cool, you just did the output-drops thing I was about to code up. >>> We're missing those counters at work and the ops guys poked me about >>> it. >>>=20 >>> I'll also give that a whirl locally and see about working with jack = to >>> get it into -HEAD / MFC'ed to 10. >>>=20 >>> Thanks! >>>=20 >>>=20 >>> -a >>>=20 >>>=20 >>> On 13 December 2013 21:04, Ryan Stone wrote: >>>> I am seeing spurious output packet drops that appear to be due to >>>> insufficient memory barriers in buf_ring. I believe that this is = the >>>> scenario that I am seeing: >>>>=20 >>>> 1) The buf_ring is empty, br_prod_head =3D br_cons_head =3D 0 >>>> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring. It = fetches >>>> br_prod_head (0) into a local variable called prod_head >>>> 3) Thread 2 enqueues an mbuf on the buf_ring. The sequence of = events >>>> is essentially: >>>>=20 >>>> Thread 2 claims an index in the ring and atomically sets = br_prod_head (say to 1) >>>> Thread 2 sets br_ring[1] =3D mbuf; >>>> Thread 2 does a full memory barrier >>>> Thread 2 updates br_prod_tail to 1 >>>>=20 >>>> 4) Thread 2 dequeues the packet from the buf_ring using the >>>> single-consumer interface. The sequence of events is essentialy: >>>>=20 >>>> Thread 2 checks whether queue is empty (br_cons_head =3D=3D = br_prod_tail), >>>> this is false >>>> Thread 2 sets br_cons_head to 1 >>>> Thread 2 grabs the mbuf from br_ring[1] >>>> Thread 2 sets br_cons_tail to 1 >>>>=20 >>>> 5) Thread 1, which is still attempting to enqueue an mbuf on the = ring. >>>> fetches br_cons_tail (1) into a local variable called cons_tail. = It >>>> sees cons_tail =3D=3D 1 but prod_head =3D=3D 0 and concludes that = the ring is >>>> full and drops the packet (incrementing br_drops unatomically, I = might >>>> add) >>>>=20 >>>>=20 >>>> I can reproduce several drops per minute by configuring the ixgbe >>>> driver to use only 1 queue and then sending traffic from concurrent = 8 >>>> iperf processes. (You will need this hacky patch to even see the >>>> drops with netstat, though: >>>> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff) >>>>=20 >>>> I am investigating fixing buf_ring by using acquire/release = semantics >>>> rather than load/store barriers. However, I note that this will >>>> apparently be the second attempt to fix buf_ring, and I'm seriously >>>> questioning whether this is worth the effort compared to the >>>> simplicity of using a mutex. I'm not even convinced that a correct >>>> lockless implementation will even be a performance win, given the >>>> number of memory barriers that will apparently be necessary. >>>> _______________________________________________ >>>> freebsd-net@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net >>>> To unsubscribe, send any mail to = "freebsd-net-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >=20