From owner-freebsd-net@FreeBSD.ORG Tue Dec 17 15:43:50 2013
Date: Tue, 17 Dec 2013 07:43:49 -0800
From: Adrian Chadd
Sender: adrian.chadd@gmail.com
To: Ryan Stone, Jack F Vogel
Cc: freebsd-net
Subject: Re: buf_ring in HEAD is racy

ie:

Index: sys/dev/ixgbe/ixgbe.c
===================================================================
--- sys/dev/ixgbe/ixgbe.c	(revision 2995)
+++ sys/dev/ixgbe/ixgbe.c	(working copy)
@@ -5178,6 +5178,7 @@
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 missed_rx = 0, bprc, lxon, lxoff, total;
 	u64 total_missed_rx = 0;
+	u64 odrops = 0;
 
 	adapter->stats.crcerrs += IXGBE_READ_REG(hw, IXGBE_CRCERRS);
 	adapter->stats.illerrc += IXGBE_READ_REG(hw, IXGBE_ILLERRC);
@@ -5308,6 +5309,11 @@
 		adapter->stats.fcoedwtc += IXGBE_READ_REG(hw, IXGBE_FCOEDWTC);
 	}
 
+	/* TX drops: sum the per-queue buf_ring drop counters */
+	for (int i = 0; i < adapter->num_queues; i++) {
+		odrops += adapter->tx_rings[i].br->br_drops;
+	}
+
 	/* Fill out the OS statistics structure */
 	ifp->if_ipackets = adapter->stats.gprc;
 	ifp->if_opackets = adapter->stats.gptc;
@@ -5317,6 +5323,9 @@
 	ifp->if_omcasts = adapter->stats.mptc;
 	ifp->if_collisions = 0;
 
+	/* TX drops are stored in if_snd for now, not the top-level counters */
+	ifp->if_snd.ifq_drops = odrops;
+
 	/* Rx Errors */
 	ifp->if_iqdrops = total_missed_rx;
 	ifp->if_ierrors = adapter->stats.crcerrs + adapter->stats.rlec;
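The ifq_drops value the patch fills in is what the interface MIB
exposes to userland as ifmd_snd_drops, which is how bsnmp picks it up
(see below). A minimal sketch for eyeballing the counter, patterned on
the if_mib(4) example; the "ix0" interface name is just a placeholder:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <net/if.h>
#include <net/if_mib.h>
#include <stdio.h>

int
main(void)
{
	struct ifmibdata ifmd;
	size_t len = sizeof(ifmd);
	int name[6];
	unsigned int ifindex;

	ifindex = if_nametoindex("ix0");	/* placeholder name */
	if (ifindex == 0) {
		perror("if_nametoindex");
		return (1);
	}

	/* net.link.generic.ifdata.<ifindex>.general */
	name[0] = CTL_NET;
	name[1] = PF_LINK;
	name[2] = NETLINK_GENERIC;
	name[3] = IFMIB_IFDATA;
	name[4] = (int)ifindex;
	name[5] = IFDATA_GENERAL;

	if (sysctl(name, 6, &ifmd, &len, NULL, 0) == -1) {
		perror("sysctl");
		return (1);
	}
	printf("%s: sendq drops %d (max len %d)\n",
	    ifmd.ifmd_name, ifmd.ifmd_snd_drops, ifmd.ifmd_snd_maxlen);
	return (0);
}

This is the fetch netstat would need to grow, since getifaddrs() never
sees the field.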
-adrian


On 16 December 2013 12:19, Adrian Chadd wrote:
> So I've done some digging.
>
> The TL;DR: we really should expose the drops count in the ifa_data
> field. But that would involve an ABI change and, well, that will be
> full of drama.
>
> The non-TL;DR:
>
> * The send queue drops are in the ifmibdata structure, which includes
> a few other things. But that requires you to use the MIB interface to
> pull things out, rather than getifaddrs().
> * getifaddrs() doesn't contain the max sendq length or drops, so we
> can't report those without adding a MIB fetch for the relevant
> interface.
> * bsnmp (which we use at work) correctly populates output discards by
> checking the MIB for ifmd_snd_drops.
>
> So I'd rather we fix ixgbe and the other drivers by updating the send
> queue drops counter when frames are dropped. That way it's consistent
> with how everything else reports drops.
>
> We should then do some of these things:
>
> * add a send queue drops field to if_data/ifa_data where appropriate
> and make sure it gets correctly populated;
> * teach netstat to use ifmib instead of getifaddrs();
> * as part of killing the if_snd queue stuff, migrate that particular
> drops counter out of there and into a top-level counter in ifnet,
> like the rest.
>
> Comments?
>
>
> -a
>
>
> On 14 December 2013 16:06, Adrian Chadd wrote:
>> oh cool, you just did the output-drops thing I was about to code up.
>> We're missing those counters at work and the ops guys poked me about
>> it.
>>
>> I'll also give that a whirl locally and see about working with Jack
>> to get it into -HEAD / MFC'ed to 10.
>>
>> Thanks!
>>
>>
>> -a
>>
>>
>> On 13 December 2013 21:04, Ryan Stone wrote:
>>> I am seeing spurious output packet drops that appear to be due to
>>> insufficient memory barriers in buf_ring. I believe that this is
>>> the scenario that I am seeing:
>>>
>>> 1) The buf_ring is empty: br_prod_head = br_cons_head = 0.
>>> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring. It fetches
>>> br_prod_head (0) into a local variable called prod_head.
>>> 3) Thread 2 enqueues an mbuf on the buf_ring. The sequence of
>>> events is essentially:
>>>
>>> Thread 2 claims an index in the ring and atomically sets
>>> br_prod_head (say, to 1)
>>> Thread 2 sets br_ring[1] = mbuf
>>> Thread 2 does a full memory barrier
>>> Thread 2 updates br_prod_tail to 1
>>>
>>> 4) Thread 2 dequeues the packet from the buf_ring using the
>>> single-consumer interface. The sequence of events is essentially:
>>>
>>> Thread 2 checks whether the queue is empty (br_cons_head ==
>>> br_prod_tail); this is false
>>> Thread 2 sets br_cons_head to 1
>>> Thread 2 grabs the mbuf from br_ring[1]
>>> Thread 2 sets br_cons_tail to 1
>>>
>>> 5) Thread 1, still attempting to enqueue an mbuf on the ring,
>>> fetches br_cons_tail (1) into a local variable called cons_tail. It
>>> sees cons_tail == 1 but prod_head == 0, concludes that the ring is
>>> full, and drops the packet (incrementing br_drops non-atomically, I
>>> might add).
>>>
>>> I can reproduce several drops per minute by configuring the ixgbe
>>> driver to use only 1 queue and then sending traffic from 8
>>> concurrent iperf processes. (You will need this hacky patch to even
>>> see the drops with netstat, though:
>>> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
>>>
>>> I am investigating fixing buf_ring by using acquire/release
>>> semantics rather than load/store barriers. However, I note that
>>> this will apparently be the second attempt to fix buf_ring, and I'm
>>> seriously questioning whether this is worth the effort compared to
>>> the simplicity of using a mutex. I'm not convinced that a correct
>>> lockless implementation will even be a performance win, given the
>>> number of memory barriers that will apparently be necessary.
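For concreteness, the window lives in the full check in
buf_ring_enqueue(). A simplified sketch, assuming the usual kernel
headers; it is modelled on sys/buf_ring.h but is an illustration, not
the actual source. The re-read-and-retry before declaring the ring
full is one possible way to close the race in step 5 above, and the
release store on br_prod_tail is the shape an acquire/release rework
would take:

static __inline int
buf_ring_enqueue_sketch(struct buf_ring *br, void *buf)
{
	uint32_t prod_head, prod_next, cons_tail;

	critical_enter();
	do {
		prod_head = br->br_prod_head;
		prod_next = (prod_head + 1) & br->br_prod_mask;
		cons_tail = br->br_cons_tail;

		if (prod_next == cons_tail) {
			/*
			 * prod_head may be stale relative to cons_tail
			 * (step 5 above), so re-read both and only
			 * drop once the snapshot is consistent.
			 */
			rmb();
			if (prod_head == br->br_prod_head &&
			    cons_tail == br->br_cons_tail) {
				br->br_drops++;	/* still not atomic */
				critical_exit();
				return (ENOBUFS);
			}
			continue;
		}
	} while (!atomic_cmpset_acq_32(&br->br_prod_head, prod_head,
	    prod_next));

	br->br_ring[prod_head] = buf;

	/* Wait for earlier enqueuers to publish their slots first. */
	while (br->br_prod_tail != prod_head)
		cpu_spinwait();

	/* Release: consumer must see br_ring[] before the new tail. */
	atomic_store_rel_32(&br->br_prod_tail, prod_next);
	critical_exit();
	return (0);
}

Whether that ends up cheaper than just taking a mutex is exactly the
open question above.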