From: Adrian Chadd
To: Ryan Stone
Cc: freebsd-net
Date: Sat, 14 Dec 2013 16:06:37 -0800
Subject: Re: buf_ring in HEAD is racy

Oh cool, you just did the output-drops thing I was about to code up.
We're missing those counters at work and the ops guys poked me about it.

I'll also give that a whirl locally and see about working with Jack to
get it into -HEAD / MFC'ed to 10.

Thanks!

-a

On 13 December 2013 21:04, Ryan Stone wrote:
> I am seeing spurious output packet drops that appear to be due to
> insufficient memory barriers in buf_ring. I believe that this is the
> scenario that I am seeing:
>
> 1) The buf_ring is empty, br_prod_head = br_cons_head = 0
> 2) Thread 1 attempts to enqueue an mbuf on the buf_ring. It fetches
> br_prod_head (0) into a local variable called prod_head
> 3) Thread 2 enqueues an mbuf on the buf_ring. The sequence of events
> is essentially:
>
>    Thread 2 claims an index in the ring and atomically sets
>    br_prod_head (say to 1)
>    Thread 2 sets br_ring[1] = mbuf;
>    Thread 2 does a full memory barrier
>    Thread 2 updates br_prod_tail to 1
>
> 4) Thread 2 dequeues the packet from the buf_ring using the
> single-consumer interface. The sequence of events is essentially:
>
>    Thread 2 checks whether queue is empty
>    (br_cons_head == br_prod_tail), this is false
>    Thread 2 sets br_cons_head to 1
>    Thread 2 grabs the mbuf from br_ring[1]
>    Thread 2 sets br_cons_tail to 1
>
> 5) Thread 1, which is still attempting to enqueue an mbuf on the
> ring, fetches br_cons_tail (1) into a local variable called
> cons_tail. It sees cons_tail == 1 but prod_head == 0 and concludes
> that the ring is full and drops the packet (incrementing br_drops
> non-atomically, I might add)
>
> I can reproduce several drops per minute by configuring the ixgbe
> driver to use only 1 queue and then sending traffic from 8 concurrent
> iperf processes. (You will need this hacky patch to even see the
> drops with netstat, though:
> http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
>
> I am investigating fixing buf_ring by using acquire/release semantics
> rather than load/store barriers. However, I note that this will
> apparently be the second attempt to fix buf_ring, and I'm seriously
> questioning whether this is worth the effort compared to the
> simplicity of using a mutex. I'm not even convinced that a correct
> lockless implementation will even be a performance win, given the
> number of memory barriers that will apparently be necessary.
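
For reference, the interleaving in steps 1-5 can be replayed
deterministically in a single thread. This is only a sketch under
stated assumptions, not the buf_ring code: struct ring_idx,
looks_full(), snapshot_current() and replay_false_full() are
hypothetical names, the ring size of 8 is arbitrary, and the "full"
test (slot after prod_head equals cons_tail) is inferred from the
scenario above. The second helper shows one possible mitigation,
re-checking the snapshots against the live indices before dropping;
it is not the acquire/release approach mentioned in the quoted mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the index fields of a buf_ring. */
struct ring_idx {
	uint32_t prod_head, prod_tail;
	uint32_t cons_head, cons_tail;
	uint32_t mask;		/* ring size - 1; size is a power of two */
};

/* "Full" test as inferred from the scenario: the slot after prod_head
 * is the one the consumer has not yet released. */
static bool
looks_full(uint32_t prod_head, uint32_t cons_tail, uint32_t mask)
{
	return (((prod_head + 1) & mask) == cons_tail);
}

/* Possible mitigation (assumption): trust a "full" verdict only if
 * both snapshots still match the live indices. */
static bool
snapshot_current(const struct ring_idx *r, uint32_t prod_head,
    uint32_t cons_tail)
{
	return (prod_head == r->prod_head && cons_tail == r->cons_tail);
}

/* Replays steps 1-5. Returns the naive "full" verdict and reports via
 * *still_current whether Thread 1's snapshots were still valid. */
static bool
replay_false_full(bool *still_current)
{
	struct ring_idx r = { 0, 0, 0, 0, 7 };	/* step 1: empty ring */
	uint32_t t1_prod_head, t1_cons_tail;

	assert((r.mask & (r.mask + 1)) == 0);	/* mask is 2^n - 1 */

	t1_prod_head = r.prod_head;	/* step 2: Thread 1 reads 0 */

	r.prod_head = 1;		/* step 3: Thread 2 enqueues */
	r.prod_tail = 1;

	r.cons_head = 1;		/* step 4: Thread 2 dequeues */
	r.cons_tail = 1;

	t1_cons_tail = r.cons_tail;	/* step 5: Thread 1 reads 1 */

	*still_current = snapshot_current(&r, t1_prod_head, t1_cons_tail);
	return (looks_full(t1_prod_head, t1_cons_tail, r.mask));
}
```

Calling replay_false_full(&current) returns true (the stale snapshot
makes the empty ring look full) and leaves current set to false, which
is the signal a retry loop could use to re-read the indices instead of
dropping the packet.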