From: Andre Oppermann <andre@freebsd.org>
Date: Tue, 27 Aug 2013 09:27:57 +0200
To: Adrian Chadd
Cc: Jack F Vogel, "Justin T. Gibbs", Alan Somers, net@freebsd.org
Subject: Re: Flow ID, LACP, and igb

On 27.08.2013 01:30, Adrian Chadd wrote:
> ... is there any reason we wouldn't want to have the TX and RX for a
> given flow mapped to the same core?

They are. The thing is, the inbound and outbound packet flow IDs are
completely independent of each other. The inbound one determines the
RX ring the packet takes to go up the stack. If that ring is bound to
a core, that's fine and gives affinity. If the socket and the
user-space application are bound to the same core as well, there is
full affinity.

On the way down it is the core doing the write to the socket that
matters when entering the kernel. Execution stays on that core until
the packet is generated (in tcp_output(), for example). The flow ID of
the packet doesn't matter at all up to that point, because it is only
filled in then. The packet then goes down the stack, and the flow ID
is used only at the very end, when an outbound TX queue has to be
chosen based on it (a rough sketch of that mapping follows below).
This outbound TX ring doesn't have to be the same one the flow came in
on, as long as it stays the same per flow to prevent reordering.

This fixes Justin's issue with if_lagg and poor balancing. He can
simply choose a good hash for the outgoing packets and stop worrying
about it. More importantly, he's no longer hostage to random switches
with poor hashing.
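To make that last step concrete, here is a minimal userland sketch of
the idea, not actual driver code: the ring count, the FNV-1a stand-in
hash, and all names are made up (real NICs usually compute the flow ID
in hardware, e.g. with a Toeplitz/RSS hash). The flow ID is computed
once when the packet is generated; transmit only ever does a cheap
index into the TX rings, so the same flow always lands on the same
ring.

/*
 * Rough sketch: hash a flow's 5-tuple once to a flow ID, then use
 * the flow ID to pick the TX ring.  NTXQ and the FNV-1a stand-in
 * hash are invented for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define NTXQ 4				/* hypothetical TX ring count */

struct flow {				/* toy 5-tuple */
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t	 proto;
};

static uint32_t
fnv1a(uint32_t h, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len-- > 0) {
		h ^= *p++;
		h *= 16777619u;
	}
	return (h);
}

/* Computed once, when the packet is generated. */
static uint32_t
flow_hash(const struct flow *f)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &f->src_ip, sizeof(f->src_ip));
	h = fnv1a(h, &f->dst_ip, sizeof(f->dst_ip));
	h = fnv1a(h, &f->src_port, sizeof(f->src_port));
	h = fnv1a(h, &f->dst_port, sizeof(f->dst_port));
	return (fnv1a(h, &f->proto, sizeof(f->proto)));
}

/* Consumed once, at the very end: flow ID -> TX ring. */
static unsigned
txq_select(uint32_t flowid)
{
	return (flowid % NTXQ);
}

int
main(void)
{
	struct flow f = { 0x0a000001, 0x0a000002, 12345, 80, 6 };
	uint32_t id = flow_hash(&f);

	/* Same flow -> same flow ID -> same ring: no reordering. */
	printf("flowid 0x%08x -> txq %u\n", (unsigned)id, txq_select(id));
	return (0);
}

The printed ring index is stable for a given 5-tuple, which is the
only property that matters for avoiding intra-flow reordering; which
RX ring the inbound packets used is irrelevant to it.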
Ultimately you could try to bind each TX ring to a particular CPU as
well and run it lockless, but that is fraught with difficult problems.
First, you must have exactly as many RX/TX queues as cores, and that's
often not the case: many cards only support a limited number of rings.
Then, for packets generated locally (think of a DNS query over UDP),
you either simply stick to the local CPU-assigned queue and send
without looking at the computed flow ID, or you have to switch cores
to send the packet on the correct queue. Such a very strong core
binding is typically only really useful in embarrassingly parallel
applications that do nothing but push packets.

If your application is also compute-intensive, you may want more
flexibility in scheduling threads to prevent stalls on busy cores. In
that case not binding TX to a core is a win. So we will pretty much
end up with one lock per TX ring to protect the DMA descriptor
structures (rough sketch at the end of this message). We're still a
long way from having to worry about this TX issue. The big win is the
RX queue - socket - application affinity (to the same core).

-- 
Andre
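What "one lock per TX ring" means, as an untested userland sketch: in
the kernel the lock would be an mtx(9) guarding the ring's DMA
descriptor state; a pthread mutex stands in here, and the ring layout
and all names are invented for illustration.

/*
 * Untested sketch of per-TX-ring locking.  Any core may enqueue
 * on any ring; the per-ring lock serializes the descriptor
 * update, so TX need not be bound to a CPU.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTXQ	4			/* hypothetical TX ring count */
#define RING_SZ	8

struct tx_ring {
	pthread_mutex_t	lock;		/* protects everything below */
	uint32_t	desc[RING_SZ];	/* stand-in for DMA descriptors */
	unsigned	head;
};

static struct tx_ring txq[NTXQ];

static void
txq_enqueue(unsigned q, uint32_t pkt)
{
	struct tx_ring *r = &txq[q % NTXQ];

	pthread_mutex_lock(&r->lock);
	r->desc[r->head] = pkt;
	r->head = (r->head + 1) % RING_SZ;
	pthread_mutex_unlock(&r->lock);
}

int
main(void)
{
	unsigned q;

	for (q = 0; q < NTXQ; q++)
		pthread_mutex_init(&txq[q].lock, NULL);

	txq_enqueue(2, 0xdeadbeef);	/* e.g. the flow ID chose ring 2 */
	printf("ring 2 head is now %u\n", txq[2].head);
	return (0);
}

The point is only that contention is per ring rather than global:
unrelated flows hashed to different rings never serialize against
each other, which is why this is cheap enough that we don't need to
worry about it for a long while.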