Date: Wed, 19 Aug 2009 19:24:46 +1000 (EST)
From: Bruce Evans
To: Manish Vachharajani
Cc: freebsd-net@FreeBSD.org
Subject: Re: Dropped vs. missed packets in the ixgbe driver
Message-ID: <20090819183756.Y35058@delplex.bde.org>
In-Reply-To: <5bc218350908181535o7c5275dfn2f6647454cfac804@mail.gmail.com>

On Tue, 18 Aug 2009, Manish Vachharajani wrote:

> So, in a nutshell, the question is: should these drivers be reporting
> miss events as input errors in the ifnet struct as the bge driver
> does, or as drops in the ifnet struct, was there some conscious
> decision not to report miss events anywhere outside the debug and
> stats info, or am I just being silly and not seeing where the numbers
> are reported?

Certainly they should be no worse than bge in this area.

Even bge has problems for the 5705_PLUS versions.  PLUS really means
MINUS; 5705- hardware is dumbed down so that the IFIN_DROPS register is
almost unusable, yet because the hardware is so bad, drops are more
likely than with better hardware.  The unusability comes from the
register being only 8 (?) bits wide and being reset on every read, so
you have to read it often to ensure that it doesn't wrap.  But reading
it (or any PCI register) is very inefficient, so the read that is done
often enough to work (in bge_rxeof()) is only done if the "notyet"
non-option is configured.

Resetting on every read of this and most or all other statistics
registers on 5705- hardware also completely breaks most or all bge
statistics in the bge statistics sysctl, due to the way sysctl(3) is
implemented: sysctl(3) always calls the sysctl syscall twice and uses
the results of the second call.  Both calls do a read at the lowest
level, so with registers that are reset on every read, the first call
resets the registers and the second call usually reads zero.  No
history is kept in the sysctl, so the sysctl also clobbers the
statistics that are maintained at the non-sysctl level (only collisions
and ifin drops for 5705-).
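
The double-read failure can be sketched in userland as follows.  This
is only an illustration under stated assumptions, not bge code: the
names cor_reg, cor_event(), cor_read(), soft_stats, naive_report() and
cached_report() are all hypothetical, and the register is simulated as
an 8-bit counter that wraps modulo 256 and clears on read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an 8-bit clear-on-read hardware counter. */
struct cor_reg {
	uint8_t pending;	/* events counted since the last read */
};

/* Simulated hardware side: count n events, wrapping modulo 256. */
static void
cor_event(struct cor_reg *r, unsigned n)
{
	r->pending = (uint8_t)(r->pending + n);
}

/* Simulated register read: returns the count and resets it to zero. */
static uint8_t
cor_read(struct cor_reg *r)
{
	uint8_t v = r->pending;

	r->pending = 0;
	return (v);
}

/* Software-maintained running total (the "history"). */
struct soft_stats {
	uint64_t ifin_drops;
};

/* Fold whatever the hardware has counted into the running total. */
static void
stats_poll(struct soft_stats *s, struct cor_reg *r)
{
	assert(s != NULL && r != NULL);
	s->ifin_drops += cor_read(r);
}

/* Stateless handler: reports the raw register, so a second call sees 0. */
static uint64_t
naive_report(struct cor_reg *r)
{
	return (cor_read(r));
}

/* History-keeping handler: safe to call any number of times. */
static uint64_t
cached_report(struct soft_stats *s, struct cor_reg *r)
{
	stats_poll(s, r);
	return (s->ifin_drops);
}
```

If naive_report() is called twice in a row, as the two syscalls would
do, the second call returns 0 unless new events arrived in between,
while cached_report() returns the same accumulated total both times.
Events that arrive between polls are still lost if more than 255 of
them accumulate, which is why the poll has to run often.
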
The non-sysctl level understands the reset and does keep history, but
this is defeated if the sysctl is used.

There may also still be a generic problem with intrq drops.  The
default ip intrq length (sysctl net.inet.ip.intr_queue_maxlen) used to
be too small (32 IIRC).  Now it is larger by default (256), but 256 is
still small if you have multiple NICs with rx ring sizes of hundreds or
thousands.  Direct dispatch reduces this problem.  Further, if an intrq
drop actually occurs, then it is only reported in generic ip statistics
(net.inet.ip.intr_queue_drops); there is no sign of it in ierrors and
no way to determine which interface it happened on.  I use 1024 to
ensure no drops with a single bge NIC.

There is still a related design problem for intrq drops: packets that
will be dropped should not even be passed to upper layers, to avoid
unnecessary extra load on already-overloaded systems.  There is a
related inefficiency in IFF_MONITOR mode: checking for this should be
the very first thing in ether_input(), or at least before asking for
cache misses by looking at packet headers, but the check is after
mounds of code and at least 1 likely cache miss (for initializing etype
unnecessarily early).  Intrq drops would be efficient if they occurred
near the start of ether_input() too, but they occur much further up the
stack than the check for monitor mode.

Bruce