From owner-freebsd-net@FreeBSD.ORG Fri Mar 8 07:55:13 2013
From: YongHyeon PYUN <pyunyh@gmail.com>
Date: Fri, 8 Mar 2013 16:54:58 +0900
To: Garrett Wollman
Cc: jfv@freebsd.org, freebsd-net@freebsd.org
Subject: Re: Limits on jumbo mbuf cluster allocation
Message-ID: <20130308075458.GA1442@michelle.cdnetworks.com>
In-Reply-To: <20793.36593.774795.720959@hergotha.csail.mit.edu>
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net@freebsd.org>

On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote:
> I have a machine (actually six of them) with an Intel dual-10G NIC on
> the motherboard. Two of them (so far) are connected to a network
> using jumbo frames, with an MTU a little under 9k, so the ixgbe driver
> allocates 32,000 9k clusters for its receive rings. I have noticed,
> on the machine that is an active NFS server, that it can get into a
> state where allocating more 9k clusters fails (as reflected in the
> mbuf failure counters) at a utilization far lower than the configured
> limits -- in fact, quite close to the number allocated by the driver
> for its rx ring. Eventually, network traffic grinds completely to a
> halt, and if one of the interfaces is administratively downed, it
> cannot be brought back up again. There's generally plenty of physical
> memory free (at least two or three GB).
>
> There are no console messages generated to indicate what is going on,
> and overall UMA usage doesn't look extreme. I'm guessing that this is
> a result of kernel memory fragmentation, although I'm a little bit
> unclear as to how this actually comes about.
> I am assuming that this hardware has only limited scatter-gather
> capability and can't receive a single packet into multiple buffers of
> a smaller size, which would reduce the requirement for
> two-and-a-quarter consecutive pages of KVA for each packet. In
> actual usage, most of our clients aren't on a jumbo network, so most
> of the time, all the packets will fit into a normal 2k cluster, and
> we've never observed this issue when the *server* is on a non-jumbo
> network.
>

AFAIK, all Intel controllers build jumbo frames by concatenating
multiple mbufs on the RX side, so a physically contiguous 9KB
allocation should not be needed. My vague guess is that mbufs are
being leaked when jumbo frames are enabled. I would check how the
driver handles an mbuf shortage, or a frame error, while mbuf
concatenation for a jumbo frame is still in progress (see the sketch
at the end of this message).

> Does anyone have suggestions for dealing with this issue? Will
> increasing the amount of KVA (to, say, twice physical memory) help
> things? It seems to me like a bug that these large packets don't have
> their own submap to ensure that allocation is always possible when
> sufficient physical pages are available.
>
> -GAWollman
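
To make concrete what I mean by checking the error paths, below is a
minimal sketch of the per-descriptor concatenation pattern. The names
(rx_ring, rx_append, rx_discard) are hypothetical; this is not the
actual ixgbe code, only the shape of the logic I would audit:

    #include <sys/param.h>
    #include <sys/mbuf.h>

    struct rx_ring {                    /* hypothetical per-ring state */
            struct mbuf     *rx_head;   /* first mbuf of frame in progress */
            struct mbuf     *rx_tail;   /* last mbuf appended so far */
    };

    /*
     * Drop a partially assembled frame. This must be called on a frame
     * error, or when a replacement mbuf cannot be allocated for the
     * ring; m_freem() walks the m_next chain and returns every cluster
     * to its zone.
     */
    static void
    rx_discard(struct rx_ring *rxr)
    {

            if (rxr->rx_head != NULL) {
                    m_freem(rxr->rx_head);
                    rxr->rx_head = rxr->rx_tail = NULL;
            }
    }

    /*
     * Called once per completed RX descriptor. Returns the complete
     * frame when the end-of-packet (EOP) descriptor arrives, NULL
     * while concatenation is still in progress.
     */
    static struct mbuf *
    rx_append(struct rx_ring *rxr, struct mbuf *m, int len, int eop,
        int err)
    {

            if (err) {
                    /* Frame error mid-concatenation: free everything. */
                    m_freem(m);
                    rx_discard(rxr);
                    return (NULL);
            }
            m->m_len = len;
            if (rxr->rx_head == NULL) {
                    /* First segment carries the packet header. */
                    m->m_pkthdr.len = len;
                    rxr->rx_head = rxr->rx_tail = m;
            } else {
                    /* Continuation: chain it and grow the header length. */
                    m->m_flags &= ~M_PKTHDR;
                    rxr->rx_head->m_pkthdr.len += len;
                    rxr->rx_tail->m_next = m;
                    rxr->rx_tail = m;
            }
            if (eop == 0)
                    return (NULL);
            m = rxr->rx_head;
            rxr->rx_head = rxr->rx_tail = NULL;
            return (m);
    }

If either error path forgets the m_freem() calls, every frame error
or refill failure under load strands a partial chain, and the cluster
zone runs dry at a utilization well below its configured limit. That
would look very much like the symptom described.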