From owner-freebsd-net@FreeBSD.ORG Fri Mar 8 07:55:13 2013
From: YongHyeon PYUN <pyunyh@gmail.com>
Date: Fri, 8 Mar 2013 16:54:58 +0900
To: Garrett Wollman
Cc: jfv@freebsd.org, freebsd-net@freebsd.org
Subject: Re: Limits on jumbo mbuf cluster allocation
Message-ID: <20130308075458.GA1442@michelle.cdnetworks.com>
In-Reply-To: <20793.36593.774795.720959@hergotha.csail.mit.edu>
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net@freebsd.org>

On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote:
> I have a machine (actually six of them) with an Intel dual-10G NIC on
> the motherboard. Two of them (so far) are connected to a network
> using jumbo frames, with an MTU a little under 9k, so the ixgbe driver
> allocates 32,000 9k clusters for its receive rings. I have noticed,
> on the machine that is an active NFS server, that it can get into a
> state where allocating more 9k clusters fails (as reflected in the
> mbuf failure counters) at a utilization far lower than the configured
> limits -- in fact, quite close to the number allocated by the driver
> for its rx ring. Eventually, network traffic grinds completely to a
> halt, and if one of the interfaces is administratively downed, it
> cannot be brought back up again. There's generally plenty of physical
> memory free (at least two or three GB).
>
> There are no console messages generated to indicate what is going on,
> and overall UMA usage doesn't look extreme. I'm guessing that this is
> a result of kernel memory fragmentation, although I'm a little bit
> unclear as to how this actually comes about.
> I am assuming that this hardware has only limited scatter-gather
> capability and can't receive a single packet into multiple buffers of
> a smaller size, which would reduce the requirement for
> two-and-a-quarter consecutive pages of KVA for each packet. In
> actual usage, most of our clients aren't on a jumbo network, so most
> of the time, all the packets will fit into a normal 2k cluster, and
> we've never observed this issue when the *server* is on a non-jumbo
> network.
>

AFAIK, all Intel controllers build jumbo frames by concatenating
multiple mbufs on the RX side, so a physically contiguous 9KB
allocation should not be needed. My vague guess is that mbufs are
being leaked when jumbo frames are enabled. I would check how the
driver handles an mbuf shortage, or a frame error, while mbuf
concatenation for a jumbo frame is still in progress (see the sketch
at the end of this message).

> Does anyone have suggestions for dealing with this issue? Will
> increasing the amount of KVA (to, say, twice physical memory) help
> things? It seems to me like a bug that these large packets don't have
> their own submap to ensure that allocation is always possible when
> sufficient physical pages are available.
>
> -GAWollman
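
To make concrete what I mean by checking the error paths, below is a
minimal sketch of the per-descriptor concatenation pattern. The names
(rx_ring, rx_append, rx_discard) are hypothetical; this is not the
actual ixgbe code, only the shape of the logic I would audit:

    #include <sys/param.h>
    #include <sys/mbuf.h>

    struct rx_ring {                    /* hypothetical per-ring state */
            struct mbuf     *rx_head;   /* first mbuf of frame in progress */
            struct mbuf     *rx_tail;   /* last mbuf appended so far */
    };

    /*
     * Drop a partially assembled frame. This must be called on a frame
     * error, or when a replacement mbuf cannot be allocated for the
     * ring; m_freem() walks the m_next chain and returns every cluster
     * to its zone.
     */
    static void
    rx_discard(struct rx_ring *rxr)
    {

            if (rxr->rx_head != NULL) {
                    m_freem(rxr->rx_head);
                    rxr->rx_head = rxr->rx_tail = NULL;
            }
    }

    /*
     * Called once per completed RX descriptor. Returns the complete
     * frame when the end-of-packet (EOP) descriptor arrives, NULL
     * while concatenation is still in progress.
     */
    static struct mbuf *
    rx_append(struct rx_ring *rxr, struct mbuf *m, int len, int eop,
        int err)
    {

            if (err) {
                    /* Frame error mid-concatenation: free everything. */
                    m_freem(m);
                    rx_discard(rxr);
                    return (NULL);
            }
            m->m_len = len;
            if (rxr->rx_head == NULL) {
                    /* First segment carries the packet header. */
                    m->m_pkthdr.len = len;
                    rxr->rx_head = rxr->rx_tail = m;
            } else {
                    /* Continuation: chain it and grow the header length. */
                    m->m_flags &= ~M_PKTHDR;
                    rxr->rx_head->m_pkthdr.len += len;
                    rxr->rx_tail->m_next = m;
                    rxr->rx_tail = m;
            }
            if (eop == 0)
                    return (NULL);
            m = rxr->rx_head;
            rxr->rx_head = rxr->rx_tail = NULL;
            return (m);
    }

If either error path forgets the m_freem() calls, every frame error
or refill failure under load strands a partial chain, and the cluster
zone runs dry at a utilization well below its configured limit. That
would look very much like the symptom described.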