From owner-freebsd-net@FreeBSD.ORG Fri Mar  8 08:27:44 2013
From: Jack Vogel <jfvogel@gmail.com>
To: pyunyh@gmail.com
Cc: jfv@freebsd.org, freebsd-net@freebsd.org, Garrett Wollman
Date: Fri, 8 Mar 2013 00:27:37 -0800
Subject: Re: Limits on jumbo mbuf cluster allocation
In-Reply-To: <20130308075458.GA1442@michelle.cdnetworks.com>
References: <20793.36593.774795.720959@hergotha.csail.mit.edu>
 <20130308075458.GA1442@michelle.cdnetworks.com>
List-Id: Networking and TCP/IP with FreeBSD

On Thu, Mar 7, 2013 at 11:54 PM, YongHyeon PYUN wrote:

> On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote:
> > I have a machine (actually six of them) with an Intel dual-10G NIC
> > on the motherboard.  Two of them (so far) are connected to a
> > network using jumbo frames, with an MTU a little under 9k, so the
> > ixgbe driver allocates 32,000 9k clusters for its receive rings.
> > I have noticed, on the machine that is an active NFS server, that
> > it can get into a state where allocating more 9k clusters fails
> > (as reflected in the mbuf failure counters) at a utilization far
> > lower than the configured limits -- in fact, quite close to the
> > number allocated by the driver for its rx ring.  Eventually,
> > network traffic grinds completely to a halt, and if one of the
> > interfaces is administratively downed, it cannot be brought back
> > up again.  There's generally plenty of physical memory free (at
> > least two or three GB).
> >
> > There are no console messages generated to indicate what is going
> > on, and overall UMA usage doesn't look extreme.  I'm guessing that
> > this is a result of kernel memory fragmentation, although I'm a
> > little bit unclear as to how this actually comes about.
> > I am assuming that this hardware has only limited scatter-gather
> > capability and can't receive a single packet into multiple buffers
> > of a smaller size, which would reduce the requirement for
> > two-and-a-quarter consecutive pages of KVA for each packet.  In
> > actual usage, most of our clients aren't on a jumbo network, so
> > most of the time all the packets will fit into a normal 2k
> > cluster, and we've never observed this issue when the *server* is
> > on a non-jumbo network.
>
> AFAIK, all Intel controllers generate jumbo frames by concatenating
> multiple mbufs on the RX side, so there is no physically contiguous
> 9KB allocation.  I vaguely suspect there could be an mbuf leak when
> jumbo frames are enabled.  I would check how the driver handles
> mbuf shortages or frame errors while mbuf concatenation for a jumbo
> frame is in progress.

No, that is not true: when a 9K MTU is in use, the driver actually
allocates from the larger (9k) mbuf cluster pool.  The code has been
this way for a little while now.

Jack

> > Does anyone have suggestions for dealing with this issue?  Will
> > increasing the amount of KVA (to, say, twice physical memory) help
> > things?  It seems to me like a bug that these large packets don't
> > have their own submap to ensure that allocation is always possible
> > when sufficient physical pages are available.
> >
> > -GAWollman
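
To make the two receive-buffer strategies under discussion concrete,
here is a minimal sketch against the stock FreeBSD mbuf(9) API.  This
is not the actual ixgbe receive path; the function names
rx_alloc_jumbo9k() and rx_alloc_chain() are illustrative only.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * Strategy A (what Jack describes for current ixgbe): receive each
 * jumbo frame into a single 9k cluster.  m_getjcl() draws from the
 * separate 9k UMA zone, so every buffer needs two and a quarter
 * physically contiguous pages and is exposed to the fragmentation
 * problem Garrett describes.
 */
static struct mbuf *
rx_alloc_jumbo9k(void)
{

	return (m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES));
}

/*
 * Strategy B (what Pyun describes): chain ordinary 2k clusters and
 * let the controller scatter one frame across several RX descriptors.
 * Nothing larger than a page is ever allocated, so this path cannot
 * fail due to contiguous-memory fragmentation, at the cost of more
 * descriptors and mbufs per frame.
 */
static struct mbuf *
rx_alloc_chain(int frame_len)
{
	struct mbuf *m, *n, *top;
	int resid;

	top = m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
	if (top == NULL)
		return (NULL);
	for (resid = frame_len - MCLBYTES; resid > 0; resid -= MCLBYTES) {
		n = m_getcl(M_NOWAIT, MT_DATA, 0);
		if (n == NULL) {
			m_freem(top);	/* free the partial chain */
			return (NULL);
		}
		m->m_next = n;
		m = n;
	}
	return (top);
}

Which pool a running system is exhausting can be watched with
netstat -m, whose per-size jumbo cluster statistics (including denied
allocation requests) correspond to the mbuf failure counters Garrett
mentions above.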