Date:        Fri, 8 Mar 2013 17:39:32 +0900
From:        YongHyeon PYUN <pyunyh@gmail.com>
To:          Jack Vogel <jfvogel@gmail.com>
Cc:          jfv@freebsd.org, freebsd-net@freebsd.org, Garrett Wollman <wollman@freebsd.org>
Subject:     Re: Limits on jumbo mbuf cluster allocation
Message-ID:  <20130308083932.GB1442@michelle.cdnetworks.com>
In-Reply-To: <CAFOYbckHDeuwmcPZzhewqrAju3GZ8er6nnTVgkNeVhvH4k=ydQ@mail.gmail.com>
References:  <20793.36593.774795.720959@hergotha.csail.mit.edu> <20130308075458.GA1442@michelle.cdnetworks.com> <CAFOYbckHDeuwmcPZzhewqrAju3GZ8er6nnTVgkNeVhvH4k=ydQ@mail.gmail.com>
On Fri, Mar 08, 2013 at 12:27:37AM -0800, Jack Vogel wrote:
> On Thu, Mar 7, 2013 at 11:54 PM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
>
> > On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote:
> > > I have a machine (actually six of them) with an Intel dual-10G NIC on
> > > the motherboard.  Two of them (so far) are connected to a network
> > > using jumbo frames, with an MTU a little under 9k, so the ixgbe driver
> > > allocates 32,000 9k clusters for its receive rings.  I have noticed,
> > > on the machine that is an active NFS server, that it can get into a
> > > state where allocating more 9k clusters fails (as reflected in the
> > > mbuf failure counters) at a utilization far lower than the configured
> > > limits -- in fact, quite close to the number allocated by the driver
> > > for its rx ring.  Eventually, network traffic grinds completely to a
> > > halt, and if one of the interfaces is administratively downed, it
> > > cannot be brought back up again.  There's generally plenty of physical
> > > memory free (at least two or three GB).
> > >
> > > There are no console messages generated to indicate what is going on,
> > > and overall UMA usage doesn't look extreme.  I'm guessing that this is
> > > a result of kernel memory fragmentation, although I'm a little bit
> > > unclear as to how this actually comes about.  I am assuming that this
> > > hardware has only limited scatter-gather capability and can't receive
> > > a single packet into multiple buffers of a smaller size, which would
> > > reduce the requirement for two-and-a-quarter consecutive pages of KVA
> > > for each packet.  In actual usage, most of our clients aren't on a
> > > jumbo network, so most of the time, all the packets will fit into a
> > > normal 2k cluster, and we've never observed this issue when the
> > > *server* is on a non-jumbo network.
> > >
> >
> > AFAIK all Intel controllers generate jumbo frame by concatenating
> > multiple mbufs on RX side so there is no physically contiguous 9KB
> > allocation.  I vaguely guess there could be mbuf leakage when jumbo
> > frame is enabled.  I would check how driver handles mbuf shortage or
> > frame errors while mbuf concatenation for jumbo frame is in
> > progress.
> >
>
> No, this is not true, if using a 9K jumbo it will actually use the larger
> mbuf pool, the code has been this way for a little while now.

Ah, thanks for correcting me.
If H/W is still able to support old-style chaining like em(4),
wouldn't it be better to use that rather than allocating a 9KB buffer?
Allocating a 9KB buffer to handle a pure TCP ACK segment looks
inefficient.

>
> Jack
>
>
> > > Does anyone have suggestions for dealing with this issue?  Will
> > > increasing the amount of KVA (to, say, twice physical memory) help
> > > things?  It seems to me like a bug that these large packets don't have
> > > their own submap to ensure that allocation is always possible when
> > > sufficient physical pages are available.
> > >
> > > -GAWollman
> >
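[Editor's note: a minimal sketch of the two RX-buffer strategies being debated
above, written against the stock FreeBSD mbuf allocators m_getjcl() and
m_getcl().  It is illustrative only and is not taken from the ixgbe or em(4)
sources; the function names and the frame_len parameter are invented for the
example.]

/*
 * Strategy 1: one physically contiguous 9KB jumbo cluster per frame.
 * This can fail under KVA fragmentation even when plenty of smaller
 * pages remain free.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

static struct mbuf *
rx_alloc_jumbo9(void)
{
	return (m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES));
}

/*
 * Strategy 2: em(4)-style chaining of ordinary 2KB clusters; the NIC
 * scatters one frame across several RX descriptors, so no allocation
 * ever needs more than one cluster's worth of contiguous memory.
 */
static struct mbuf *
rx_alloc_chain(int frame_len)
{
	struct mbuf *top, *m, **next;
	int left;

	top = NULL;
	next = &top;
	for (left = frame_len; left > 0; left -= MCLBYTES) {
		/* Only the first mbuf in the chain carries the pkthdr. */
		m = m_getcl(M_NOWAIT, MT_DATA, top == NULL ? M_PKTHDR : 0);
		if (m == NULL) {
			m_freem(top);	/* release partial chain on shortage */
			return (NULL);
		}
		m->m_len = MIN(left, MCLBYTES);
		*next = m;
		next = &m->m_next;
	}
	if (top != NULL)
		top->m_pkthdr.len = frame_len;
	return (top);
}

The trade-off: the single 9KB cluster needs two-and-a-quarter contiguous pages
of KVA, which is exactly what fragmentation makes scarce, while chaining only
ever asks the allocator for standard 2KB clusters but requires the hardware to
support multi-descriptor receives.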