Date: Wed, 12 Feb 2014 23:56:51 -0800
From: John-Mark Gurney <jmg@funkthat.com>
To: Garrett Wollman <wollman@bimajority.org>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, John Baldwin <jhb@freebsd.org>
Subject: Re: Use of contiguous physical memory in cxgbe driver
Message-ID: <20140213075651.GY34851@funkthat.com>
In-Reply-To: <21244.20212.423983.960018@hergotha.csail.mit.edu>
References: <21216.22944.314697.179039@hergotha.csail.mit.edu> <201402111348.52135.jhb@freebsd.org> <CAJ-VmonCdNQPUCQwm0OhqQ3Kt_7x6-g-JwGVZQfzWTgrDYfmqw@mail.gmail.com> <201402121446.19278.jhb@freebsd.org> <21244.20212.423983.960018@hergotha.csail.mit.edu>

Garrett Wollman wrote this message on Wed, Feb 12, 2014 at 23:49 -0500:
> <<On Wed, 12 Feb 2014 14:46:19 -0500, John Baldwin <jhb@freebsd.org> said:
>
> > Is this because UMA keeps lots of mbufs cached in your workload?
> > The physmem buddy allocator certainly seeks to minimize
> > fragmentation.  However, it can't go yank memory out of UMA caches
> > to do so.
>
> It's not just UMA caches: there are TCP queues, interface queues, the
> NFS request "cache", and elsewhere.  I first discovered this problem
> in the NFS context: what happens is that you build up very large TCP
> send buffers (NFS forces the socket buffers to 2M) for many clients
> (easy if the server is dedicated 10G and the clients are all on shared
> 1G links).  The NIC is eventually unable to replenish its receive
> ring, and everything just stops.  Eventually, the TCP connections time
> out, the buffers are freed, and the server mysteriously starts working
> again.  (Actually, the last bit never happens in production.  It's
> more like: Eventually, the users start filing trouble tickets, then
> Nagios starts paging the sysadmins, then someone does a hard reset
> because that's the fastest way to recover.  And then they blame me.)

This is an issue that most ethernet drivers have in that they require
the ability to fetch a new mbuf to replace the received one instead of
delaying the replacement till later...

If the driver allowed the receive ring to be "missing" a few buffers
that are potentially filled upon next RX, it would allow the machine
forward progress and possibly free up a ton of mbufs...  Maybe that
dropped frame is an ack that will free up 10 or more mbufs, but we'll
never know since we just drop it on the floor...

Though we might want to keep a few mbufs reserved for receive now that
you mention it...  We should never get to the point where we can't
allocate even one frame for receive...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
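
[Sketch, not part of the original message.] The deferred-replacement idea in
the reply above can be illustrated with a small userland model.  This is only
a sketch under stated assumptions: struct buf, buf_alloc(), buf_free(),
pool_avail, struct rx_ring, rx_init(), rx_refill() and rx_receive() are all
hypothetical names invented here, not FreeBSD mbuf(9) or cxgbe interfaces.
The point it shows is that a received frame is always handed up, even when no
replacement buffer can be allocated, and the empty slot is retried later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_SIZE 8

/* Hypothetical stand-in for an mbuf; not the kernel's struct mbuf. */
struct buf {
	char data[2048];
};

/* Hypothetical pool budget, so that buffer exhaustion can be simulated. */
int pool_avail = 0;

struct buf *
buf_alloc(void)
{
	struct buf *b;

	if (pool_avail == 0)
		return (NULL);		/* pool exhausted */
	b = malloc(sizeof(*b));
	if (b != NULL)
		pool_avail--;
	return (b);
}

void
buf_free(struct buf *b)
{
	free(b);
	pool_avail++;			/* buffer goes back to the pool */
}

struct rx_ring {
	struct buf *slot[RX_RING_SIZE];	/* NULL means the slot is missing a buffer */
	size_t next;			/* next slot a frame arrives in */
	size_t missing;			/* number of empty slots */
};

/* Fill as many empty slots as the pool allows; returns how many were filled. */
size_t
rx_refill(struct rx_ring *r)
{
	size_t i, filled = 0;

	for (i = 0; i < RX_RING_SIZE && r->missing > 0; i++) {
		if (r->slot[i] != NULL)
			continue;
		r->slot[i] = buf_alloc();
		if (r->slot[i] == NULL)
			break;		/* still short; retry on a later call */
		r->missing--;
		filled++;
	}
	return (filled);
}

void
rx_init(struct rx_ring *r)
{
	size_t i;

	for (i = 0; i < RX_RING_SIZE; i++)
		r->slot[i] = NULL;
	r->next = 0;
	r->missing = RX_RING_SIZE;
	(void)rx_refill(r);
}

/*
 * Take the frame in the current slot and hand it to the caller without
 * requiring a replacement first.  Returns NULL only if the slot was
 * already empty, which models a frame the hardware had to drop.
 */
struct buf *
rx_receive(struct rx_ring *r)
{
	struct buf *b = r->slot[r->next];

	if (b != NULL) {
		r->slot[r->next] = NULL;
		r->missing++;
	}
	r->next = (r->next + 1) % RX_RING_SIZE;
	(void)rx_refill(r);		/* opportunistic; failure is tolerated */
	assert(r->missing <= RX_RING_SIZE);
	return (b);
}
```

In this model, with the pool drained right after rx_init(), a call to
rx_receive() still returns the frame and leaves one slot empty; once the
caller releases that frame with buf_free(), a later rx_receive() or
rx_refill() fills the hole.  Reserving a few buffers exclusively for receive,
as suggested at the end of the message, would be a separate addition to
buf_alloc() and is not modelled here.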
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140213075651.GY34851>