From: John-Mark Gurney
To: Garrett Wollman
Date: Wed, 12 Feb 2014 23:56:51 -0800
Subject: Re: Use of contiguous physical memory in cxgbe driver
Message-ID: <20140213075651.GY34851@funkthat.com>
In-Reply-To: <21244.20212.423983.960018@hergotha.csail.mit.edu>
References: <21216.22944.314697.179039@hergotha.csail.mit.edu>
 <201402111348.52135.jhb@freebsd.org>
 <201402121446.19278.jhb@freebsd.org>
 <21244.20212.423983.960018@hergotha.csail.mit.edu>
Cc: FreeBSD Net, John Baldwin

Garrett Wollman wrote this message on Wed, Feb 12, 2014 at 23:49 -0500:
> < said:
>
> > Is this because UMA keeps lots of mbufs cached in your workload?
> > The physmem buddy allocator certainly seeks to minimize
> > fragmentation.  However, it can't go yank memory out of UMA caches
> > to do so.
>
> It's not just UMA caches: there are TCP queues, interface queues, the
> NFS request "cache", and elsewhere.  I first discovered this problem
> in the NFS context: what happens is that you build up very large TCP
> send buffers (NFS forces the socket buffers to 2M) for many clients
> (easy if the server is dedicated 10G and the clients are all on shared
> 1G links).  The NIC is eventually unable to replenish its receive
> ring, and everything just stops.  Eventually, the TCP connections time
> out, the buffers are freed, and the server mysteriously starts working
> again.  (Actually, the last bit never happens in production.  It's
> more like: Eventually, the users start filing trouble tickets, then
> Nagios starts paging the sysadmins, then someone does a hard reset
> because that's the fastest way to recover.  And then they blame me.)

This is an issue that most Ethernet drivers have, in that they require
the ability to fetch a new mbuf to replace the received one instead of
delaying the replacement till later...  If the driver allowed the
receive ring to be "missing" a few buffers that are potentially filled
upon next RX, it would allow the machine to make forward progress and
possibly free up a ton of mbufs...
Maybe that dropped frame is an ack that will free up 10 or more mbufs,
but we'll never know since we just drop it on the floor...

Though we might want to keep a few mbufs reserved for receive now that
you mention it...  We should never get to the point where we can't
allocate even one frame for receive...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
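
To make the two ideas above concrete (a receive ring that tolerates
"missing" buffers, plus a small receive-only reserve), here is a rough
userspace toy model.  It is only a sketch: BufferPool, RxRing,
rx_reserve and defer_refill are hypothetical names made up for
illustration, not anything from cxgbe, mbuf(9) or UMA, and buffers are
just counted rather than allocated.

```python
class BufferPool:
    """Toy stand-in for an mbuf allocator (hypothetical, counts only)."""

    def __init__(self, total, rx_reserve=0):
        self.free = total              # buffers currently unallocated
        self.rx_reserve = rx_reserve   # buffers only the receive path may take

    def alloc(self, for_rx=False):
        # Non-receive callers must leave rx_reserve buffers untouched.
        floor = 0 if for_rx else self.rx_reserve
        if self.free > floor:
            self.free -= 1
            return True
        return False

    def release(self, n=1):
        self.free += n


class RxRing:
    """Toy receive ring; defer_refill selects the behaviour being compared."""

    def __init__(self, size, pool, defer_refill=True):
        self.size = size
        self.pool = pool
        self.defer_refill = defer_refill
        self.posted = 0                # slots that currently hold a buffer
        self.refill()

    def refill(self):
        # Top up as many empty slots as the pool allows; never blocks.
        while self.posted < self.size and self.pool.alloc(for_rx=True):
            self.posted += 1

    def receive(self):
        """Handle one arriving frame; True if it is passed up the stack."""
        if self.posted == 0:
            return False               # nothing posted: frame is lost
        if not self.defer_refill:
            # Replace-or-drop: without a replacement buffer the frame is
            # discarded and its buffer is recycled into the same slot.
            return self.pool.alloc(for_rx=True)
        self.posted -= 1               # pass the filled buffer up as-is
        self.refill()                  # best effort now, retried later
        return True


if __name__ == "__main__":
    pool = BufferPool(total=6)
    ring = RxRing(size=4, pool=pool)
    while pool.alloc():                # send queues eat every free buffer
        pass
    print("delivered:", ring.receive(), "posted:", ring.posted)
    pool.release(3)                    # e.g. the frame was an ACK
    ring.refill()
    print("posted after refill:", ring.posted)
```

With defer_refill=False the same exhausted pool makes receive() drop
the frame, which is the situation described in the quoted text.  Setting
rx_reserve above zero keeps non-receive callers from draining the pool
completely, so the ring can always get at least that many buffers back.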