From: Garrett Wollman <wollman@hergotha.csail.mit.edu>
To: Rick Macklem
Cc: Mark Schouten, freebsd-net@FreeBSD.org
Subject: Re: Frequent hickups on the networking layer
Date: Wed, 29 Apr 2015 01:08:00 -0400
Message-ID: <21824.26416.855441.21454@hergotha.csail.mit.edu>
In-Reply-To: <137094161.27589033.1430255162390.JavaMail.root@uoguelph.ca>
References: <4281350517-9417@kerio.tuxis.nl> <137094161.27589033.1430255162390.JavaMail.root@uoguelph.ca>

Rick Macklem said:

> There have been email list threads discussing how allocating 9K jumbo
> mbufs will fragment the KVM (kernel virtual memory) used for mbuf
> cluster allocation and cause grief.

The problem is not KVA fragmentation -- the clusters come from a
separate map, which should prevent that -- it's that the clusters have
to be physically contiguous, and an active machine is going to have
trouble finding that much contiguous physical memory.  The fact that
9k is a goofy size (two pages plus a little bit) doesn't help matters.

The other side, as Neel and others have pointed out, is that it's
beneficial for the hardware to have a big chunk of physically
contiguous memory to dump packets into, especially with various kinds
of receive-side offloading.

I see two solutions to this, but I don't have the time or resources
(or, frankly, the need) to implement them (and both are probably
required, for different situations):

1) Reserve a big chunk of physical memory early on for big clusters.
How much is needed will depend on the application and the particular
network interface hardware, but think in terms of megabytes or (on a
big server) gigabytes -- big enough to be mapped as superpages on
hardware where that's beneficial.  If you have aggressive LRO, "big
clusters" might be 64k or larger.
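To make (1) concrete, here is a minimal sketch of what a boot-time
reservation might look like.  Every "bigclus" name below is invented
for illustration -- nothing in the tree does this -- and a real
implementation would carve the chunk into clusters and feed them to a
dedicated UMA zone:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>

    static MALLOC_DEFINE(M_BIGCLUS, "bigclus", "reserved big-cluster pool");

    /* Pool size, settable from loader.conf; 64MB is an arbitrary default. */
    static unsigned long bigclus_pool_size = 64UL * 1024 * 1024;
    TUNABLE_ULONG("hw.bigclus.pool_size", &bigclus_pool_size);

    static void *bigclus_pool;

    static void
    bigclus_init(void *arg __unused)
    {
            /*
             * Grab one physically contiguous chunk while memory is
             * still unfragmented, 2MB-aligned so it can be mapped
             * with superpages on amd64.
             */
            bigclus_pool = contigmalloc(bigclus_pool_size, M_BIGCLUS,
                M_WAITOK, 0, ~(vm_paddr_t)0, 2 * 1024 * 1024, 0);
    }
    SYSINIT(bigclus_reserve, SI_SUB_KMEM, SI_ORDER_ANY, bigclus_init, NULL);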
2) Use the IOMMU -- if it's available, which it won't be when running
under a hypervisor that's already using it for passthrough -- to
obviate the need for physically contiguous pages; the problem then
reduces to KVA fragmentation, which is easier to avoid in the
allocator.

> As far as I know (just from email discussion, never used them myself),
> you can either stop using jumbo packets or switch to a different net
> interface that doesn't allocate 9K jumbo mbufs (doing the receives of
> jumbo packets into a list of smaller mbuf clusters).

Or just hack the driver not to use them.  For the Intel drivers this
is easy, and at least for the hardware I have there's no benefit to
using 9k clusters over 4k; for Chelsio it's quite a bit harder.
Sketches of both points follow below.

-GAWollman
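On point (2), the effect shows up at the busdma level.  A tag that
demands a single 9k segment, like the illustrative one below, can only
be satisfied without an IOMMU by a physically contiguous 9k cluster
(or a bounce copy); an IOMMU-backed busdma can instead remap scattered
pages so they appear contiguous in bus address space.  The parameters
here are illustrative, not taken from any particular driver:

    #include <sys/param.h>      /* MJUM9BYTES */
    #include <sys/bus.h>
    #include <machine/bus.h>

    static bus_dma_tag_t rx_tag;

    static int
    rx_tag_create(device_t dev)
    {
            return (bus_dma_tag_create(bus_get_dma_tag(dev),
                1, 0,                   /* alignment, boundary */
                BUS_SPACE_MAXADDR,      /* lowaddr */
                BUS_SPACE_MAXADDR,      /* highaddr */
                NULL, NULL,             /* filter, filterarg */
                MJUM9BYTES,             /* maxsize: one 9k cluster */
                1,                      /* nsegments: one contiguous segment */
                MJUM9BYTES,             /* maxsegsize */
                0,                      /* flags */
                NULL, NULL,             /* lockfunc, lockfuncarg */
                &rx_tag));
    }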
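And on the driver hack: in the Intel drivers it amounts to picking the
receive cluster size by frame size but stopping at a page instead of
ever choosing MJUM9BYTES.  The function below is invented to
illustrate the idea; the real code varies by driver and branch:

    #include <sys/param.h>      /* MCLBYTES, MJUMPAGESIZE, MJUM9BYTES */

    /*
     * Cap receive clusters at one page.  Jumbo frames then arrive in
     * chains of 4k clusters, none of which needs a multi-page
     * physically contiguous allocation.
     */
    static int
    rx_cluster_size(int max_frame_size)
    {
            if (max_frame_size <= MCLBYTES)
                    return (MCLBYTES);      /* ordinary 2k cluster */
            return (MJUMPAGESIZE);          /* one page; was MJUM9BYTES */
    }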