From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 17:49:52 2011
Date: Sun, 20 Mar 2011 12:24:13 -0500
From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
To: George Neville-Neil
Cc: arch@freebsd.org, Navdeep Parhar
Subject: Re: Updating our TCP and socket sysctl values...
List-Id: Discussion related to FreeBSD architecture

On Sat, Mar 19, 2011 at 10:47 PM, George Neville-Neil wrote:
>
> On Mar 20, 2011, at 08:13 , Navdeep Parhar wrote:
>
> > On Fri, Mar 18, 2011 at 11:37 PM, George Neville-Neil wrote:
> >>
> >> Howdy,
> >>
> >> I believe it's time for us to upgrade our sysctl values for TCP
> >> sockets so that they are more in line with the modern world. At the
> >> moment we have these limits on our buffering:
> >>
> >> kern.ipc.maxsockbuf: 262144
> >> net.inet.tcp.recvbuf_max: 262144
> >> net.inet.tcp.sendbuf_max: 262144
> >>
> >> I believe it's time to up these values to something that's in line
> >> with higher speed local networks, such as 10G. Perhaps it's time to
> >> move these to 2MB instead of 256K.
> >>
> >> Thoughts?
> >
> > 256KB seems adequate for 10G (as long as the consumer can keep
> > draining the socket rcv buffer fast enough). If you consider 2 x
> > bandwidth delay product to be a reasonable socket buffer size, then
> > 256K allows for 10G networks with ~100us delays. Normally the delay
> > is _way_ less than this for 10G, and even 256K may be overkill (but
> > this is ok; the kernel has tcp_do_autorcvbuf on by default).
> >
> > While we're here discussing defaults, what about nmbclusters and
> > nmbjumboXX? Those haven't kept up with modern machines (imho).
>
> Yes, we should also up nmbclusters, IMHO, but I wasn't going to
> put that in the same bucket with the TCP buffers just yet.
> On 64-bit/large-memory machines you could make nmbclusters
> far higher than our current default.
> I know people who just set that to 1,000,000 by default.
>
> If people are also happy to up nmbclusters, I'm willing to conflate
> that with this.
>

A more modest but nonetheless significant increase may also be possible
on i386 machines. If you go back to r129906, wherein we switched to
using UMA for allocating mbufs and mbuf clusters, and read it carefully,
you'll find that a subtle mistake was made in the changes to the sizing
of the kmem_map, or the "kernel heap". Prior to r129906, the overall
size of the kmem map was based on the limits on mbufs and mbuf clusters
PLUS the amount of kernel heap that was desired for everything else.
After r129906, the limits on mbufs and mbuf clusters no longer made any
difference to the size of the kmem map. The reason is that the limit on
mbuf clusters was factored into the autosizing too early: it was added
to the minimum "kernel heap" size, not the desired size. So, the end
result is that mbufs, mbuf clusters, and everything else were made to
compete for a smaller kmem map.

In short, r129906 should have increased VM_KMEM_SIZE_MAX from its
current limit of 320MB. I'd be curious whether people running
i386-based network servers have any problems with using

    #ifndef VM_KMEM_SIZE_MAX
    #define VM_KMEM_SIZE_MAX ((VM_MAX_KERNEL_ADDRESS - \
        VM_MIN_KERNEL_ADDRESS + 1) * 3 / 5)
    #endif

in place of

    #ifndef VM_KMEM_SIZE_MAX
    #define VM_KMEM_SIZE_MAX (320 * 1024 * 1024)
    #endif

Really, the only downside to this change is that it reduces the kernel
virtual address space available for thread stacks and for 9KB and 16KB
jumbo frames.

Alan