From: Juli Mallett <juli@clockworksquid.com>
Date: Sat, 24 Mar 2012 14:02:00 -0700
To: Jack Vogel
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: nmbclusters: how do we want to fix this for 8.3 ?

On Sat, Mar 24, 2012 at 13:33, Jack Vogel wrote:
> On Sat, Mar 24, 2012 at 1:08 PM, John-Mark Gurney wrote:
>> If we had some sort of tuning algorithm that would keep track of the
>> current receive queue usage depth, and always keep enough mbufs on the
>> queue to handle the largest expected burst of packets (either
>> historical, or by looking at the largest TCP window size, etc.), this
>> would both improve memory usage and, in general, reduce the number of
>> required mbufs on the system...  If you have fast processors, you
>> might be able to get away with fewer mbufs since you can drain the
>> receive queue faster, but on slower systems you would use more mbufs.
>
> These are the days when machines might have 64 GIGABYTES of main
> storage, so having sufficient memory to run high-performance networking
> seems little to ask.

I think the suggestion is that this should be configurable.  FreeBSD is
also being used on systems, in production, doing networking-related
tasks, with <128MB of RAM.  And it works fine, more or less.
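To make the heuristic John-Mark is describing concrete, here is a minimal
sketch of one way it could look.  This is purely illustrative: the structure
and function names are hypothetical, not anything that exists in the tree.
The idea is to track the deepest burst a receive ring has actually seen and
keep that much, plus some headroom, posted, instead of always keeping the
ring fully populated.

    #include <sys/types.h>

    /*
     * Hypothetical sketch only: neither this structure nor the function
     * below is an existing interface; the names are made up.
     */
    struct rxq_autotune {
            u_int   depth_hwm;      /* deepest ring occupancy observed */
            u_int   target;         /* number of mbufs to keep posted */
            u_int   min_bufs;       /* floor, e.g. 128 */
            u_int   max_bufs;       /* ceiling, e.g. the ring size */
    };

    static void
    rxq_autotune_update(struct rxq_autotune *at, u_int used_now)
    {
            if (used_now > at->depth_hwm)
                    at->depth_hwm = used_now;

            /* Aim for the largest burst seen so far, plus ~25% headroom. */
            at->target = at->depth_hwm + at->depth_hwm / 4;
            if (at->target < at->min_bufs)
                    at->target = at->min_bufs;
            if (at->target > at->max_bufs)
                    at->target = at->max_bufs;

            /* Slowly forget old bursts so the target can shrink again. */
            at->depth_hwm -= at->depth_hwm / 64;
    }

A driver's refill path could compare its posted-buffer count against
at->target and only allocate up to that, so an idle 1G port on a small box
stays near the floor while a busy 10G port grows toward the ring size.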
>> This tuning would also fix the problem of interfaces not coming up,
>> since at boot each interface might only allocate 128 or so mbufs, and
>> then dynamically grow as necessary...
>
> You want modern fast networked servers but only giving them 128 mbufs,
> ya right, allocating memory takes time, so when you do this people will
> whine about latency :)

Allocating memory doesn't have to take much time.  A multi-queue driver
could steal mbufs from an underutilized queue.  It could grow the number
of descriptors based on load.  Some of those things are hard to implement
in the first place and harder to cover the corner cases of, but not all.

> When you start pumping 10G...40G...100G ...the scale of the system
> is different, thinking in terms of the old 10Mb or 100Mb days just
> doesn't work.

This is a red herring.  Yes, some systems need to do 40/100G.  They
require special tuning.  The default shouldn't assume that everyone's
getting maximum pps.  This seems an especially silly argument when much
of the silicon available can't even keep up with maximum packet rates
with minimally sized frames, at 10G or even at 1G.

But again, 1G NICs are the default now.  Does every FreeBSD system with a
1G NIC have loads of memory?  No.  I have an Atheros system with two 1G
NICs and 256MB of RAM.  It can't do anything at 1Gbps -- not even drop
packets.  Why should its memory usage model be tuned for something it
can't do?

I'm not saying it should be impossible to allocate a bajillion gigaquads
of memory to receive rings; I certainly do it myself all the time.  But
choosing defaults is a tricky thing, and systems that are "pumping 10G"
need other tweaks anyway, whether that's enabling forwarding or something
else, because they have to be configured for the task that they are to
do.  If part of that is increasing the number of receive descriptors (as
the Intel drivers already allow us to do -- thanks, Jack) and the number
of queues, is that such a bad thing?

I really don't think it makes sense for my 8-core system or my 16-core
system to come up with 8 or 16 queues *per interface*.  That just doesn't
make sense.  8/N or 16/N queues, where N is the number of interfaces,
makes more sense under heavy load (a sketch of that arithmetic follows at
the end of this message).  One queue per port is *ideal* if a single core
can handle the load of that interface.

> Sorry, but the direction is to scale everything, not pare back on the
> network, IMHO.

There is not just one direction.  There is not just one point of scaling.
Relatively new defaults do not necessarily have to be increased in the
future.  I mean, should a 1G NIC use 64 queues on a 64-core system that
can do 100Gbps @ 64 bytes on one core?  It's actively harmful to
performance.

The answer to "what's the most sensible default?" is not "what does a
system that just forwards packets need?"  A system that just forwards
packets already needs IPs configured and a sysctl set.  If we make it
easier to change the tuning of the system for that scenario, then
nobody's going to care what our defaults are, or think us "slow" for
them.
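As promised above, here is a minimal sketch of the 8/N queue-division
arithmetic.  The function name and its parameters are hypothetical, made
up for illustration; this is not an existing driver interface.  The point
is only that cores get divided among ports rather than every port getting
one queue per core.

    /*
     * Hypothetical sketch: divide the available cores among the ports
     * instead of giving every port one queue per core.
     */
    static int
    queues_for_port(int ncpu, int nports, int hw_max_queues)
    {
            int nq;

            if (nports < 1)
                    nports = 1;

            nq = ncpu / nports;     /* e.g. 8 cores / 4 ports = 2 queues */
            if (nq < 1)
                    nq = 1;                 /* never fewer than one queue */
            if (nq > hw_max_queues)
                    nq = hw_max_queues;     /* respect the NIC's limit */

            return (nq);
    }

On an 8-core box with four ports this comes out to two queues per port
instead of eight per port; a single busy port still gets all eight,
subject to whatever the hardware supports.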