From owner-freebsd-net@FreeBSD.ORG Sat Mar 24 21:18:04 2012
Date: Sat, 24 Mar 2012 14:17:57 -0700
From: Jack Vogel
To: Juli Mallett
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: nmbclusters: how do we want to fix this for 8.3 ?

This whole issue only came up on a system with 10G devices, and only igb
does anything like you're talking about, not a device/driver on most low
end systems. So, we are trading red herrings it would seem.

I'm not opposed to economizing things in a sensible way, it was I that
brought the issue up after all :)

Jack

On Sat, Mar 24, 2012 at 2:02 PM, Juli Mallett wrote:
> On Sat, Mar 24, 2012 at 13:33, Jack Vogel wrote:
> > On Sat, Mar 24, 2012 at 1:08 PM, John-Mark Gurney wrote:
> >> If we had some sort of tuning algorithm that would keep track of the
> >> current receive queue usage depth, and always keep enough mbufs on the
> >> queue to handle the largest expected burst of packets (either
> >> historical, or by looking at the largest tcp window size, etc), this
> >> would both improve memory usage and in general reduce the number of
> >> required mbufs on the system...  If you have fast processors, you
> >> might be able to get away with fewer mbufs since you can drain the
> >> receive queue faster, but on slower systems, you would use more mbufs.
> >
> > These are the days when machines might have 64 GIGABYTES of main
> > storage, so having sufficient memory to run high performance
> > networking seems little to ask.
>
> I think the suggestion is that this should be configurable.  FreeBSD is
> also being used on systems, in production, doing networking-related
> tasks, with <128MB of RAM.  And it works fine, more or less.
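
For concreteness, the knobs this thread keeps circling around already
exist as loader tunables on 8.x/9.x; a sketch like the one below (igb(4)
tunable names, purely illustrative values) is roughly what "configurable"
looks like today:

    # /boot/loader.conf
    # Size of the mbuf cluster pool (also exposed as the
    # kern.ipc.nmbclusters sysctl); the debate is over its default.
    kern.ipc.nmbclusters="131072"

    # Per-driver knobs for igb(4): number of queues (0 lets the driver
    # decide) and RX/TX descriptors per ring.
    hw.igb.num_queues="2"
    hw.igb.rxd="1024"
    hw.igb.txd="1024"
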
>
> >> This tuning would also fix the problem of interfaces not coming up
> >> since at boot, each interface might only allocate 128 or so mbufs,
> >> and then dynamically grow as necessary...
> >
> > You want modern fast networked servers but only giving them 128 mbufs,
> > ya right, allocating memory takes time, so when you do this people will
> > whine about latency :)
>
> Allocating memory doesn't have to take much time.  A multi-queue
> driver could steal mbufs from an underutilized queue.  It could grow
> the number of descriptors based on load.  Some of those things are
> hard to implement in the first place and harder to cover the corner
> cases of, but not all.
>
> > When you start pumping 10G...40G...100G ...the scale of the system
> > is different, thinking in terms of the old 10Mb or 100Mb days just
> > doesn't work.
>
> This is a red herring.  Yes, some systems need to do 40/100G.  They
> require special tuning.  The default shouldn't assume that everyone's
> getting maximum pps.  This seems an especially silly argument when
> much of the silicon available can't even keep up with maximum packet
> rates with minimally sized frames, at 10G or even at 1G.
>
> But again, 1G NICs are the default now.  Does every FreeBSD system
> with a 1G NIC have loads of memory?  No.  I have an Atheros system
> with two 1G NICs and 256MB of RAM.  It can't do anything at 1Gbps.
> Not even drop packets.  Why should its memory usage model be tuned
> for something it can't do?
>
> I'm not saying it should be impossible to allocate a bajillion
> gigaquads of memory to receive rings; I certainly do it myself all the
> time.  But choosing defaults is a tricky thing, and systems that are
> "pumping 10G" need other tweaks anyway, whether that's enabling
> forwarding or something else, because they have to be configured for
> the task that they are to do.  If part of that is increasing the
> number of receive descriptors (as the Intel drivers already allow us
> to do -- thanks, Jack) and the number of queues, is that such a bad
> thing?  I really don't think it makes sense for my 8-core system or my
> 16-core system to come up with 8 or 16 queues *per interface*.  That
> just doesn't make sense.  8/N or 16/N queues, where N is the number of
> interfaces, makes more sense under heavy load.  One queue per port is
> *ideal* if a single core can handle the load of that interface.
>
> > Sorry but the direction is to scale everything, not pare back on the
> > network IMHO.
>
> There is not just one direction.  There is not just one point of
> scaling.  Relatively new defaults do not necessarily have to be
> increased in the future.  I mean, should a 1G NIC use 64 queues on a
> 64-core system that can do 100Gbps @ 64 bytes on one core?  It's
> actively harmful to performance.  The answer to "what's the most
> sensible default?" is not "what does a system that just forwards
> packets need?"  A system that just forwards packets already needs IPs
> configured and a sysctl set.  If we make it easier to change the
> tuning of the system for that scenario, then nobody's going to care
> what our defaults are, or think us "slow" for them.
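
The "IPs configured and a sysctl set" case Juli describes really is that
small; a plain forwarding box needs roughly the following (interface names
and addresses here are only placeholders):

    # /etc/rc.conf
    gateway_enable="YES"              # sets net.inet.ip.forwarding=1 at boot
    ifconfig_igb0="inet 192.0.2.1/24"
    ifconfig_igb1="inet 198.51.100.1/24"

    # or, at runtime:
    #   sysctl net.inet.ip.forwarding=1
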