From owner-freebsd-net@FreeBSD.ORG Thu Jan 17 16:45:06 2013
Date: Thu, 17 Jan 2013 17:44:32 +0100 (CET)
From: Luigi Rizzo <luigi@onelab2.iet.unipi.it>
To: Barney Cordoba
Cc: freebsd-net@freebsd.org
Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
Message-ID: <20130117164432.GA64619@onelab2.iet.unipi.it>
References: <20130117025502.GA57613@onelab2.iet.unipi.it>
	<1358438932.56236.YahooMailClassic@web121601.mail.ne1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD

On Thu, Jan 17, 2013 at 08:08:52AM -0800, Barney Cordoba wrote:
> --- On Wed, 1/16/13, Luigi Rizzo wrote:
>
> > Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
> > Date: Wednesday, January 16, 2013, 9:55 PM
> >
> > On Wed, Jan 16, 2013 at 06:19:01AM -0800, Barney Cordoba wrote:
> > >
> > > --- On Tue, 1/15/13, Luigi Rizzo wrote:
> > >
> > > > Subject: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
> > > > To: head@freebsd.org,
> > > >     "freebsd-net@freebsd.org"
> > > > Cc: "Jack Vogel"
> > > > Date: Tuesday, January 15, 2013, 8:23 PM
> > > >
> > > > Hi,
> > > > i found a couple of problems in
> > > > dev/e1000/if_lem.c::lem_handle_rxtx()
> > > > (compare with dev/e1000/if_em.c::em_handle_que() for better
> > > > understanding):
> > > >
> > > > 1. in if_em.c::em_handle_que(), when em_rxeof() exceeds the
> > > >    rx_process_limit, the task is rescheduled so it can complete
> > > >    the work. Conversely, in if_lem.c::lem_handle_rxtx(),
> > > >    lem_rxeof() is only run once, and if there are more pending
> > > >    packets the only chance to drain them is to receive (many)
> > > >    more interrupts.
> > > >
> > > >    This is a relatively serious problem, because the receiver
> > > >    has a hard time recovering.
> > > >
> > > >    I'd like to commit a fix for this, same as it is done in e1000.
> > > >
> > > > 2. in if_em.c::em_handle_que(), interrupts are re-enabled
> > > >    unconditionally, whereas lem_handle_rxtx() only enables
> > > >    them if IFF_DRV_RUNNING is set.
> > > >
> > > >    I cannot really tell what is the correct way here, so I'd
> > > >    like to put a comment there unless there is a good
> > > >    suggestion on what to do.
> > > >
> > > >    Accesses to the intr register are race-prone anyways
> > > >    (disabled in fastintr, enabled in the rxtx task without
> > > >    holding any lock, and generally accessed under EM_CORE_LOCK
> > > >    in other places), and presumably enabling/disabling the
> > > >    interrupts around activations of the task is just an
> > > >    optimization (and on a VM it is actually a pessimization,
> > > >    due to the huge cost of VM exits).
> > > >
> > > > cheers
> > > > luigi
> > >
> > > This is not really a big deal; this is how things worked for a
> > > million years before we had task queues.
> >
> > i agree that the second issue is not a big deal.
> >
> > The first one, on the contrary, is a real problem no matter
> > how you set the 'work' parameter (unless you make it large
> > enough to drain the entire queue in one call).
>
> Which should be the goal, except in extreme circumstances. Having more
> packets than "work" should be the extreme case and not the norm.

not really. Once "work" is O(100), you have already heavily amortized
all the constant costs related to scheduling, and performance
improvements are marginal if you increase it further. Instead, you
start to see extreme burstiness everywhere -- on the CPU (the thread
running the handler cannot do anything else), on the input queues for
sockets/shapers, and even on the NIC, unless you advance the RDT
(receive queue tail) a little bit at a time to make room for newly
incoming packets. So you really want to yield every now and then to
smooth the bursts.

I should mention i am running tests in such "extreme circumstances",
bombing the guest in kvm with over 400Kpps (maximum-sized frames).
And yes, it can handle them. Hopefully you'll hear about this soon :)

cheers
luigi