From owner-freebsd-net@FreeBSD.ORG Thu Apr 23 20:45:49 2009
Sender: asmrookie@gmail.com
In-Reply-To: <20090423190408.GA65895@jem.dhs.org>
References: <20090327071742.GA87385@onelab2.iet.unipi.it> <20090423190408.GA65895@jem.dhs.org>
Date: Thu, 23 Apr 2009 22:13:42 +0200
Message-ID: <3bbf2fe10904231313o858b9e9v733564ee4f3d7d40@mail.gmail.com>
From: Attilio Rao
To: Ed Maste
Cc: freebsd-net@freebsd.org, Luigi Rizzo, Andrew Brampton
Subject: Re: Interrupts + Polling mode (similar to Linux's NAPI)
Content-Type: text/plain; charset=UTF-8
List-Id: Networking and TCP/IP with FreeBSD

2009/4/23 Ed Maste:
> On Fri, Mar 27, 2009 at 11:05:00AM +0000, Andrew Brampton wrote:
>
>> 2009/3/27 Luigi Rizzo :
>> > The load of polling is pretty low (within 1% or so) even with
>> > polling. The advantage of having interrupts is faster response
>> > to incoming traffic, not CPU load.
>>
>> oh, I was under the impression that polling spun in a tight loop, thus
>> using 100% of the processor. After a quick test I see this is not the
>> case. I assume it will get to 100% CPU load if I saturate my network.
>
> Yes, polling has a limit on the maximum CPU time it will use, and also
> will use less than the limit if there is no traffic.
>
> There are a number of sysctls under kern.polling that control its
> behaviour:
>
> * kern.polling.user_frac: Desired user fraction of cpu time
>
> This attempts to reserve at least a specified percentage of available
> CPU time for user processes; polling tries to limit its percentage use
> to 100 less this value.
>
> * kern.polling.burst: Current polling burst size
> * kern.polling.burst_max: Max Polling burst size
> * kern.polling.each_burst: Max size of each burst
>
> These three control the number of packets that polling processes per
> call / tick.  Packets are processed in batches of each_burst, up to
> burst packets total per tick.  The value of burst is capped at
> burst_max.
>
> In order to keep the user_frac CPU percentage available for non-polling,
> a feedback loop is used that controls the value of burst.
> Each time a batch of packets is processed, burst is incremented or
> decremented by 1, depending on how much CPU time polling actually used.
> In addition, if polling extends beyond the next tick, burst is scaled
> back to 7/8ths of its current value.
>
> Polling was originally implemented as a livelock-avoidance technique
> for the single-core case -- the primary goal is to guarantee the
> availability of CPU time specified in user_frac.  The current algorithm
> does not behave that well if user_frac is set low.  Setting it low is
> reasonable if the workload is largely in-kernel (i.e., bridging or
> routing), or when running SMP.
>
> Another downside of the current implementation is that interfaces will
> be polled multiple times per tick (burst / each_burst times), even if
> there are no packets to process.
>
> At work we've developed a replacement polling algorithm that keeps track
> of the actual amount of time spent per packet, and uses that as the
> feedback to control the number of packets in each batch.
>
> This work requires a change to the polling KPI: the polling handlers
> have to return the count of packets actually handled.  My hope is to get
> the KPI change committed in time for 8.0, even if we don't switch the
> algorithm yet.  Attilio (on CC:) and I will make the patch set for the
> KPI change available shortly for comment.

This is the KPI breakage patch:
http://people.freebsd.org/~attilio/Sandvine/polling/polling_kpi.diff

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
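
For readers following the thread, the burst feedback loop described in the
quoted text can be sketched roughly as below. This is a user-space
illustration only, not the kernel code and not the linked patch. The function
names (adjust_burst, split_into_batches) and the parameters kern_load and
missed_tick are hypothetical, and the integer rounding used for the 7/8
scale-back is an assumption.

```python
def adjust_burst(burst, burst_max, user_frac, kern_load, missed_tick=False):
    """Return the next value of burst after one round of polling.

    burst       -- current packet budget per tick
    burst_max   -- upper cap on burst
    user_frac   -- percentage of CPU time to leave for user processes
    kern_load   -- (hypothetical input) percentage of the tick polling used
    missed_tick -- (hypothetical input) True if polling ran past the next tick
    """
    if kern_load > 100 - user_frac:
        # Polling used more than its share: shrink by 1, but never below 1.
        if burst > 1:
            burst -= 1
    elif burst < burst_max:
        # Polling stayed within its share: grow by 1, capped at burst_max.
        burst += 1

    if missed_tick:
        # Scale back to roughly 7/8 of the current value (integer rounding
        # is an assumption of this sketch).
        burst -= burst // 8
        burst = max(burst, 1)
    return burst


def split_into_batches(burst, each_burst):
    """Split a per-tick budget of burst packets into batches of each_burst."""
    sizes = []
    remaining = burst
    while remaining > 0:
        n = min(each_burst, remaining)
        sizes.append(n)
        remaining -= n
    return sizes
```

With user_frac at 50, a round that used 30% of the tick raises burst from 10
to 11, while a round that used 70% lowers it to 9. A budget of 12 packets with
each_burst at 5 is split into batches of 5, 5 and 2, which is why an interface
can be polled several times per tick.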