From: Scott Long <scottl@samsco.org>
Date: Thu, 12 Mar 2009 18:35:41 -0600
To: barney_cordoba@yahoo.com
Cc: current@freebsd.org
Subject: Re: Interrupt routine usage not shown by top in 8.0

Barney Cordoba wrote:
>
> --- On Thu, 3/12/09, Scott Long wrote:
>
>> From: Scott Long
>> Subject: Re: Interrupt routine usage not shown by top in 8.0
>> To: barney_cordoba@yahoo.com
>> Cc: current@freebsd.org
>> Date: Thursday, March 12, 2009, 7:42 PM
>>
>> Barney Cordoba wrote:
>>> I'm firing 400Kpps at a udp blackhole port. I'm getting 6000
>>> interrupts per second on em3:
>>>
>>> testbox# vmstat -i; sleep 1; vmstat -i
>>> interrupt                          total       rate
>>> irq1: atkbd0                           1          0
>>> irq6: fdc0                             1          0
>>> irq17: uhci1+                       2226          9
>>> irq18: uhci2 ehci+                     9          0
>>> cpu0: timer                       470507       1993
>>> irq256: em0                          665          2
>>> irq259: em3                      1027684       4354
>>> cpu1: timer                       470272       1992
>>> cpu3: timer                       470273       1992
>>> cpu2: timer                       470273       1992
>>> Total                            2911911      12338
>>>
>>> interrupt                          total       rate
>>> irq1: atkbd0                           1          0
>>> irq6: fdc0                             1          0
>>> irq17: uhci1+                       2226          9
>>> irq18: uhci2 ehci+                     9          0
>>> cpu0: timer                       472513       1993
>>> irq256: em0                          668          2
>>> irq259: em3                      1033703       4361
>>> cpu1: timer                       472278       1992
>>> cpu3: timer                       472279       1992
>>> cpu2: timer                       472279       1992
>>> Total                            2925957      12345
>>>
>>> top -SH shows:
>>>
>>>   PID STATE  C   TIME     CPU COMMAND
>>>    10 CPU3   3   7:32 100.00% idle
>>>    10 CPU2   2   7:32 100.00% idle
>>>    10 RUN    0   7:31 100.00% idle
>>>    10 CPU1   1   7:31 100.00% idle
>>>
>>> This implies that CPU usage is substantially under-reported in
>>> general by the system. Note that I've modified em_irq_fast() to
>>> call em_handle_rxtx() directly rather than scheduling a task, to
>>> illustrate the problem.
>>
>> With unmodified code, what do you see? Are you sending valid UDP
>> frames with valid checksums and a valid port, or is everything that
>> you're blasting at the interface getting dropped right away?
>> Calling em_handle_rxtx() directly will cause a very quick panic once
>> you start handling real traffic and you encounter a lock.
>>
>> Scott
>
> I think you're mistaken. I'm also accessing the system via an em port
> (and running top), and em_handle_rxtx() is self-contained lock-wise.
> The taskqueue doesn't obtain a lock before calling the routine.

I understand perfectly how the code works, as I wrote it. While there
are no locks in the RX path of the driver, there are certainly locks
higher up in the network stack RX path. You're not going to hit them
in your test, but in the real world you will.

> As I mentioned, they're being dumped into a udp blackhole, which
> implies that I have udp.blackhole set and the port is unused. I can
> see the packets hit the udp socket, so it's working as expected:
>
> 853967872 dropped due to no socket
>
> With unmodified code, the taskq shows 25% usage or so.
>
> I'm not sure what the point of your criticism is, for what is clearly
> a test. Are you implying that the system can receive 400K pps with
> 6000 ints/sec and record 0% usage because of a coding imperfection?
> Or are you implying that the 25% usage is all due to launching tasks
> unnecessarily and process switching?

Prior to FreeBSD 5, interrupt processing time was counted in the %intr
stat. With FreeBSD 5 and beyond, most interrupt handling moved to full
process contexts called ithreads, and the time spent in an ithread was
counted in the %intr stat. The time spent in low-level interrupts was
merely counted against the process that got interrupted. This wasn't a
big deal, because low-level interrupts were only used to launch
ithreads and to process low-latency interrupts for a few drivers.
Moving to the taskq model breaks this accounting model.

What's happening in your test is that the system is almost completely
idle, so the only thing being interrupted by the low-level if_em
handler is the CPU idle thread. Since you're also bogusly bypassing
the deferral to the taskq, all stack processing is happening in this
low-level context as well, and it's being counted against the CPU idle
thread. However, the process accounting code knows not to charge
idle-thread time against the normal stats, because doing so would
result in the system always showing 100% busy. Your test exploits
this: you're stealing all of your cycles from the idle threads, and
those cycles aren't being accounted for, because it's hard to know
when the idle thread is having its cycles stolen.

So no, 25% of a CPU isn't going to "launching tasks unnecessarily and
process switching." It's going to processing 400K packets/sec off of
the RX ring and up the stack to the UDP layer. I think that if you
studied how the code worked and devised more useful benchmarks, you'd
see that the taskq deferral method is usually a significant
performance gain over polling or simple ithreads. There is certainly
room for more improvement, and my taskq scheme isn't the only way to
get good performance, but it does work fairly well.

Scott
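
For concreteness, the filter-plus-taskqueue deferral pattern described
above looks roughly like the sketch below in a FreeBSD 7/8-era driver.
This is a minimal illustration, not the actual if_em code: the names
my_softc, my_irq_fast, my_handle_rxtx, and my_attach_intr are
hypothetical stand-ins, and error handling is omitted.

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>
    #include <sys/priority.h>
    #include <sys/taskqueue.h>

    struct my_softc {
            struct task             rxtx_task;      /* deferred RX/TX work */
            struct taskqueue        *tq;            /* driver-private taskqueue */
            /* ... device state ... */
    };

    /*
     * Filter (low-level) interrupt handler: runs in primary interrupt
     * context, where sleeping and most locks are forbidden, so it only
     * acknowledges the hardware and schedules the real work.  Time spent
     * here is charged to whatever thread was interrupted (the idle
     * thread on an otherwise quiet system), which is the accounting
     * artifact discussed above.
     */
    static int
    my_irq_fast(void *arg)
    {
            struct my_softc *sc = arg;

            /* mask/ack the device interrupt here (hardware-specific) */
            taskqueue_enqueue(sc->tq, &sc->rxtx_task);
            return (FILTER_HANDLED);
    }

    /*
     * Task handler: runs in a full thread context, so it may take the
     * locks that the network stack RX path requires.  Time spent here is
     * charged to the taskqueue thread, which is why top(1) shows the
     * taskq thread using CPU with the unmodified driver.
     */
    static void
    my_handle_rxtx(void *context, int pending)
    {
            struct my_softc *sc = context;

            /* drain the RX ring, pass packets up the stack, restart TX */
            (void)sc;
            (void)pending;
            /* re-enable the device interrupt when finished */
    }

    /* Wiring it up at attach time: */
    static void
    my_attach_intr(struct my_softc *sc)
    {
            TASK_INIT(&sc->rxtx_task, 0, my_handle_rxtx, sc);
            sc->tq = taskqueue_create_fast("my_taskq", M_NOWAIT,
                taskqueue_thread_enqueue, &sc->tq);
            taskqueue_start_threads(&sc->tq, 1, PI_NET, "my taskq");
            /*
             * bus_setup_intr() would then be called with my_irq_fast as
             * the filter argument and no ithread handler.
             */
    }

The design point is that the filter only touches the hardware and
defers everything else, so lock acquisition and the bulk of the CPU
time land in a thread that both the scheduler and the process
accounting code can see.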