From owner-freebsd-firewire@FreeBSD.ORG Thu Dec 18 08:53:39 2008 Return-Path: Delivered-To: freebsd-firewire@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C8D1B1065673; Thu, 18 Dec 2008 08:53:39 +0000 (UTC) (envelope-from freebsd@sopwith.solgatos.com) Received: from sopwith.solgatos.com (pool-71-182-69-106.ptldor.fios.verizon.net [71.182.69.106]) by mx1.freebsd.org (Postfix) with ESMTP id A17938FC0C; Thu, 18 Dec 2008 08:53:38 +0000 (UTC) (envelope-from freebsd@sopwith.solgatos.com) Received: by sopwith.solgatos.com (Postfix, from userid 66) id E0D853F22; Thu, 18 Dec 2008 00:30:39 -0800 (PST) Received: from localhost by sopwith.solgatos.com (8.8.8/6.24) id IAA08485; Thu, 18 Dec 2008 08:51:41 GMT Message-Id: <200812180851.IAA08485@sopwith.solgatos.com> To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org In-reply-to: Your message of "Wed, 17 Dec 2008 17:10:53 PST." <4949A31D.5030507@miralink.com> Date: Thu, 18 Dec 2008 00:51:41 +0000 From: Dieter Cc: Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost X-BeenThere: freebsd-firewire@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Firewire support in FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Dec 2008 08:53:39 -0000 > > : Warner> This can't be the case. There's no SPL involved at all. Maybe > > : Warner> removing the printfs causes an interrupt to be serviced faster, > > : Warner> resulting in what appears to be better performance... > > : > > : With the printfs, Ethernet is not getting serviced. This > > : is repeatable and easily reproduced. Without the printfs, > > : it seems ok. > > : > > : If it isn't spl, what is locking out Ethernet? > > > > interrupt storm? In old-spl-locked drivers, often times the interrupt > > would be masked while certain operations happened. In new > > mutex-locked freebsd, there's no way to block the interrupts, so > > sometimes old code gets into an interrupt storm, which starves other > > things. Not sure why printf would trigger this, however... I tried logging in on the console and "cat /etc/termcap" does not create a problem. Systat -vmstat reports that IRQ 5 (console) interrupts go from 0 to 120. CPU: 0.0%Sys 0.6%Intr 0.0%User 0.0%Nice 99.4%Idle The normal firewire bus reset printfs generate 890 IRQ 5 interrupts with the shell firing off "fwcontrol -u 1 -r" in a loop. CPU goes from idle to 100% busy: 98.1%Sys 1.7%Intr 0.3%User 0.0%Nice 0.0%Idle "fwcontrol -u 0 -r" (the onboard firewire controller) gives the same 890 IRQ 5 interrupts, but CPU usage is different: 79.2%Sys 20.8%Intr 0.0%User 0.0%Nice 0.0%Idle A couple of other IRQs go up to 30-40 range, nothing huge. A single "fwcontrol -u 0 -r" generates 14 interrupts on IRQ 5 and 4 interrupts on 3 other IRQs. A single "fwcontrol -u 1 -r" generates 14 interrupts on IRQ 5 and 3 interrupts on 2 other IRQs. yawn ------ With the bus reset printfs commented out, I get: -u 0 80.0%Sys 4.8%Intr 13.9%User 0.0%Nice 1.2%Idle 3300 interrupts on the onboard firewire controller -u 1 84.8%Sys 1.1%Intr 14.1%User 0.0%Nice 0.0%Idle 3900 interrupts on the PCI card firewire controller After awhile the console says: fwohci1: phy read failed(1). i = 100 retry = 0 fwohci1: phy read failed(1). i = 100 retry = 1 fwohci1: phy read failed(1). i = 100 retry = 2 fwohci1: phy read failed(1). i = 100 retry = 3 ... and the system becomes unresponsive. Reset button required. :-( Observations: Kernel printfs to console appear to take more interrupts and more CPU than userland output to console. Seems odd that the onboard controller uses so much CPU for interrupts in the with-printf case, but there wasn't a large number on interrupts. The without-printf case had a LOT more interrupts from the fw controller. Resetting the firewire bus in an endless loop is not recommended. I'm not an expert on interrupt storms, but none of this looks like an interrupt storm to me. (Assuming systat is trustworthy.) Looks like the kernel printf takes a *lot* of CPU, and somehow locks out Ethernet.