Date: Thu, 18 Dec 2008 00:51:41 +0000 From: Dieter <freebsd@sopwith.solgatos.com> To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost Message-ID: <200812180851.IAA08485@sopwith.solgatos.com> In-Reply-To: Your message of "Wed, 17 Dec 2008 17:10:53 PST." <4949A31D.5030507@miralink.com>
next in thread | previous in thread | raw e-mail | index | archive | help
> > : Warner> This can't be the case. There's no SPL involved at all. Maybe > > : Warner> removing the printfs causes an interrupt to be serviced faster, > > : Warner> resulting in what appears to be better performance... > > : > > : With the printfs, Ethernet is not getting serviced. This > > : is repeatable and easily reproduced. Without the printfs, > > : it seems ok. > > : > > : If it isn't spl, what is locking out Ethernet? > > > > interrupt storm? In old-spl-locked drivers, often times the interrupt > > would be masked while certain operations happened. In new > > mutex-locked freebsd, there's no way to block the interrupts, so > > sometimes old code gets into an interrupt storm, which starves other > > things. Not sure why printf would trigger this, however... I tried logging in on the console and "cat /etc/termcap" does not create a problem. Systat -vmstat reports that IRQ 5 (console) interrupts go from 0 to 120. CPU: 0.0%Sys 0.6%Intr 0.0%User 0.0%Nice 99.4%Idle The normal firewire bus reset printfs generate 890 IRQ 5 interrupts with the shell firing off "fwcontrol -u 1 -r" in a loop. CPU goes from idle to 100% busy: 98.1%Sys 1.7%Intr 0.3%User 0.0%Nice 0.0%Idle "fwcontrol -u 0 -r" (the onboard firewire controller) gives the same 890 IRQ 5 interrupts, but CPU usage is different: 79.2%Sys 20.8%Intr 0.0%User 0.0%Nice 0.0%Idle A couple of other IRQs go up to 30-40 range, nothing huge. A single "fwcontrol -u 0 -r" generates 14 interrupts on IRQ 5 and 4 interrupts on 3 other IRQs. A single "fwcontrol -u 1 -r" generates 14 interrupts on IRQ 5 and 3 interrupts on 2 other IRQs. yawn ------ With the bus reset printfs commented out, I get: -u 0 80.0%Sys 4.8%Intr 13.9%User 0.0%Nice 1.2%Idle 3300 interrupts on the onboard firewire controller -u 1 84.8%Sys 1.1%Intr 14.1%User 0.0%Nice 0.0%Idle 3900 interrupts on the PCI card firewire controller After awhile the console says: fwohci1: phy read failed(1). i = 100 retry = 0 fwohci1: phy read failed(1). i = 100 retry = 1 fwohci1: phy read failed(1). i = 100 retry = 2 fwohci1: phy read failed(1). i = 100 retry = 3 ... and the system becomes unresponsive. Reset button required. :-( Observations: Kernel printfs to console appear to take more interrupts and more CPU than userland output to console. Seems odd that the onboard controller uses so much CPU for interrupts in the with-printf case, but there wasn't a large number on interrupts. The without-printf case had a LOT more interrupts from the fw controller. Resetting the firewire bus in an endless loop is not recommended. I'm not an expert on interrupt storms, but none of this looks like an interrupt storm to me. (Assuming systat is trustworthy.) Looks like the kernel printf takes a *lot* of CPU, and somehow locks out Ethernet.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200812180851.IAA08485>