From owner-freebsd-firewire@FreeBSD.ORG  Thu Dec 18 08:53:39 2008
Return-Path: <owner-freebsd-firewire@FreeBSD.ORG>
Delivered-To: freebsd-firewire@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C8D1B1065673;
	Thu, 18 Dec 2008 08:53:39 +0000 (UTC)
	(envelope-from freebsd@sopwith.solgatos.com)
Received: from sopwith.solgatos.com
	(pool-71-182-69-106.ptldor.fios.verizon.net [71.182.69.106])
	by mx1.freebsd.org (Postfix) with ESMTP id A17938FC0C;
	Thu, 18 Dec 2008 08:53:38 +0000 (UTC)
	(envelope-from freebsd@sopwith.solgatos.com)
Received: by sopwith.solgatos.com (Postfix, from userid 66)
	id E0D853F22; Thu, 18 Dec 2008 00:30:39 -0800 (PST)
Received: from localhost by sopwith.solgatos.com (8.8.8/6.24)
	id IAA08485; Thu, 18 Dec 2008 08:51:41 GMT
Message-Id: <200812180851.IAA08485@sopwith.solgatos.com>
To: freebsd-firewire@freebsd.org, bug-followup@freebsd.org
In-reply-to: Your message of "Wed, 17 Dec 2008 17:10:53 PST."
	<4949A31D.5030507@miralink.com> 
Date: Thu, 18 Dec 2008 00:51:41 +0000
From: Dieter <freebsd@sopwith.solgatos.com>
Cc: 
Subject: Re: kern/118093: firewire bus reset hogs CPU,
	causing data to be lost 
X-BeenThere: freebsd-firewire@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Firewire support in FreeBSD <freebsd-firewire.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-firewire>, 
	<mailto:freebsd-firewire-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-firewire>
List-Post: <mailto:freebsd-firewire@freebsd.org>
List-Help: <mailto:freebsd-firewire-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-firewire>, 
	<mailto:freebsd-firewire-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Dec 2008 08:53:39 -0000

> > : Warner>  This can't be the case.  There's no SPL involved at all.  Maybe
> > : Warner>  removing the printfs causes an interrupt to be serviced faster,
> > : Warner>  resulting in what appears to be better performance...
> > : 
> > : With the printfs, Ethernet is not getting serviced.  This
> > : is repeatable and easily reproduced.   Without the printfs,
> > : it seems ok.
> > : 
> > : If it isn't spl, what is locking out Ethernet?
> >
> > interrupt storm?  In old-spl-locked drivers, often times the interrupt
> > would be masked while certain operations happened.  In new
> > mutex-locked freebsd, there's no way to block the interrupts, so
> > sometimes old code gets into an interrupt storm, which starves other
> > things.  Not sure why printf would trigger this, however...

I tried logging in on the console and "cat /etc/termcap" does
not create a problem.  Systat -vmstat reports that IRQ 5 (console)
interrupts go from 0 to 120.  CPU:
	0.0%Sys   0.6%Intr  0.0%User  0.0%Nice 99.4%Idle

The normal firewire bus reset printfs generate 890 IRQ 5 interrupts
with the shell firing off "fwcontrol -u 1 -r" in a loop.
CPU goes from idle to 100% busy:
	98.1%Sys   1.7%Intr  0.3%User  0.0%Nice  0.0%Idle

"fwcontrol -u 0 -r" (the onboard firewire controller) gives
the same 890 IRQ 5 interrupts, but CPU usage is different:
	79.2%Sys  20.8%Intr  0.0%User  0.0%Nice  0.0%Idle

A couple of other IRQs go up to 30-40 range, nothing huge.

A single "fwcontrol -u 0 -r" generates 14 interrupts on IRQ 5
and 4 interrupts on 3 other IRQs.

A single "fwcontrol -u 1 -r" generates 14 interrupts on IRQ 5
and 3 interrupts on 2 other IRQs.  yawn

------

With the bus reset printfs commented out, I get:

-u 0

   80.0%Sys   4.8%Intr 13.9%User  0.0%Nice  1.2%Idle

   3300 interrupts on the onboard firewire controller

-u 1

   84.8%Sys   1.1%Intr 14.1%User  0.0%Nice  0.0%Idle

   3900 interrupts on the PCI card firewire controller

   After awhile the console says:

   fwohci1: phy read failed(1). i = 100 retry = 0
   fwohci1: phy read failed(1). i = 100 retry = 1
   fwohci1: phy read failed(1). i = 100 retry = 2
   fwohci1: phy read failed(1). i = 100 retry = 3
   ...

   and the system becomes unresponsive.  Reset button required.  :-(

Observations:

Kernel printfs to console appear to take more interrupts and
more CPU than userland output to console.

Seems odd that the onboard controller uses so much CPU for
interrupts in the with-printf case, but there wasn't a
large number on interrupts.  The without-printf case had
a LOT more interrupts from the fw controller.

Resetting the firewire bus in an endless loop is not recommended.

I'm not an expert on interrupt storms, but none of this looks
like an interrupt storm to me.  (Assuming systat is trustworthy.)

Looks like the kernel printf takes a *lot* of CPU, and
somehow locks out Ethernet.