From owner-freebsd-current@FreeBSD.ORG  Mon May 17 20:50:12 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A1CA916A4CE
	for <freebsd-current@freebsd.org>;
	Mon, 17 May 2004 20:50:12 -0700 (PDT)
Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2164D43D45
	for <freebsd-current@freebsd.org>;
	Mon, 17 May 2004 20:50:10 -0700 (PDT)	(envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au
	[61.8.0.86])i4I3o75v030041;	Tue, 18 May 2004 13:50:07 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	i4I3o02O002775;	Tue, 18 May 2004 13:50:06 +1000
Date: Tue, 18 May 2004 13:50:02 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Mike Tancsa <mike@sentex.net>
In-Reply-To: <6.0.3.0.0.20040517154946.06d23d60@64.7.153.2>
Message-ID: <20040518132157.B8772@gamplex.bde.org>
References: <6.0.3.0.0.20040517154946.06d23d60@64.7.153.2>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-current@freebsd.org
Subject: Re: sio / puc wedging on both -current and -stable
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Tue, 18 May 2004 03:50:12 -0000

On Mon, 17 May 2004, Mike Tancsa wrote:

> We are building a box that needs many serial ports to talk to some legacy
> low speed (9600) serial devices.  Our application (a small daemon written
> in C) happily talks to the devices and all works well.  However, if one of
> the external devices dies or is unplugged, the FreeBSD box will at seemingly
> irregular intervals lockup hard.  The only way to unlock the machine is to
> either hit the reset button (the keyboard is locked solid-- not even num
> lock works) *or* if I jiggle the DB9 connector enough so that enough noise
> shorts across the serial port *or* plug the serial port into a working
> device that I imagine sends some data on the serial port.  The machine then
> returns to a normal state and all is well. This does NOT happen with the
> onboard serial ports.  Only with a PUC device (we have tried several and
> it's the same result).
>
> Does this jog anyone's memory as to what the problem might be ?

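It's an interrupt storm of some sort.  PCI interrupts are more likely to
cause one than ISA interrupts because they are more likely to be
level-triggered: a level-triggered line keeps requesting service for as
long as the device holds it asserted, so if the handler never clears the
condition, the interrupt fires again as soon as the handler returns.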

> I have a remote debugger setup and I can send a break and drop the unit
> into debugger, but kernel debugging is a little beyond our skillset.

Does this break into the locked machine?  If so...

> db> trace
> siointr1(c11d0000,d56dacb0,c02b49e6,c11d0000,10) at siointr1+0xc5
> siointr(c11d0000,10,a005,c,10060) at siointr+0xc
> Xfastintr4(c11d0c00,d56dacd8,c02a741a,c11d0c00,c0a3f240) at Xfastintr4+0x16
> siointr(c11d0c00) at siointr+0xc

... Type "s", then hold down the Enter key to repeat the "s" command until
control returns here, then keep holding down the Enter key until something
loops (may take many hundreds of commands).  Record all the output using
a serial console (don't type it in) and send it to me.

> puc_intr(c11af000,63103a,c11d0c00,0,d56dad68) at puc_intr+0x4e

If control returns here, then siointr hasn't looped internally; keep
going.

> intr_mux(c0a3f240,0,630010,c1360010,c0170010) at intr_mux+0x1f

If control returns here, then the loop is external, so it is harder to
debug (but this is the most likely case).

Going through intr_mux() means that the interrupt is not a fast interrupt.
Configuring the kernel with "options PUC_FASTINTR" should make it one; try
that.

> Xresume12() at Xresume12+0x2b

Stop if it gets back here.

> --- interrupt, eip = 0xc02b5b2a, esp = 0xd56dad38, ebp = 0xd56dad68 ---
> vec12(c11ce980,3,2000,cbf03a00,d56634c0) at vec12+0x2
> cnopen(c11ce980,3,2000,cbf03a00,0) at cnopen+0x6a

It may be significant that the hang seems to occur while opening the
console device.  Do you have a serial console on the puc device?  I didn't
think that worked.

> Any pointers on how to track this down ?  It happens both in RELENG_4 from
> May 12th and 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Thu May 13

Did it work before then?  The driver hasn't changed since long before then.

Bruce