From: Bruce Evans
Date: Tue, 18 May 2004 13:50:02 +1000 (EST)
To: Mike Tancsa
cc: freebsd-current@freebsd.org
Subject: Re: sio / puc wedging on both -current and -stable
Message-ID: <20040518132157.B8772@gamplex.bde.org>
In-Reply-To: <6.0.3.0.0.20040517154946.06d23d60@64.7.153.2>

On Mon, 17 May 2004, Mike Tancsa wrote:

> We are building a box that needs many serial ports to talk to some legacy
> low speed (9600) serial devices. Our application (a small daemon written
> in c) happily talks to the devices and all works well. However, if one of
> the external devices die or is unplugged, the FreeBSD box will at seemingly
> irregular intervals lockup hard.
> The only way to unlock the machine is to
> either hit the reset button (the keyboard is locked solid-- not even num
> lock works) *or* if I jiggle the DB9 connector enough so that enough noise
> shorts across the serial port *or* plug the serial port into a working
> device that I imagine sends some data on the serial port. The machine then
> returns to a normal state and all is well. This does NOT happen with the
> onboard serial ports. Only with a PUC device (we have tried several and
> its the same result)
>
> Does this jog anyone's memory as to what the problem might be ?

It's an interrupt storm of some sort. PCI interrupts are more likely to
cause one than ISA interrupts because they are more likely to be level
triggered.

> I have a remote debugger setup and I can send a break and drop the unit
> into debugger, but kernel debugging is a little beyond our skillset.

Does this break into the locked machine? If so...

> db> trace
> siointr1(c11d0000,d56dacb0,c02b49e6,c11d0000,10) at siointr1+0xc5
> siointr(c11d0000,10,a005,c,10060) at siointr+0xc
> Xfastintr4(c11d0c00,d56dacd8,c02a741a,c11d0c00,c0a3f240) at Xfastintr4+0x16
> siointr(c11d0c00) at siointr+0xc

... Type "s", then hold down the Enter key to repeat the "s" command until
control returns here, then keep holding down the Enter key until something
loops (may take many hundreds of commands). Record all the output using
a serial console (don't type it in) and send it to me.

> puc_intr(c11af000,63103a,c11d0c00,0,d56dad68) at puc_intr+0x4e

If control returns here, then siointr hasn't looped internally; keep going.

> intr_mux(c0a3f240,0,630010,c1360010,c0170010) at intr_mux+0x1f

If control returns here, then the loop is external so it is harder to debug
(but this is the most likely case). Going through intr_mux() means that
the interrupt is not fast (options PUC_FASTINTR). Try that.

> Xresume12() at Xresume12+0x2b

Stop if it gets back here.
> --- interrupt, eip = 0xc02b5b2a, esp = 0xd56dad38, ebp = 0xd56dad68 ---
> vec12(c11ce980,3,2000,cbf03a00,d56634c0) at vec12+0x2
> cnopen(c11ce980,3,2000,cbf03a00,0) at cnopen+0x6a

It may be significant that the hang seems to occur while opening the console
device. Do you have a serial console on the puc device? I thought that
this doesn't work.

> Any pointers on how to track this down ? It happens both in RELENG_4 from
> May 12th and 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Thu May 13

Did it work before then? The driver hasn't changed since long before then.

Bruce