From owner-freebsd-hackers Tue Aug 12 19:39:06 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id TAA11650
          for hackers-outgoing; Tue, 12 Aug 1997 19:39:06 -0700 (PDT)
Received: from genesis.atrad.adelaide.edu.au (genesis.atrad.adelaide.edu.au [129.127.96.120])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA11637;
          Tue, 12 Aug 1997 19:38:51 -0700 (PDT)
Received: (from msmith@localhost)
          by genesis.atrad.adelaide.edu.au (8.8.5/8.7.3) id MAA11390;
          Wed, 13 Aug 1997 12:04:40 +0930 (CST)
From: Michael Smith
Message-Id: <199708130234.MAA11390@genesis.atrad.adelaide.edu.au>
Subject: Re: 2.2.2+ crash.. more info
In-Reply-To: <33F114EB.167EB0E7@whistle.com> from Julian Elischer at "Aug 12, 97 06:59:07 pm"
To: julian@whistle.com (Julian Elischer)
Date: Wed, 13 Aug 1997 12:04:39 +0930 (CST)
Cc: msmith@atrad.adelaide.edu.au, julian@FreeBSD.ORG, hackers@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL28 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Julian Elischer stands accused of saying:
> Michael Smith wrote:
> >
> > Julian Elischer stands accused of saying:
> > >
> > > We have several hundred Bsd machines here.. we see this one enough for
> > > me to recognise it..
> > >
> > > the plot thickens..
> > > I have discovered the following:
> > > 1/ the code that crashes:
> > > scanning the queues in switch:
> >
> > This looks a lot like the sort of crazy stuff I was seeing when I was
> > doing Verboten things inside a 'fast' ISA interrupt handler.  Do you have
> > RI_FAST set for any of your drivers, particularly ones that you've written
> > yourself?
> >
> > You could try ripping RI_FAST out of _all_ of the handlers you're using
> > to start with and see if this cures things.
> >
> > > code examinations will follow with more info..
> > > if this strikes anyone as familiar, do chime in!
> >
> > Frighteningly.  It took us the best part of a year just to get a stack
> > trace that actually hinted at the problem.
> >
> > > julian
> >
>
> this particular machine has no interrupt handlers that were not
> part of standard FreeBSD..
>
> ed0 and ed1 networks,
> wd0 disk
> sio0 and sio1
>
> how do I SET RI_FAST? :)
> (does that answer your question?)

You mask it into the id_ri_flags field of the isa_device structure.
Currently only the 'cy' and 'sio' drivers use it.  You could try
removing it from the 'sio' driver and see if it helps, but I expect
that Bruce would insist that this is not the case.

> actually it looks like some sort of SPL problem to me but as I said,
> there is very little
> that is non standard on this machine..

The RI_FAST problem _is_ an spl problem, in that a fast interrupt handler
does not honour any spl() protection.

> the fact that the process got put on a sleep queue while it was
> on the runnable queue suggests that maybe an interrupt driver
> ran 'tsleep' while curproc had the value of this process in it..

You get this sort of confusion if you futz with *sleep/wakeup inside a
fast interrupt handler, because you can end up re-entering the code
that shuffles processes from one queue to another.

I would be fairly surprised, given your usage, if the sio interrupt handler
was the cause of your trouble; I think I may have given you a bum steer.

-- 
]] Mike Smith, Software Engineer        msmith@gsoft.com.au             [[
]] Genesis Software                     genesis@gsoft.com.au            [[
]] High-speed data acquisition and      (GSM mobile)     0411-222-496   [[
]] realtime instrument control.         (ph)          +61-8-8267-3493   [[
]] Unix hardware collector.             "Where are your PEZ?" The Tick  [[
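
For anyone who wants to see the re-entry problem in miniature, here is a rough
user-space C model of it. It is only a sketch and makes several assumptions:
nothing here is kernel code, every toy_* name and run_scenario() is made up
for illustration, the struct only borrows the id_ri_flags field name, and the
value given to RI_FAST is a placeholder rather than the one in the real
header. It shows a two-step queue move done under spl-style protection, with
an interrupt arriving between the two steps. A handler that honours the
protection is deferred until the move is finished; one flagged RI_FAST runs
straight away and leaves the process on both queues.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder value, assumed for illustration; not taken from the kernel. */
#define RI_FAST 1

/* Hypothetical stand-ins; only the id_ri_flags field name is borrowed. */
struct toy_isa_device { int id_ri_flags; };
struct toy_proc { int on_run; int on_sleep; };

static int spl_level;                  /* nonzero: interrupts are blocked */
static struct toy_proc *pending_proc;  /* interrupt deferred by spl       */

/* Toy handler: moves the process to the sleep queue. */
static void toy_handler(struct toy_proc *p)
{
    assert(p != NULL);
    if (p->on_run)
        p->on_run = 0;
    p->on_sleep = 1;
}

static int toy_splhigh(void)
{
    int old = spl_level;
    spl_level = 1;
    return old;
}

/* Restore the old level and run any interrupt that was held off. */
static void toy_splx(int old)
{
    spl_level = old;
    if (spl_level == 0 && pending_proc != NULL) {
        struct toy_proc *p = pending_proc;
        pending_proc = NULL;
        toy_handler(p);
    }
}

/* Deliver an interrupt: a fast handler ignores the spl level entirely. */
static void toy_interrupt(const struct toy_isa_device *dev, struct toy_proc *p)
{
    if (spl_level != 0 && !(dev->id_ri_flags & RI_FAST)) {
        pending_proc = p;   /* normal handler waits for toy_splx() */
        return;
    }
    toy_handler(p);
}

/* Returns 1 if the process ends up on both queues, 0 otherwise. */
int run_scenario(int ri_flags)
{
    struct toy_isa_device dev = { 0 };
    struct toy_proc p = { 0, 1 };   /* starts on the sleep queue only */
    int s;

    dev.id_ri_flags |= ri_flags;    /* mask the flag into the field */
    spl_level = 0;
    pending_proc = NULL;

    s = toy_splhigh();
    p.on_sleep = 0;                 /* step 1: unlink from sleep queue */
    toy_interrupt(&dev, &p);        /* interrupt arrives mid-update    */
    p.on_run = 1;                   /* step 2: link onto run queue     */
    toy_splx(s);

    return p.on_run && p.on_sleep;
}
```

In this model run_scenario(RI_FAST) returns 1 (the process is on both
queues), while run_scenario(0) returns 0 because the handler only runs once
the queue move is complete.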