From owner-freebsd-hackers  Tue Aug 12 19:39:06 1997
Return-Path: <owner-freebsd-hackers>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id TAA11650
          for hackers-outgoing; Tue, 12 Aug 1997 19:39:06 -0700 (PDT)
Received: from genesis.atrad.adelaide.edu.au (genesis.atrad.adelaide.edu.au [129.127.96.120])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id TAA11637;
          Tue, 12 Aug 1997 19:38:51 -0700 (PDT)
Received: (from msmith@localhost) by genesis.atrad.adelaide.edu.au (8.8.5/8.7.3) id MAA11390; Wed, 13 Aug 1997 12:04:40 +0930 (CST)
From: Michael Smith <msmith@atrad.adelaide.edu.au>
Message-Id: <199708130234.MAA11390@genesis.atrad.adelaide.edu.au>
Subject: Re: 2.2.2+ crash.. more info
In-Reply-To: <33F114EB.167EB0E7@whistle.com> from Julian Elischer at "Aug 12, 97 06:59:07 pm"
To: julian@whistle.com (Julian Elischer)
Date: Wed, 13 Aug 1997 12:04:39 +0930 (CST)
Cc: msmith@atrad.adelaide.edu.au, julian@FreeBSD.ORG, hackers@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL28 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Julian Elischer stands accused of saying:
> Michael Smith wrote:
> > 
> > Julian Elischer stands accused of saying:
> > >
> > > We have several hundred BSD machines here.. we see this one enough for
> > > me to recognise it..
> > >
> > > the plot thickens..
> > > I have discovered the following:
> > > 1/ the code that crashes:
> > >   scanning the queues in switch:
> > 
> > This looks a lot like the sort of crazy stuff I was seeing when I was
> > doing Verboten things inside a 'fast' ISA interrupt handler.  Do you have
> > RI_FAST set for any of your drivers, particularly ones that you've written
> > yourself?
> > 
> > You could try ripping RI_FAST out of _all_ of the handlers you're using
> > to start with and see if this cures things.
> > 
> > > code examinations will follow with more info..
> > > if this strikes anyone as familiar, do chime in!
> > 
> > Frighteningly.  It took us the best part of a year just to get a stack
> > trace that actually hinted at the problem.
> > 
> > > julian
> 
> this particular machine has no interrupt handlers that were not
> part of standard FreeBSD..
> 
> ed0 and ed1 networks,
> wd0 disk
> sio0 and sio1
> 
> how do I SET RI_FAST? :)
> (does that answer your question?)

You mask it into the id_ri_flags field of the isa_device structure.
Currently only the 'cy' and 'sio' drivers use it.  You could try
removing it from the 'sio' driver and see if it helps, but I expect
that Bruce would insist that this is not the case.
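
To make "mask it in" concrete, here is a rough sketch.  The structure
below is a cut-down stand-in, not the real isa_device, and the bit value
given for RI_FAST is an assumption; check the kernel's ISA header for the
actual definitions.  The helper names are made up for illustration.

```c
/*
 * Stand-in definitions for illustration only.  The real RI_FAST and
 * struct isa_device live in the kernel's ISA headers; the value and
 * layout here are assumptions.
 */
#define RI_FAST 0x1u                /* assumed bit value */

struct isa_device_sketch {
    const char  *id_name;           /* hypothetical field, for labelling */
    unsigned int id_ri_flags;       /* flags consulted when the handler is registered */
};

/* Ask for a 'fast' handler: OR the bit in, leaving other flags alone. */
void set_fast(struct isa_device_sketch *dev)
{
    dev->id_ri_flags |= RI_FAST;
}

/* Go back to a normal handler: clear only the RI_FAST bit. */
void clear_fast(struct isa_device_sketch *dev)
{
    dev->id_ri_flags &= ~RI_FAST;
}

/* Non-zero if the device is currently marked as wanting a fast handler. */
int is_fast(const struct isa_device_sketch *dev)
{
    return (dev->id_ri_flags & RI_FAST) != 0;
}
```

"Removing it from the 'sio' driver" then amounts to the clear_fast() case:
the driver stops setting the bit before its interrupt is registered.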

> actually it looks like some sort of SPL problem to me but as I said,
> there is very little that is non-standard on this machine..

The RI_FAST problem _is_ an spl problem, in that a fast interrupt
handler does not honour any spl() protection.
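
A toy user-space model of what I mean, purely as a sketch.  None of this
is kernel code: cpl, ipending and every function name below are simulated
stand-ins invented for the example.  A normal interrupt that arrives
while its bit is masked is held back until the mask is dropped; a 'fast'
one runs straight away, inside the region the spl was meant to protect.

```c
#define IRQ_BIT(n)  (1u << (n))
#define NIRQ        16

static unsigned cpl;            /* simulated current priority mask */
static unsigned ipending;       /* simulated interrupts held back by cpl */
static int handled[NIRQ];       /* how many times each handler has run */

/* Raise the mask; returns the old mask for the matching splx_sketch(). */
unsigned splraise_sketch(unsigned mask)
{
    unsigned old = cpl;
    cpl |= mask;
    return old;
}

/* Restore the mask and run anything that was held back and is now unmasked. */
void splx_sketch(unsigned old)
{
    cpl = old;
    for (int n = 0; n < NIRQ; n++) {
        unsigned bit = IRQ_BIT(n);
        if ((ipending & bit) && !(cpl & bit)) {
            ipending &= ~bit;
            handled[n]++;
        }
    }
}

/* Deliver IRQ n.  A normal handler respects cpl; a fast one ignores it. */
void deliver_irq(int n, int fast)
{
    unsigned bit = IRQ_BIT(n);
    if (!fast && (cpl & bit)) {
        ipending |= bit;        /* deferred until splx_sketch() */
        return;
    }
    handled[n]++;               /* runs immediately */
}

/* How many times did the handler run inside the protected region? */
int runs_inside_critical_section(int fast)
{
    cpl = 0;
    ipending = 0;
    handled[4] = 0;

    unsigned s = splraise_sketch(IRQ_BIT(4));
    deliver_irq(4, fast);
    int inside = handled[4];    /* sampled before the mask is dropped */
    splx_sketch(s);
    return inside;
}
```

In this model the normal handler still runs exactly once, just after the
protected region; the fast one runs in the middle of it.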

> the fact that the process got put on a sleep queue while it was
> on the runnable queue suggests that maybe an interrupt driver
> ran 'tsleep' while curproc had the value of this process in it..

You get this sort of confusion if you futz with *sleep/wakeup inside a
fast interrupt handler because you can end up re-entering the code
that shuffles processes from one queue to another.
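
Here is a deliberately tiny model of that re-entry, again only a sketch:
the queues are single-linked toys, and proc_sketch, wakeup_sketch and the
rest are hypothetical names, not the kernel's.  The 'fast interrupt' is
simulated by a hook that fires after the "is it asleep?" test but before
the unlink, which is the window an spl would normally close.

```c
#include <stddef.h>

struct proc_sketch {
    struct proc_sketch *next;       /* one link shared by both toy queues */
    int times_queued;               /* how often it was put on the run queue */
};

static struct proc_sketch *sleepq;  /* toy sleep queue (head only) */
static struct proc_sketch *runq;    /* toy run queue */
static struct proc_sketch the_proc; /* the one process in this model */
static void (*pending_fast_intr)(void);  /* simulated interrupt, fires once */

/* Move p from the head of the sleep queue to the head of the run queue. */
void wakeup_sketch(struct proc_sketch *p)
{
    if (sleepq != p)
        return;                     /* not asleep, nothing to do */

    /* Simulated fast interrupt landing in the middle of the shuffle. */
    if (pending_fast_intr != NULL) {
        void (*handler)(void) = pending_fast_intr;
        pending_fast_intr = NULL;
        handler();
    }

    sleepq = p->next;               /* unlink from the sleep queue */
    p->next = runq;                 /* link onto the run queue */
    runq = p;
    p->times_queued++;
}

/* A handler doing what it must not: calling wakeup on the same process. */
void fast_intr_sketch(void)
{
    wakeup_sketch(&the_proc);
}

/* Returns how many times the process ended up being queued as runnable. */
int run_scenario(int with_fast_intr)
{
    the_proc.next = NULL;
    the_proc.times_queued = 0;
    sleepq = &the_proc;
    runq = NULL;
    pending_fast_intr = with_fast_intr ? fast_intr_sketch : NULL;

    wakeup_sketch(&the_proc);
    return the_proc.times_queued;
}
```

Without the simulated interrupt the process is queued once and the list
is sane.  With it, the inner call finishes the move, the outer call then
repeats it on stale state, and the process ends up linked to itself.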

I would be fairly surprised, given your usage, if the sio interrupt
handler was the cause of your trouble; I think I may have given you
a bum steer.

-- 
]] Mike Smith, Software Engineer        msmith@gsoft.com.au             [[
]] Genesis Software                     genesis@gsoft.com.au            [[
]] High-speed data acquisition and      (GSM mobile)     0411-222-496   [[
]] realtime instrument control.         (ph)          +61-8-8267-3493   [[
]] Unix hardware collector.             "Where are your PEZ?" The Tick  [[