Date:      Mon, 3 Mar 1997 14:39:04 -0500 (EST)
From:      Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To:        ponds!lakes.water.net!rivers, ponds!lambert.org!terry
Cc:        ponds!freebsd.org!hackers
Subject:   Re: Another installment of the "dup alloc"/"bad dir" panic problems.
Message-ID:  <199703031939.OAA21260@lakes.water.net>

> 
> > > 1542B?
> > > 
> > > How much RAM do you have?
> > > 
> > > If you have more than 16M ... it's bouncing.  Try backing down to
> > > 16M and not bouncing and see if that's where it is...
> > 
> >  Another good idea - but I only have 12 meg in this particular
> > machine.
> 
> Now try not bouncing.  In your posted kernel compilation line, it showed
> "-DBOUNCE_BUFFERS".  Turn them off.  The code could still be bogus there
> even though you don't have enough memory to require their invocation.  I
> went the rounds with Nate on this one once; I couldn't believe that the
> bounce code was not automatic and handled in the generic SCSI layer until
> Nate pointed me at code (I still think this is bogus as hell).

 Hmm.. I hadn't considered that...

 My kernel compilation, by the way, is just what you get when doing
a "make release" (i.e. building the boot kernel) - nothing special.
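
 For what it's worth, the reason RAM size matters to bouncing at all:
ISA DMA can only address the low 16 MB, so a buffer that lies above that
has to be copied ("bounced") through low memory.  A minimal sketch of
that check, assuming 32-bit physical addresses; toy_needs_bounce() is a
hypothetical helper, not a function from the kernel's bounce code:

```c
#include <assert.h>
#include <stdint.h>

/* ISA DMA can address only 24 bits, i.e. the first 16 MB of memory. */
#define ISA_DMA_LIMIT ((uint32_t)16 * 1024 * 1024)

/*
 * Hypothetical helper (not a kernel function): return 1 if any byte of
 * the physical range [paddr, paddr + len) lies at or above the limit,
 * so the transfer would have to be copied through a low-memory buffer.
 */
static int
toy_needs_bounce(uint32_t paddr, uint32_t len)
{
	assert(len > 0);
	/* 64-bit sum avoids overflow near the top of the 32-bit space. */
	return ((uint64_t)paddr + len > ISA_DMA_LIMIT);
}
```

 On a 12 MB machine no buffer can end above 12 MB, so a check like this
would never fire, even though the bounce code is still compiled in.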

> 
> 
> >  Also, you should recall that I am experiencing this problem on an
> > 8-meg 386dx (intel 387) with an IDE drive... that kinda points to
> > something "higher-level" than the physical device drivers...
> 
> Not necessarily.  As I said, the buffer handling code could still be
> bogus.

 Even for IDE?  Could be, I suppose - I'll try it.

> 
> 
> >  Right now, I'm mulling over race conditions in disksort().  Something
> > along the lines of:
> > 
> > 	start to add buf to beginning of queue
> > 	take an interrupt indicating previous I/O was complete
> > 	remove partially added buf
> > 	wow - lost buffer...
> > 
> >  disksort() appears to be run at splbio() [it's not obvious from
> > the SCSI code that's what's going on, but the wd.c code definitely 
> > does that.]  If the interrupt comes in at just the right time, it
> > seems there is a potential to lose a buffer... which I think is
> > what I'm seeing.  [That would also explain why adding a printf()
> > to disksort masked the problem.] I'm going to play with this idea
> > a while and see if I can verify it...
> 
> If this were the problem, then I would think it would be *much* more
> widespread than it seems to be.  My gut feeling is that you have an
> odd hardware configuration, or have done strange things to the code
> some other way, maybe with your choice of devices (like the 1542B).
> In any case, if there were a race, all you'd need would be a
> sufficiently large discrepancy between processor speed and transfer
> rate, and there are enough machines out there that meet those criteria
> that you'd expect it to trigger *much* more frequently.

 Yes - I'd have to agree...

 Remember; I'm not doing anything special to the kernel, and I can
reliably reproduce this with a 2.1.5, 2.1.6.1 and 2.2-GAMMA install
kernel... (it's been happening since 2.1 but I just haven't tried
it on any older kernels.)

 Also, I'm not sure the hardware is at fault here, as I have it
happening on two disparate machines, and it's been demonstrated on
others... and, since it happens with IDE, and MFS, one would tend to 
rule out the 1542B as the culprit....
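
 To make the disksort() window described above concrete, here is a
minimal user-space sketch.  It is a toy model, not the kernel code:
toy_buf, toy_queue, toy_intr() and toy_insert_head() are made-up names,
the "interrupt" is just a function call placed between the two steps of
the insert, and the order of those two steps is an assumption chosen to
show the window.  The blocked flag stands in for holding the interrupt
off with splbio()/splx().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a pending I/O request; NOT the kernel's struct buf. */
struct toy_buf {
	int id;
	struct toy_buf *next;
};

/* Toy request queue; the head is the next request to be started. */
struct toy_queue {
	struct toy_buf *head;
};

/*
 * Simulated "previous I/O complete" interrupt: dequeue the head so the
 * driver can start it.  Returns the dequeued request, or NULL if empty.
 */
static struct toy_buf *
toy_intr(struct toy_queue *q)
{
	struct toy_buf *bp = q->head;

	if (bp != NULL)
		q->head = bp->next;
	return bp;
}

/* Count the requests still reachable from the queue head. */
static int
toy_queue_len(const struct toy_queue *q)
{
	int n = 0;
	const struct toy_buf *bp;

	for (bp = q->head; bp != NULL; bp = bp->next)
		n++;
	return n;
}

/*
 * Insert bp at the head in two steps.  With blocked == 0 the simulated
 * interrupt lands between the steps; with blocked != 0 it is held off
 * until the insert is finished, which is what a splbio()/splx() pair
 * around the insert is meant to guarantee.  Returns the request the
 * interrupt dequeued.
 */
static struct toy_buf *
toy_insert_head(struct toy_queue *q, struct toy_buf *bp, int blocked)
{
	struct toy_buf *old = q->head;
	struct toy_buf *started = NULL;

	assert(bp != NULL && bp->next == NULL);

	q->head = bp;			/* step 1: bp visible, not yet linked */
	if (!blocked)
		started = toy_intr(q);	/* dequeues bp; head becomes NULL */
	bp->next = old;			/* step 2: link to the old chain */
	if (blocked)
		started = toy_intr(q);	/* deferred interrupt runs afterwards */
	return started;
}
```

 With one request A queued, toy_insert_head(&q, &B, 0) hands B to the
driver but leaves the queue empty, so A is never started.  With blocked
set, A is still at the head of the queue afterwards.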


> 
> 					Regards,
> 					Terry Lambert
> 					terry@lambert.org

	- Dave R. -


