FreeBSD Mail Archives

Date:      Mon, 3 Mar 1997 11:10:28 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        ponds!rivers@dg-rtp.dg.com (Thomas David Rivers)
Cc:        hackers@freebsd.org
Subject:   Re: Another installment of the "dup alloc"/"bad dir" panic problems.
Message-ID:  <199703031810.LAA08249@phaeton.artisoft.com>
In-Reply-To: <199703021317.IAA13157@lakes.water.net> from "Thomas David Rivers" at Mar 2, 97 08:17:26 am

> > 1542B?
> > 
> > How much RAM do you have?
> > 
> > If you have more than 16M ... it's bouncing.  Try backing down to
> > 16M and not bouncing and see if that's where it is...
> 
>  Another good idea - but I only have 12 meg in this particular
> machine.

Now try not bouncing.  In your posted kernel compilation line, it showed
"-DBOUNCE_BUFFERS".  Turn them off.  The code could still be bogus there
even though you don't have enough memory to require their invocation.  I
went the rounds with Nate on this one once; I couldn't believe that the
bounce code was not automatic and handled in the generic SCSI layer until
Nate pointed me at code (I still think this is bogus as hell).


>  Also, you should recall that I am experiencing this problem on an
> 8-meg 386dx (intel 387) with an IDE drive... that kinda points to
> something "higher-level" than the physical device drivers...

Not necessarily.  As I said, the buffer handling code could still be
bogus.


>  Right now, I'm mulling over race conditions in disksort().  Something
> along the lines of:
> 
> 	start to add buf to beginning of queue
> 	take an interrupt indicating previous I/O was complete
> 	remove partially added buf
> 	wow - lost buffer...
> 
>  disksort() appears to be run at splbio() [it's not obvious from
> the SCSI code that's what's going on, but the wd.c code definitely
> does that.]  If the interrupt comes in at just the right time, it
> seems there is a potential to lose a buffer... which I think is
> what I'm seeing.  [That would also explain why adding a printf()
> to disksort masked the problem.] I'm going to play with this idea
> a while and see if I can verify it...

If this were the problem, then I would think it would be *much* more
widespread than it seems to be.  My gut feeling is that you have an
odd hardware configuration, or have done strange things to the code
some other way, maybe with your choice of devices (like the 1542B).
In any case, if there were a race, all you'd need would be a
sufficiently large discrepancy between processor speed and transfer
rate, and there are enough machines out there that meet those criteria
that you'd expect it to trigger *much* more frequently.
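
For reference, here is a minimal user-space sketch of the lost-buffer
sequence quoted above.  It is a toy model, not the actual disksort() or
wd.c code: the names queue_head, io_done(), raise_interrupt(),
spl_raise(), spl_restore(), insert_head() and run_demo() are all
hypothetical, and the "interrupt" is simply a function call placed in
the middle of a two-step head insertion.  The spl_raise()/spl_restore()
pair only imitates the idea of splbio()/splx() by deferring the
interrupt until the insertion has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel buffer; only the queue link matters here. */
struct buf {
    int id;
    struct buf *next;
};

static struct buf *queue_head; /* hypothetical driver queue */
static int blocked;            /* nonzero while the imitation "splbio" is held */
static int pending;            /* an interrupt arrived while blocked */

/* Interrupt side: treat the head as the finished I/O and dequeue it. */
static void io_done(void)
{
    if (queue_head != NULL)
        queue_head = queue_head->next;
}

/* Deliver the interrupt now, or remember it if interrupts are blocked. */
static void raise_interrupt(void)
{
    if (blocked) {
        pending = 1;
        return;
    }
    io_done();
}

/* Imitation of raising the priority level; returns the previous state. */
static int spl_raise(void)
{
    int old = blocked;
    blocked = 1;
    return old;
}

/* Imitation of restoring the level; runs any deferred interrupt. */
static void spl_restore(int old)
{
    blocked = old;
    if (!blocked && pending) {
        pending = 0;
        io_done();
    }
}

/* Two-step head insertion with an interrupt landing between the steps. */
static void insert_head(struct buf *bp, int protect)
{
    int s = 0;
    struct buf *old_head;

    assert(bp != NULL);
    if (protect)
        s = spl_raise();
    old_head = queue_head;
    queue_head = bp;      /* step 1: bp is visible, but not yet linked */
    raise_interrupt();    /* unprotected: dequeues the half-added bp */
    bp->next = old_head;  /* step 2: too late if bp was already removed */
    if (protect)
        spl_restore(s);
}

/* Queue A -> B, insert C, and return how many buffers are still queued. */
static int run_demo(int protect)
{
    struct buf a = { 1, NULL }, b = { 2, NULL }, c = { 3, NULL };
    struct buf *p;
    int n = 0;

    blocked = 0;
    pending = 0;
    a.next = &b;
    queue_head = &a;
    insert_head(&c, protect);
    for (p = queue_head; p != NULL; p = p->next)
        n++;
    return n;
}
```

In this model run_demo(0) leaves the queue empty: the interrupt removes
the half-linked buffer and the old chain goes with it.  run_demo(1)
leaves two buffers queued, because the interrupt is held off until the
insertion is complete.  It says nothing about whether the real code has
such a window; it only shows what the window would look like.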


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199703031810.LAA08249>
