Date: Mon, 3 Mar 1997 11:10:28 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: ponds!rivers@dg-rtp.dg.com (Thomas David Rivers)
Cc: hackers@freebsd.org
Subject: Re: Another installment of the "dup alloc"/"bad dir" panic problems.
Message-ID: <199703031810.LAA08249@phaeton.artisoft.com>
In-Reply-To: <199703021317.IAA13157@lakes.water.net> from "Thomas David Rivers" at Mar 2, 97 08:17:26 am

> > 1542B?
> >
> > How much RAM do you have?
> >
> > If you have more than 16M ... it's bouncing.  Try backing down to
> > 16M and not bouncing and see if that's where it is...
>
> Another good idea - but I only have 12 meg in this particular
> machine.

Now try not bouncing.  In your posted kernel compilation line, it
showed "-DBOUNCE_BUFFERS".  Turn them off.  The code could still be
bogus there even though you don't have enough memory to require
their invocation.

I went the rounds with Nate on this one once; I couldn't believe
that the bounce code was not automatic and handled in the generic
SCSI layer until Nate pointed me at the code (I still think this is
bogus as hell).

> Also, you should recall that I am experiencing this problem on an
> 8-meg 386dx (intel 387) with an IDE drive... that kinda points to
> something "higher-level" than the physical device drivers...

Not necessarily.  As I said, the buffer handling code could still
be bogus.

> Right now, I'm mulling over race conditions in disksort().  Something
> along the lines of:
>
>       start to add buf to beginning of queue
>       take an interrupt indicating previous I/O was complete
>       remove partially added buf
>       wow - lost buffer...
>
> disksort() appears to be run at splbio() [it's not obvious from
> the SCSI code that's what's going on, but the wd.c code definitely
> does that.]  If the interrupt comes in at just the right time, it
> seems there is a potential to lose a buffer... which I think is
> what I'm seeing.  [That would also explain why adding a printf()
> to disksort masked the problem.]  I'm going to play with this idea
> a while and see if I can verify it...

If this were the problem, then I would think it would be *much* more
widespread than it seems to be.  My gut feeling is that you have
an odd hardware configuration, or have done strange things to the
code some other way, maybe with your choice of devices (like the
1542B).

In any case, if there were a race, all you'd need would be a
sufficiently large discrepancy between processor speed and transfer
rate, and there are enough machines out there that meet those criteria
that you'd expect it to trigger *much* more frequently.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.