Date: Tue, 24 Jun 1997 09:47:56 -0700 (PDT)
From: Jaye Mathisen <mrcpu@cdsnet.net>
To: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
Cc: ponds!lambert.org!terry@cdsnet.net, ponds!sdf.com!tom@cdsnet.net, hackers@freebsd.org
Subject: Re: panics/file system corruption - was Re: OpenBSD
Message-ID: <Pine.NEB.3.95.970624094321.26020A-100000@mail.cdsnet.net>
In-Reply-To: <199706241205.IAA04547@lakes.water.net>

John Dyson mentioned in some private email to me that he found some more
vnode locking problems in the code, and was rewriting the sections in
question for a 3.0-SMP release. I think it ended up being some conditions
that weren't being tested, or somesuch.

It is at least a bit of a relief to me that, regardless of whether or not
this fixes my specific problem, some more bugs were fixed. :)

The code would need to be backported to 2.x, so I haven't been able to test
his fixes, although I can include the 3.0 section of code if somebody
knowledgeable about the internals wants to take a crack at it.

On Tue, 24 Jun 1997, Thomas David Rivers wrote:

>
> I just tripped over this; thought I would try to take a stab
> at an answer....
>
> > On Fri, 20 Jun 1997, Terry Lambert wrote:
> >
> > > > > Anybody running FreeBSD given it a shot just to see? I have been
> > > > > thinking about it to see if it fixes my UFS problems that are seemingly
> > > > > unrepairable.
> > > >
> > > > UFS problem?
> > >
> > > He's talking about his "free xxx isn't" race condition errors.
> >
> > What exactly is it about this condition that makes it occur on some
> > machines? I don't see it on a 16GB and an 8GB news spool here. No
> > corruption problems either (although it was not clear to me whether the
> > corruption is just a result of the panic, or just another effect of this
> > problem).
> >
>
>
> I'm not sure what the problem is - but it seems to be timing related.
>
> I can readily reproduce the newfs-doesn't-write-zeros problem on two
> different 386 machines, one with IDE, one with SCSI. Jaye appears to
> have the problem on his news machine. I definitely have it on my
> news machine.
>
> I thought it might be related to the number of elements off of the
> vnode free list - but when doing a newfs (during a clean install), that
> number seems to be fixed at around 1.
>
> I also thought it may simply be a problem with writing blocks around
> a multiple of the cluster size; but that doesn't seem to be the case,
> as I have followed the write()s in newfs all the way to the SCSI driver.
>
> Here's what I currently believe - somewhere, a buffer is being lost.
> The loss of the buffer is timing dependent, because judicious printf()s
> in the kernel alter the timing (and the stack) and cause things to work
> correctly. [Of course, since the stack is being altered, it could also
> be a stack corruption problem...]
>
> That is how, I believe, only an unlucky few have had the pleasure
> of this problem...
>
> I have reproduced this on a dedicated machine now. If you (or anyone
> else) would like access to that machine to try and solve it - just
> let me know! If you let me know, we can set up a time where you can
> get to it from the net...
>
> - Dave Rivers -
>