Date: Tue, 24 Jun 1997 08:05:20 -0400 (EDT)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To: ponds!lambert.org!terry, ponds!sdf.com!tom
Cc: ponds!FreeBSD.ORG!hackers, ponds!cdsnet.net!mrcpu
Subject: panics/file system corruption - was Re: OpenBSD
Message-ID: <199706241205.IAA04547@lakes.water.net>

I just tripped over this; thought I would try to take a stab at an answer....

> On Fri, 20 Jun 1997, Terry Lambert wrote:
>
> > > > Anybody running FreeBSD given it a shot just to see?  I have been
> > > > thinking about it to see if it fixes my UFS problems that are seemingly
> > > > unrepairable.
> > >
> > > UFS problem?
> >
> > He's talking about his "free xxx isn't" race condition errors.
>
>   What exactly is it about this condition that makes it occur on some
> machines?  I don't see it on a 16GB and an 8GB news spool here.  No
> corruption problems either (although it was not clear to me whether the
> corruption is just a result of the panic, or just another effect of this
> problem).
>
> > Terry Lambert
> > terry@lambert.org
> > ---
> > Any opinions in this posting are my own and not those of my present
> > or previous employers.
>
> Tom

I'm not sure what the problem is - but it seems to be timing related.

I can readily reproduce the newfs-doesn't-write-zeros problem on two
different 386 machines, one with IDE, one with SCSI.

Jaye appears to have the problem on his news machine.

I definitely have it on my news machine.

I thought it might be related to the number of elements on the vnode
free list - but when doing a newfs (during a clean install), that number
seems to be fixed at around 1.

I also thought it might simply be a problem with writing blocks around a
multiple of the cluster size; but that doesn't seem to be the case, as I
have followed the write()s in newfs all the way to the SCSI driver.

Here's what I currently believe:

Somewhere, a buffer is being lost.  The loss of the buffer is timing
dependent, because judicious printf()s in the kernel alter the timing
(and the stack) and cause things to work correctly.  [Of course, since
the stack is being altered, it could also be a stack corruption
problem...]

That, I believe, is why only an unlucky few have had the pleasure of
this problem...

I have reproduced this on a dedicated machine now.
If you (or anyone else) would like access to that machine to try and
solve it - just let me know!  We can then set up a time when you can
get to it from the net...

	- Dave Rivers -
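
For anyone who wants to look for the newfs-doesn't-write-zeros symptom on their own machine, here is a minimal sketch of a checker. It assumes the symptom shows up as non-zero bytes in a region that was expected to be zeroed; the message above does not say which region that is, so the path, offset, length and block size are all placeholders to be filled in by whoever runs it. The function name find_nonzero_blocks is made up for this illustration and is not part of newfs or any FreeBSD tool.

```python
def find_nonzero_blocks(path, offset, length, block_size=8192):
    """Return the byte offsets of blocks in [offset, offset + length)
    that contain at least one non-zero byte.

    path, offset, length and block_size are assumptions supplied by the
    caller; nothing here knows the on-disk layout that newfs produces.
    """
    bad = []
    zeros = bytes(block_size)          # reference block of all zero bytes
    with open(path, "rb") as f:
        f.seek(offset)
        pos = offset
        remaining = length
        while remaining > 0:
            chunk = f.read(min(block_size, remaining))
            if not chunk:
                break                  # reached end of file or device early
            if chunk != zeros[:len(chunk)]:
                bad.append(pos)        # record where this block started
            pos += len(chunk)
            remaining -= len(chunk)
    return bad
```

Calling find_nonzero_blocks("/path/to/image", 0, 1 << 20) on a copy of the partition (or on the raw device, given read permission) returns an empty list when the first megabyte is all zeros. Since the problem appears to be timing dependent, a clean result from one run does not rule it out; repeating the newfs and the check several times would be more telling.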