From owner-freebsd-hackers Tue Jun 24 08:22:20 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id IAA19734
          for hackers-outgoing; Tue, 24 Jun 1997 08:22:20 -0700 (PDT)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id IAA19711
          for ; Tue, 24 Jun 1997 08:22:15 -0700 (PDT)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02)
          id AA05445; Tue, 24 Jun 1997 11:21:44 -0400
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Tue, 24 Jun 1997 11:21 EDT
Received: from lakes.water.net (lakes [10.0.0.3])
          by ponds.water.net (8.8.5/8.7.3) with ESMTP id HAA03559;
          Tue, 24 Jun 1997 07:56:55 -0400 (EDT)
Received: (from rivers@localhost)
          by lakes.water.net (8.8.5/8.6.9) id IAA04547;
          Tue, 24 Jun 1997 08:05:20 -0400 (EDT)
Date: Tue, 24 Jun 1997 08:05:20 -0400 (EDT)
From: Thomas David Rivers
Message-Id: <199706241205.IAA04547@lakes.water.net>
To: ponds!lambert.org!terry, ponds!sdf.com!tom
Subject: panics/file system corruption - was Re: OpenBSD
Cc: ponds!FreeBSD.ORG!hackers, ponds!cdsnet.net!mrcpu
Content-Type: text
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

 I just tripped over this; thought I would try to take a stab at an answer....

> On Fri, 20 Jun 1997, Terry Lambert wrote:
>
> > > > Anybody running FreeBSD given it a shot just to see?  I have been
> > > > thinking about it to see if it fixes my UFS problems that are seemingly
> > > > unrepairable.
> > >
> > > UFS problem?
> >
> > He's talking about his "free xxx isn't" race condition errors.
>
> What exactly is it about this condition that makes it occur on some
> machines?  I don't see it on a 16GB and an 8GB news spool here.  No
> corruption problems either (although it was not clear to me whether the
> corruption is just a result of the panic, or just another effect of this
> problem).
>
> > Terry Lambert
> > terry@lambert.org
> > ---
> > Any opinions in this posting are my own and not those of my present
> > or previous employers.
> >
> >
>
> Tom
>

 I'm not sure what the problem is - but it seems to be timing related.

 I can readily reproduce the newfs-doesn't-write-zeros problem on two
different 386 machines, one with IDE, one with SCSI.  Jaye appears to have
the problem on his news machine.  I definitely have it on my news machine.

 I thought it might be related to the number of elements off of the vnode
free list - but when doing a newfs (during a clean install), that number
seems to be fixed at around 1.

 I also thought it might simply be a problem with writing blocks around a
multiple of the cluster size, but that doesn't seem to be the case, as I
have followed the write()s in newfs all the way to the SCSI driver.

 Here's what I currently believe:

 Somewhere, a buffer is being lost.  The loss of the buffer is timing
dependent, because judicious printf()s in the kernel alter the timing
(and the stack) and cause things to work correctly.  [Of course, since the
stack is being altered, it could also be a stack corruption problem...]

 That is how, I believe, only an unlucky few have had the pleasure of this
problem...

 I have reproduced this on a dedicated machine now.  If you (or anyone
else) would like access to that machine to try and solve it - just let me
know!  If you let me know, we can set up a time where you can get to it
from the net...

	- Dave Rivers -