From owner-freebsd-hackers  Tue Jun 24 08:22:20 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id IAA19734
          for hackers-outgoing; Tue, 24 Jun 1997 08:22:20 -0700 (PDT)
Received: from dg-rtp.dg.com (dg-rtp.rtp.dg.com [128.222.1.2])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id IAA19711
          for <hackers@FreeBSD.ORG>; Tue, 24 Jun 1997 08:22:15 -0700 (PDT)
Received: by dg-rtp.dg.com (5.4R3.10/dg-rtp-v02)
	id AA05445; Tue, 24 Jun 1997 11:21:44 -0400
Received: from ponds by dg-rtp.dg.com.rtp.dg.com; Tue, 24 Jun 1997 11:21 EDT
Received: from lakes.water.net (lakes [10.0.0.3]) by ponds.water.net (8.8.5/8.7.3) with ESMTP id HAA03559; Tue, 24 Jun 1997 07:56:55 -0400 (EDT)
Received: (from rivers@localhost) by lakes.water.net (8.8.5/8.6.9) id IAA04547; Tue, 24 Jun 1997 08:05:20 -0400 (EDT)
Date: Tue, 24 Jun 1997 08:05:20 -0400 (EDT)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
Message-Id: <199706241205.IAA04547@lakes.water.net>
To: ponds!lambert.org!terry, ponds!sdf.com!tom
Subject: panics/file system corruption - was Re: OpenBSD
Cc: ponds!FreeBSD.ORG!hackers, ponds!cdsnet.net!mrcpu
Content-Type: text
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk


I just tripped over this; thought I would try to take a stab
at an answer....

> On Fri, 20 Jun 1997, Terry Lambert wrote:
> 
> > > > Anybody running FreeBSD given it a shot just to see?  I  have been
> > > > thinking about it to see if it fixes my UFS problems that are seemingly
> > > > unrepairable.
> > > 
> > >   UFS problem?
> > 
> > He's talking about his "free xxx isn't" race condition errors.
> 
>   What exactly is it about this condition that makes it occur on some
> machines?  I don't see it on a 16GB and an 8GB news spool here.  No
> corruption problems either (although it was not clear to me, whether the
> corruption is just a result of the panic, or just another effect of this
> problem).
> 
> > 					Terry Lambert
> > 					terry@lambert.org
> > ---
> > Any opinions in this posting are my own and not those of my present
> > or previous employers.
> > 
> > 
> 
> Tom
> 


  I'm not sure what the problem is - but it seems to be timing related.

  I can readily reproduce the newfs-doesn't-write-zeros problem on two
 different 386 machines, one with IDE, one with SCSI.  Jaye appears to
 have the problem on his news machine.  I definitely have it on my
 news machine.  
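
  One way to look for the missing zeros after the fact is to read the
 region back and flag any block that is not entirely zero.  What follows
 is only a minimal sketch in Python, not the procedure used above: the
 name find_nonzero_blocks, the 512-byte block size, and the idea that the
 affected region is known by byte offset and length are all assumptions
 made for illustration.

```python
def find_nonzero_blocks(path, start, length, block_size=512):
    """Return the byte offsets of blocks in [start, start + length)
    that contain at least one non-zero byte.

    path may be a regular file or a disk device node; block_size is an
    assumed sector size, not something taken from newfs itself.
    """
    bad_offsets = []
    with open(path, "rb") as f:
        f.seek(start)
        offset = start
        remaining = length
        while remaining > 0:
            chunk = f.read(min(block_size, remaining))
            if not chunk:
                break  # reached end of file or device early
            if any(chunk):  # iterating bytes yields ints; non-zero is truthy
                bad_offsets.append(offset)
            offset += len(chunk)
            remaining -= len(chunk)
    return bad_offsets
```

  An empty list means every block read back as zeros.  A non-empty list
 gives the offsets worth comparing against the write()s that newfs issued.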

  I thought it might be related to the number of elements on the
 vnode free list, but when doing a newfs (during a clean install), that
 number seems to be fixed at around 1.

  I also thought it might simply be a problem with writing blocks around
 a multiple of the cluster size, but that doesn't seem to be the case,
 as I have followed the write()s in newfs all the way to the SCSI driver.
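
  The cluster-size idea can be tested mechanically by checking each traced
 write for whether it spans a cluster boundary.  This is a hedged sketch
 only: the function name crosses_cluster_boundary is hypothetical, and the
 cluster size must be supplied by whoever runs it (the 65536 in the
 comment below is an assumed example value, not a figure from the kernel
 in question).

```python
def crosses_cluster_boundary(offset, length, cluster_size):
    """Return True if a write of `length` bytes starting at byte `offset`
    touches more than one cluster of `cluster_size` bytes.

    Example with an assumed cluster size of 65536: a 1024-byte write at
    offset 65000 covers bytes 65000..66023 and therefore crosses.
    """
    if length <= 0:
        return False  # an empty write touches no cluster at all
    first_cluster = offset // cluster_size
    last_cluster = (offset + length - 1) // cluster_size
    return first_cluster != last_cluster
```

  If the bad blocks show up equally often for writes that cross and writes
 that do not, the boundary is unlikely to be the trigger.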

  Here's what I currently believe - somewhere, a buffer is being lost.
 The loss of the buffer is timing dependent, because judicious printf()s
 in the kernel alter the timing (and the stack) and cause things to work
 correctly. [Of course, since the stack is being altered, it could also
 be a stack corruption problem...]

  That is how, I believe, only an unlucky few have had the pleasure
 of this problem...

  I have reproduced this on a dedicated machine now.  If you (or anyone
 else) would like access to that machine to try to solve it - just
 let me know!  We can then set up a time when you can get to it from
 the net...

	- Dave Rivers -