From owner-freebsd-hackers  Thu Feb 11 07:01:33 1999
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id HAA27739
          for freebsd-hackers-outgoing; Thu, 11 Feb 1999 07:01:33 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from feral.com (feral.com [192.67.166.1])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id HAA27733
          for <freebsd-hackers@FreeBSD.ORG>; Thu, 11 Feb 1999 07:01:31 -0800 (PST)
          (envelope-from mjacob@feral.com)
Received: from localhost (mjacob@localhost)
	by feral.com (8.8.7/8.8.7) with ESMTP id HAA13129;
	Thu, 11 Feb 1999 07:01:17 -0800
Date: Thu, 11 Feb 1999 07:01:17 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
X-Sender: mjacob@feral-gw
Reply-To: mjacob@feral.com
To: Matthew Dillon <dillon@apollo.backplane.com>
cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: softupdates
In-Reply-To: <199902101949.LAA85603@apollo.backplane.com>
Message-ID: <Pine.LNX.4.04.9902110650060.13093-100000@feral-gw>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


> :As a consequence, FreeBSD lost out for being considered a candidate
> :at NASA/Ames for large mass storage. Shrug...It may or may not be
> :true that softupdates, per se, are stable. In my opinion, FFS as
> :offered by FreeBSD (and NetBSD) have not shown themselves to be
> :adequate to large (>500GB) filesystems. Sad to say, ext2 under
> :linux works better.
> 
>     Matt, I don't recall seeing anything from you in regards to
>     large filesystems.  Looking in the archives, I see one report
>     on Jan 27th from you relating to softupdates, but you indicate
>     that softupdates was not enabled on the volume in question,
>     so it seems unlikely that it is related to softupdates specifically.

I was misremembering. Softupdates wasn't the issue here, but there *had*
been some question about whether this was a problem with softupdates *not*
enabled; that was from way earlier. Sorry for jumping into the wrong
thread, and sorry also for probably letting a less than felicitous tone
enter this email. I think I'm just really irritated that I didn't get mail
back from Luoqi.

> 
>     There have been several reports of dirty-buffer panics which is
>     of concern, but I haven't been able to reproduce the panic myself
>     yet.

Really? I found it fairly easy to reproduce: it happened every time I had a
large filesystem (e.g. > 10GB) and ran some fairly simple tests that
exercise VM and filesystem interactions. (Simple code is available on
request. NetBSD and FreeBSD fail in interesting and uninteresting ways;
Solaris and Linux pass unless there's an underlying h/w problem.) I stopped
testing when I wasn't getting any response on this (see below for why), but
if somebody is thinking of attacking this problem again I can certainly do
some testing. I may not be able to make test systems directly available
because of some NASA policies, but I can probably snag some systems for
myself to run tests with. I still have a couple of FreeBSD systems running
at NASA/Ames, plus the ones I have at Feral (although Feral's disk
resources are substantially *less* than NASA/Ames!).

Here are a couple of private emails that listed problems I was seeing. It
really seems to live down in the realloc blocks code. I wasn't subscribed
to freebsd-hackers back then, so I probably just didn't get this problem
announced to the right group.


++Date: Thu, 10 Dec 1998 09:01:39 -0800 (PST)
++From: Matthew Jacob <mjacob@feral.com>
++To: Jordan K. Hubbard <jkh@zippy.cdrom.com>
++Cc: Mike Smith <mike@smith.net.au>, Justin T. Gibbs <gibbs@plutotech.com>,
++dg@root.com
++Subject: More panics...
++
++
++I have not been able to arrange a GDB line for this test system, nor
++have I gotten access for others to get to it as yet.
++
++However, with a kernel built from sources around that of Monday night or
++so, the same testing got:
++
++swap_pager: suggest more swap space: 1028 MB
++panic: ffs_reallocblks: unallocated block 1
++
++
++With a stack of:
++        ffs_reallocblks
++        cluster_write
++        ffs_write
++        vn_write
++        write
++
++Is this of interest considering all the recent commits for the ufs code
++and new fsck etc.? This was with even a 'normal' (1K/8K) but quite large
++filesystem. And no, I don't have softupdates on- I just have whatever
++comes out of the box...

++Date: Wed, 16 Dec 1998 08:53:49 -0800 (PST)
++From: Matthew Jacob <mjacob@feral.com>
++To: Mike Smith <mike@smith.net.au>
++Cc: Jordan K. Hubbard <jkh@zippy.cdrom.com>, Justin T. Gibbs <gibbs@plutotech.com>, dg@root.com
++Subject: Re: More panics... 
++
++
++Neither Luoqi nor Kirk responded. I've had systems panic and crash even
++with the realloc blocks code turned off (somewhere in the ffs alloc layer).
++
++I have been unsuccessful in getting 'official' remote access to some of
++the large systems because the person in charge has pointed out that there
++is no 'code sharing' Space Act agreement with FreeBSD as there is with
++NetBSD- this is a disappointment. It also was unhelpful that the problem
++couldn't be addressed because the 250GB dump server now is running
++NetBSD-current because it stayed up and was more stable than
++FreeBSD-current. Damn- I'm having bad luck in getting some buyin at NAS
++(which is suffering from budget problems but still has a hell of a lot of
++iron to play with).
++
++With a modest amount of h/w involved (e.g. a 2x180MHz PPro) and one fast
++disk, this problem seems to be reproducible. I'm not a filesystem guy (if
++I'm *anything*, having spread myself too thinly), so I have neither the
++cycles nor (probably) the intelligence to nail it. Who will really own
++this problem? I'd rank this as far more serious than NFS issues.

