From owner-freebsd-hackers Sat Feb 20 14:52:54 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP id B98CB11A71
	for ; Sat, 20 Feb 1999 14:52:35 -0800 (PST)
	(envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.8.8/8.8.8) id PAA20948;
	Sat, 20 Feb 1999 15:52:34 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208) via SMTP by
	smtp02.primenet.com, id smtpd020893; Sat Feb 20 15:52:27 1999
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id PAA19845;
	Sat, 20 Feb 1999 15:52:21 -0700 (MST)
From: Terry Lambert
Message-Id: <199902202252.PAA19845@usr08.primenet.com>
Subject: Re: Panic in FFS/4.0 as of yesterday
To: dfr@nlsystems.com (Doug Rabson)
Date: Sat, 20 Feb 1999 22:52:20 +0000 (GMT)
Cc: dillon@apollo.backplane.com, freebsd-hackers@FreeBSD.ORG
In-Reply-To: from "Doug Rabson" at Feb 20, 99 09:35:40 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> I always thought that the vnodes were locked that way during lookup to
> avoid more serious problems but I have never done the analysis to figure
> it out.  Certainly there are some tricky cases in the way that lookup is
> used to prepare for a subsequent create or rename (but that isn't the
> issue here I think).

See the rename code.

> If it works, then changing lookup to not require locks on both vnodes at
> the same time would be a good thing.  One of the reasons that NFS doesn't
> have proper node locks is that a dead NFS server can lead to a hung
> machine through a lock cascade from the NFS mount point.

The correct way to do this, IMO, is a back-off/retry, which would unlock
the lock and queue the operation for a retry that reacquires the lock.

I swear I saw code in NFS to do this.  Maybe it was pre-4.4BSD.

> > Maybe.  The difference is that the I/O topology is not known at the
> > higher levels where the I/O gets queued, so it would be more difficult
> > to calculate what the async limit should be in a scaleable way.
>
> Understood.  I played with a few non-solutions, limiting i/o on a mount
> point and on a vnode to an arbitrary limit, but wasn't able to make a real
> difference to the responsiveness of the test.
>
> It does seem wrong that a single writer process can generate arbitrary
> amounts of latency (essentially only bounded by the number of available
> buffers) for other clients on the same drive.  Ideally the driver should be
> able to propagate its 'queue full' signals up to the bio system, but I
> can't see a way of doing that easily in the current code.

If the queue gets full, then the disk is busy, in which case you could
convert to using sync writes for all pending directory operations.

It should be pretty easy to do this kludge by (1) deciding how many
waiters is "too many", and (2) checking whether the mount is async.
This would avoid propagating the changes up.

I really think, though, that the correct fix is to flag the async
writes for directory data in the code, and then, when you do the lower
level queue insertion, insert them ahead, per my other posting.

I personally like the name "EXPEDITE" for the flag.  8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my
present or previous employers.
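
As a minimal user-space illustration of the back-off/retry idea above,
the fragment below uses POSIX mutexes in place of vnode locks.  The
helper name lock_pair() is made up for this sketch, and a real kernel
version would also have to re-run the lookup after dropping the first
lock, since the vnodes may have changed underneath it.

#include <pthread.h>
#include <sched.h>

/*
 * Back-off/retry sketch: take the first lock, try the second without
 * blocking, and if that fails drop the first lock and retry, so two
 * threads locking the same pair in opposite orders can never deadlock.
 */
static void
lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	for (;;) {
		pthread_mutex_lock(a);
		if (pthread_mutex_trylock(b) == 0)
			return;			/* got both locks */
		pthread_mutex_unlock(a);	/* back off */
		sched_yield();			/* let the other side run */
	}
}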
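
A rough sketch of the sync-write kludge, with the threshold, the waiter
count, and the helper name all invented for illustration rather than
taken from the real mount or buffer code:

/*
 * Hypothetical decision point where a directory write is queued: if
 * the mount is async and "too many" I/Os are already waiting on the
 * device, do this write synchronously instead, so directory operations
 * can't hide behind an arbitrarily deep async backlog.
 */
#define MY_TOO_MANY_WAITERS	64	/* arbitrary "too many" cutoff */

static int
my_dirwrite_should_be_sync(int mount_is_async, int queued_waiters)
{
	if (mount_is_async && queued_waiters > MY_TOO_MANY_WAITERS)
		return (1);	/* fall back to a synchronous write */
	return (0);		/* keep the normal async behaviour */
}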
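
And a sketch of the "EXPEDITE" idea: tag directory-data writes with a
flag and, at the low-level queue insertion, put flagged buffers ahead
of ordinary async data while keeping them FIFO among themselves.  The
structure, flag name, and queue discipline here are illustrative only,
not the actual buf queue code:

#include <sys/queue.h>

/* Illustrative buffer with a hypothetical EXPEDITE-style flag. */
#define MY_B_EXPEDITE	0x0001

struct my_buf {
	int			flags;
	TAILQ_ENTRY(my_buf)	link;
};
TAILQ_HEAD(my_bufq, my_buf);

/*
 * Expedited buffers go after any expedited buffers already queued but
 * ahead of all ordinary buffers; everything else goes to the tail.
 */
static void
my_bufq_insert(struct my_bufq *q, struct my_buf *bp)
{
	struct my_buf *it;

	if ((bp->flags & MY_B_EXPEDITE) == 0) {
		TAILQ_INSERT_TAIL(q, bp, link);
		return;
	}
	TAILQ_FOREACH(it, q, link) {
		if ((it->flags & MY_B_EXPEDITE) == 0) {
			TAILQ_INSERT_BEFORE(it, bp, link);
			return;
		}
	}
	TAILQ_INSERT_TAIL(q, bp, link);
}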
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message