Date:      Sat, 20 Feb 1999 23:12:45 +0000 (GMT)
From:      Doug Rabson <dfr@nlsystems.com>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        dillon@apollo.backplane.com, freebsd-hackers@FreeBSD.ORG
Subject:   Re: Panic in FFS/4.0 as of yesterday
Message-ID:  <Pine.BSF.4.05.9902202305130.82049-100000@herring.nlsystems.com>
In-Reply-To: <199902202252.PAA19845@usr08.primenet.com>

On Sat, 20 Feb 1999, Terry Lambert wrote:

> > I always thought that the vnodes were locked that way during lookup to
> > avoid more serious problems but I have never done the analysis to figure
> > it out.  Certainly there are some tricky cases in the way that lookup is
> > used to prepare for a subsequent create or rename (but that isn't the
> > issue here I think).
> 
> See the rename code.
> 
> 
> > If it works, then changing lookup to not require locks on both vnodes at
> > the same time would be a good thing.  One of the reasons that NFS doesn't
> > have proper node locks is that a dead NFS server can lead to a hung
> > machine through a lock cascade from the NFS mount point.
> 
> The correct way to do this, IMO, is a back-off/retry, which would
> unlock the lock and queue the operation for retry, which would
> reacquire the lock.
> 
> I swear I saw code in NFS to do this.  Maybe it was pre-BSD4.4.

I'm pretty sure it has never done this as long as I have been involved
with NFS.  I remember a (probably private) discussion with Rick Macklem a
long time ago where he explained some of the problems with using exclusive
node locks in NFS.  I'm not sure how a filesystem can unilaterally decide
to release an asserted lock.  It seems to me that this policy can only
happen at the level of the client code (otherwise the value of the lock
disappears).
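
In caller-level terms (stand-in names and plain user-space locks here,
just to show where the unlock-and-retry policy would have to live), the
back-off would look something like this:

/*
 * Sketch only: ordinary pthread locks standing in for the two vnode
 * locks.  The point is that the "drop everything and retry" decision
 * is made by the code that wants both locks, not by the lock itself.
 */
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t parent_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t child_lock = PTHREAD_MUTEX_INITIALIZER;

static void
lock_both(void)
{

	for (;;) {
		pthread_mutex_lock(&parent_lock);
		if (pthread_mutex_trylock(&child_lock) == 0)
			return;		/* got both */
		/* Back off: release what we hold and retry from scratch. */
		pthread_mutex_unlock(&parent_lock);
		sched_yield();
	}
}

static void
unlock_both(void)
{

	pthread_mutex_unlock(&child_lock);
	pthread_mutex_unlock(&parent_lock);
}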

> 
> > >     Maybe.  The difference is that the I/O topology is not known at the
> > >     higher levels where the I/O gets queued, so it would be more difficult
> > >     to calculate what the async limit should be in a scaleable way.
> > 
> > Understood.  I played with a few non-solutions, limiting i/o on a mount
> > point and on a vnode to an arbitrary limit but wasn't able to make a real
> > difference to the responsiveness of the test.
> > 
> > It does seem wrong that a single writer process can generate arbitrary
> > amounts of latency (essentially only bounded by the number of available
> > buffers) for other clients on the same drive. Ideally the driver should be
> > able to propagate its 'queue full' signals up to the bio system but I
> > can't see a way of doing that easily in the current code.
> 
> If the queue gets full, then the disk gets busy.  In which case, you
> could convert to using sync writes for all pending directory operations.

This is what happens for NFS (since Shimokawa-san and I rewrote the
queueing system in the NFS client a couple of years ago).  In the NFS
client, the queue lengths are capped at 2*#iod and any i/o beyond that
threshold is done synchronously.  This was possible because the i/o
implementation (RPC) is part of the filesystem itself; local filesystems
would need extra communication channels with the underlying storage
provider.
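
Roughly speaking (with made-up names here; the real client code is more
involved), the decision at the point where a write is issued looks like
this:

/*
 * Sketch only: the structure and function names are hypothetical.
 * At most 2 * #iod requests are allowed onto the async queue; once
 * that threshold is reached the caller performs the i/o itself.
 */
struct nfs_io_queue {
	int	pending;	/* requests currently queued for the iods */
	int	numasync;	/* number of nfsiod threads (#iod) */
};

static int
nfs_start_write(struct nfs_io_queue *q, int (*do_sync)(void),
    int (*do_async)(void))
{

	if (q->pending >= 2 * q->numasync) {
		/* Queue is saturated: fall back to a synchronous write. */
		return (do_sync());
	}
	q->pending++;	/* decremented when an iod completes the request */
	return (do_async());
}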

> 
> It should be pretty easy to do this kludge by (1) deciding how many
> waiters is "too many", and (2) checking if the mount is async.  This
> would avoid propagating the changes up.

Deciding how many is "too many" is the hard part; it really needs
feedback from the driver.

> 
> I really think, though, that the correct fix is to flag the async
> writes for directory data in the code, and then when you do the lower
> level queue insertion, insert them ahead, per my other posting.
> 
> I personally like the name "EXPEDITE" for the flag.  8-).

I think promoting directory writes (and probably directory reads too)
might be the simplest solution.
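
Something along these lines at the point of queue insertion would do it
(hypothetical flag and buffer structure, just using the standard
<sys/queue.h> macros):

/*
 * Sketch only: B_EXPEDITE and this cut-down struct buf are hypothetical.
 * Flagged (directory) buffers are inserted at the head of the queue so
 * they are issued ahead of the backlog of ordinary async data writes.
 */
#include <sys/queue.h>

#define	B_EXPEDITE	0x0001		/* "please jump the queue" */

struct buf {
	int			b_flags;
	TAILQ_ENTRY(buf)	b_queue;
};
TAILQ_HEAD(bufqueue, buf);

static void
bufq_insert(struct bufqueue *bq, struct buf *bp)
{

	if (bp->b_flags & B_EXPEDITE)
		TAILQ_INSERT_HEAD(bq, bp, b_queue);
	else
		TAILQ_INSERT_TAIL(bq, bp, b_queue);
}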

--
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 181 442 9037




