From owner-freebsd-hackers Sat Feb 20 13:36:43 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from herring.nlsystems.com (nlsys.demon.co.uk [158.152.125.33])
	by hub.freebsd.org (Postfix) with ESMTP id BA3341193D
	for ; Sat, 20 Feb 1999 13:36:39 -0800 (PST)
	(envelope-from dfr@nlsystems.com)
Received: from localhost (dfr@localhost)
	by herring.nlsystems.com (8.9.3/8.8.8) with ESMTP id VAA54093;
	Sat, 20 Feb 1999 21:35:40 GMT
	(envelope-from dfr@nlsystems.com)
Date: Sat, 20 Feb 1999 21:35:40 +0000 (GMT)
From: Doug Rabson
To: Matthew Dillon
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Panic in FFS/4.0 as of yesterday
In-Reply-To: <199902202056.MAA11068@apollo.backplane.com>
Message-ID: 
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sat, 20 Feb 1999, Matthew Dillon wrote:

> :Jacob's bulk writing test and I can see what is happening (although I'm
> :not sure what to do about it).
> :
> :The system is unresponsive because the root inode is locked virtually all
> :of the time, and this is because of a lock cascade leading to a single
> :process which is trying to rewrite a block of the directory the test is
> :running in (synchronously, since the fs is not using softupdates). That
> :process is waiting for its i/o to complete before unlocking the directory.
> :Unfortunately the buffer is the last on the drive's buffer queue and there
> :are 647 (for one instance which I examined in the debugger) buffers ahead
> :of it, most of which are writing about 8k. About 4Mb of buffers on the
> :queue are from a *single* process, which seems extreme.
>
> There isn't much we can do except try to fix the lock cascade that
> occurs in namei and lookup. The problem is that the lower-level vnode
> is locked before the parent vnode is released.
>
> What if we simply bumped the vnode's v_holdcnt or v_usecount in lookup
> instead of locking it, and then had namei unlock the parent vnode prior
> to gaining a lock on the new vnode in its loop?
>
> This would limit the locking cascade to one vnode in the worst case.
> We would have to allow the bumping up and down of v_usecount by
> independent processes while an exclusive lock is held on it.

I always thought that the vnodes were locked that way during lookup to
avoid more serious problems, but I have never done the analysis to figure
it out. Certainly there are some tricky cases in the way that lookup is
used to prepare for a subsequent create or rename (though that isn't the
issue here, I think). If it works, then changing lookup to not require
locks on both vnodes at the same time would be a good thing. One of the
reasons that NFS doesn't have proper node locks is that a dead NFS server
can lead to a hung machine through a lock cascade from the NFS mount
point.

>
> :It seems to me that there should be a mechanism to prevent the queued i/o
> :lists from becoming so long (over 5Mb is queued on the machine which I
> :have in the debugger), perhaps by throttling the writers if they start too
> :much asynchronous i/o. I wonder if this can be treated as a similar
> :problem to the swapper latency issues which John Dyson was talking about.
> :--
>
> Maybe. The difference is that the I/O topology is not known at the
> higher levels where the I/O gets queued, so it would be more difficult
> to calculate what the async limit should be in a scalable way.

Understood.
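Coming back to the relocking idea for a moment, here is a rough, untested
sketch of how I picture it. The lookup_child() helper is made up purely
for illustration (the real work would be in lookup()/VOP_LOOKUP), and it
glosses over the revalidation a real version would need once the parent
has been unlocked:

	#include <sys/param.h>
	#include <sys/lock.h>
	#include <sys/proc.h>
	#include <sys/namei.h>
	#include <sys/vnode.h>

	/*
	 * Untested sketch only.  Take a reference on the child so it
	 * can't be recycled, drop the parent's lock, then lock the
	 * child on its own, so that at most one vnode lock is held at
	 * any point in the descent.  This assumes v_usecount can be
	 * bumped by one process while another holds the vnode's
	 * exclusive lock, as discussed above.
	 */
	static int
	lookup_relock(struct vnode *dvp, struct componentname *cnp,
	    struct vnode **vpp, struct proc *p)
	{
		struct vnode *vp;
		int error;

		/* dvp is locked on entry; find the next component. */
		error = lookup_child(dvp, cnp, &vp);	/* made-up helper */
		if (error)
			return (error);

		vref(vp);			/* pin vp without locking it */
		VOP_UNLOCK(dvp, 0, p);		/* release the parent first... */
		error = vn_lock(vp, LK_EXCLUSIVE, p);	/* ...then lock the child */
		if (error) {
			vrele(vp);
			return (error);
		}
		/*
		 * A real version would have to revalidate vp here; the
		 * name may have been removed or renamed while no lock
		 * was held.
		 */
		*vpp = vp;
		return (0);
	}

The window between VOP_UNLOCK() and vn_lock() is, I suspect, exactly
where the tricky create and rename cases I mentioned above would bite.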
I played with a few non-solutions, limiting i/o on a mount point and on a
vnode to an arbitrary limit, but wasn't able to make a real difference to
the responsiveness of the test. It does seem wrong that a single writer
process can generate arbitrary amounts of latency (bounded, essentially,
only by the number of available buffers) for other clients of the same
drive. Ideally the driver should be able to propagate its 'queue full'
signals up to the bio system, but I can't see a way of doing that easily
in the current code.

--
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 181 442 9037


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message