From owner-freebsd-stable  Thu Apr 11 12:57: 6 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from HAL9000.wox.org (12-232-222-90.client.attbi.com [12.232.222.90])
	by hub.freebsd.org (Postfix) with ESMTP
	id C7A4B37B400; Thu, 11 Apr 2002 12:56:58 -0700 (PDT)
Received: (from das@localhost)
	by HAL9000.wox.org (8.11.6/8.11.6) id g3BJupI19289;
	Thu, 11 Apr 2002 12:56:51 -0700 (PDT)
	(envelope-from das)
Date: Thu, 11 Apr 2002 12:56:51 -0700
From: David Schultz <dschultz@uclink.Berkeley.EDU>
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: Coleman Kane <cokane@FreeBSD.org>, Bob Bishop <rb@gid.co.uk>,
	stable@FreeBSD.org
Subject: Re: very old bug
Message-ID: <20020411125651.A19217@HAL9000.wox.org>
Mail-Followup-To: Ian Dowse <iedowse@maths.tcd.ie>,
	Coleman Kane <cokane@FreeBSD.org>, Bob Bishop <rb@gid.co.uk>,
	stable@FreeBSD.org
References: <20020411100223.A64698@freebsd.org> <200204111604.aa64453@salmon.maths.tcd.ie>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200204111604.aa64453@salmon.maths.tcd.ie>; from iedowse@maths.tcd.ie on Thu, Apr 11, 2002 at 04:04:42PM +0100

Thus spake Ian Dowse <iedowse@maths.tcd.ie>:
> If an error occurs when the write is finally attempted, the buffer
> cache is left in a predicament because it has no way to inform the
> filesystem of the problem, and it can't just throw away the data
> without risking serious filesystem corruption, and confusion within
> the filesystem code. The two remaining options are to keep retrying
> the write in the hope that it will eventually succeed, or panic.
> 
> About the only other safe thing to do would be to completely
> disassociate the failing device from the filesystem and throw away
> any unwritten data. If the filesystem can handle the device going
> away like this without panicking, then the user might be able to
> unmount it and continue.
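A toy model in C may make the three options in the quoted text concrete: retry, panic, or detach the device. It is a sketch under stated assumptions, not FreeBSD's buffer cache. `struct toy_buf`, `struct toy_dev`, `toy_flush()` and the policy names are hypothetical, and the device is simulated by a counter of upcoming writes that will fail. Under the retry policy the buffer simply stays dirty. Under the detach policy the device is marked gone and the unwritten data is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy model; not FreeBSD's struct buf or bwrite(). */
enum wfail_policy { POLICY_RETRY, POLICY_PANIC, POLICY_DETACH };

struct toy_dev {
    bool attached;      /* false once the device is disassociated */
    int  fail_writes;   /* number of upcoming writes that will fail */
};

struct toy_buf {
    bool dirty;         /* data not yet on the device */
    int  retries;       /* failed write attempts so far */
};

/* Simulated device write: fails while fail_writes > 0. */
static int toy_dev_write(struct toy_dev *dev)
{
    if (!dev->attached)
        return ENXIO;
    if (dev->fail_writes > 0) {
        dev->fail_writes--;
        return EIO;
    }
    return 0;
}

/*
 * One flush attempt for a dirty buffer. The cache has no way to report
 * the error to the filesystem, so it must apply a fixed policy.
 */
static void toy_flush(struct toy_dev *dev, struct toy_buf *bp,
                      enum wfail_policy policy)
{
    if (!bp->dirty)
        return;
    if (toy_dev_write(dev) == 0) {
        bp->dirty = false;      /* write succeeded */
        return;
    }
    switch (policy) {
    case POLICY_RETRY:
        bp->retries++;          /* keep the data dirty; try again later */
        break;
    case POLICY_PANIC:
        fprintf(stderr, "panic: unrecoverable write error\n");
        abort();
    case POLICY_DETACH:
        dev->attached = false;  /* disassociate the failing device */
        bp->dirty = false;      /* throw away the unwritten data */
        break;
    }
}
```

With `fail_writes = 2` and `POLICY_RETRY`, the first two flushes leave the buffer dirty and the third succeeds. With `POLICY_DETACH`, a single failure clears the buffer and detaches the device.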

Thanks for the explanation.  It still bugs me that the failure occurs
when attempting to *read* a directory on which a write attempt failed.
Presumably the data of interest is still sitting in the buffer cache
and can be read without doing any I/O, so the filesystem code
shouldn't notice the problem.  At worst, the buffer is locked, but
then the read request should block or fail.  It doesn't seem to be
doing either; the filesystem code just loops as it attempts to read.
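The expected read behaviour can be sketched as follows: a cached block is returned without any I/O, and a locked buffer makes the read block or fail rather than loop. This is a hypothetical single-entry cache (`struct toy_cache`, `toy_bread()`), not the real `bread()` path. Because the sketch has only one thread, it fails with `EWOULDBLOCK` instead of sleeping until the buffer is unlocked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_BSIZE 16

/* Hypothetical cache entry; not FreeBSD's struct buf. */
struct toy_buf {
    bool valid;                 /* buffer holds this block's data */
    bool dirty;                 /* newer than disk; irrelevant to reads */
    bool locked;                /* e.g. held by a pending write */
    char data[TOY_BSIZE];
};

struct toy_cache {
    struct toy_buf buf;         /* single-entry cache for brevity */
    int dev_reads;              /* device reads actually issued */
    bool dev_failed;            /* simulated broken device */
};

/*
 * Read one block. A valid cached copy is returned without touching the
 * device, whether or not it is dirty. A locked buffer yields EWOULDBLOCK
 * (a real kernel might instead sleep until it is released).
 */
static int toy_bread(struct toy_cache *c, char out[TOY_BSIZE])
{
    struct toy_buf *bp = &c->buf;

    if (bp->valid) {
        if (bp->locked)
            return EWOULDBLOCK;
        memcpy(out, bp->data, TOY_BSIZE);
        return 0;               /* cache hit: no I/O */
    }
    c->dev_reads++;             /* cache miss: must go to the device */
    if (c->dev_failed)
        return EIO;
    memset(bp->data, 0, TOY_BSIZE);   /* pretend the disk held zeros */
    bp->valid = true;
    memcpy(out, bp->data, TOY_BSIZE);
    return 0;
}
```

Even with `dev_failed` set, a valid dirty buffer is read back with `dev_reads` still at zero, which is the behaviour the paragraph above expects from a directory whose write failed.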

Would a partial solution be to simply cause all I/O requests for a
particular filesystem to fail as soon as the first write failure is
detected?  Then the user would be forced to unmount, at which point
the leftover dirty buffers could be discarded without risk of
corrupting the filesystem.  If this approach is unreasonable, the
buffer cache should at least be able to maintain the illusion that
everything is working fine, as long as there are enough free
buffers.  What does NFS do when the network link goes down?
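Here is a minimal sketch of the fail-fast proposal, assuming a per-filesystem error flag. The first failed write sets the flag. Every later read or write on that filesystem then returns `EIO` without touching the device, and unmount discards whatever is still dirty. `struct toy_mount`, `toy_write()`, `toy_read()` and `toy_unmount()` are hypothetical names, not FreeBSD interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NBUF 4

/* Hypothetical per-filesystem state; not FreeBSD's struct mount. */
struct toy_mount {
    bool io_error;              /* set on the first failed write */
    bool dirty[TOY_NBUF];       /* dirty buffers belonging to this fs */
    bool dev_broken;            /* simulated device failure */
};

/* Write back buffer i; the first failure poisons the whole filesystem. */
static int toy_write(struct toy_mount *mp, int i)
{
    if (mp->io_error)
        return EIO;             /* fail fast: no further device I/O */
    if (mp->dev_broken) {
        mp->io_error = true;    /* remember the failure; data stays dirty */
        return EIO;
    }
    mp->dirty[i] = false;
    return 0;
}

/* Every read also fails once the filesystem is poisoned. */
static int toy_read(struct toy_mount *mp, int i)
{
    (void)i;                    /* block number unused in this sketch */
    return mp->io_error ? EIO : 0;
}

/*
 * Unmount: with no error, dirty buffers are written normally. After an
 * error they are discarded instead, since the filesystem is going away.
 * Returns the number of buffers thrown away.
 */
static int toy_unmount(struct toy_mount *mp)
{
    int discarded = 0;

    for (int i = 0; i < TOY_NBUF; i++) {
        if (!mp->dirty[i])
            continue;
        if (!mp->io_error && toy_write(mp, i) == 0)
            continue;           /* written back successfully */
        mp->dirty[i] = false;   /* drop the unwritten data */
        discarded++;
    }
    return discarded;
}
```

The cost of this design is that unwritten data is lost at unmount. The benefit is that nothing has to retry forever or panic in the meantime.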
