From: David Schultz
To: Ian Dowse
Cc: Coleman Kane, Bob Bishop, stable@FreeBSD.org
Subject: Re: very old bug
Date: Thu, 11 Apr 2002 12:56:51 -0700
Message-ID: <20020411125651.A19217@HAL9000.wox.org>
In-Reply-To: <200204111604.aa64453@salmon.maths.tcd.ie>
References: <20020411100223.A64698@freebsd.org> <200204111604.aa64453@salmon.maths.tcd.ie>

Thus spake Ian Dowse:
> If an error occurs when the write is finally attempted, the buffer
> cache is left in a predicament because it has no way to inform the
> filesystem of the problem, and it can't just throw away the data
> without risking serious filesystem corruption, and confusion within
> the filesystem code. The two remaining options are to keep retrying
> the write in the hope that it will eventually succeed, or panic.
>
> About the only other safe thing to do would be to completely
> disassociate the failing device from the filesystem and throw away
> any unwritten data. If the filesystem can handle the device going
> away like this without panicking, then maybe the user might be able
> to unmount it and continue.

Thanks for the explanation.
It still bugs me that the failure occurs when attempting to *read* a
directory on which a write attempt failed. Presumably the data of
interest is still sitting in the buffer cache and can be read without
doing any I/O, so the filesystem code shouldn't notice the problem.
At worst, the buffer is locked, but then the read request should
block or fail. It doesn't seem to be doing either; the filesystem
code just loops as it attempts to read.

Would a partial solution be to simply cause all I/O requests for a
particular filesystem to fail as soon as the first write failure is
detected? Then the user would be forced to unmount, at which point
the leftover dirty buffers could be discarded without risk of
corrupting the filesystem. If this approach is unreasonable, the
buffer cache should at least be able to support the illusion that
everything is working fine, at least as long as there are enough
free buffers. What does NFS do when the network link goes down?
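
To make the fail-fast suggestion concrete, here is a toy user-space
model in Python, not kernel code. Every name in it (FlakyDevice,
ToyMount, the failed flag) is hypothetical and does not correspond to
any real FreeBSD interface. It only illustrates the intended behaviour:
the first write error marks the mount as failed, every later request
returns EIO instead of looping, and unmount discards the leftover
dirty buffers.

```python
import errno


class FlakyDevice:
    """Hypothetical block device whose writes can be made to fail."""

    def __init__(self):
        self.blocks = {}
        self.failing = False

    def write(self, blkno, data):
        if self.failing:
            raise OSError(errno.EIO, "simulated write failure")
        self.blocks[blkno] = data

    def read(self, blkno):
        return self.blocks.get(blkno, b"")


class ToyMount:
    """Toy model of a mounted filesystem with a write-back cache."""

    def __init__(self, dev):
        self.dev = dev
        self.cache = {}      # blkno -> data
        self.dirty = set()   # block numbers not yet written to the device
        self.failed = False  # set on the first write error

    def _check(self):
        # Assumption: once failed, *all* I/O is refused, even cached reads.
        if self.failed:
            raise OSError(errno.EIO, "filesystem marked failed; unmount it")

    def read(self, blkno):
        self._check()
        if blkno not in self.cache:
            self.cache[blkno] = self.dev.read(blkno)
        return self.cache[blkno]

    def write(self, blkno, data):
        self._check()
        self.cache[blkno] = data  # delayed write: only the cache is updated
        self.dirty.add(blkno)

    def sync(self):
        self._check()
        for blkno in sorted(self.dirty):
            try:
                self.dev.write(blkno, self.cache[blkno])
            except OSError:
                self.failed = True  # fail every request from here on
                raise
            self.dirty.discard(blkno)

    def unmount(self):
        """Discard leftover dirty buffers; return how many were dropped."""
        dropped = len(self.dirty)
        self.dirty.clear()
        self.cache.clear()
        return dropped
```

In this model a read of a block that is still in the cache also fails
once the mount is marked failed. That is a deliberate simplification of
"all I/O requests fail"; the alternative mentioned above, where cached
data keeps being served while free buffers last, would skip the check
in read().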