From: Bruce Evans
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Date: Wed, 28 Mar 2007 11:38:44 +1000 (EST)
Subject: Re: gvirstor & UFS
Message-ID: <20070328100536.S6916@besplex.bde.org>
List-Id: Filesystems

On Wed, 28 Mar 2007, Ivan Voras wrote:

> I'm having trouble recovering from "ENOSPC" situation in gvirstor - when
> there's enough space on the virtual device, but not enough physical
> space. No matter what I return to the upper layers (UFS), including EIO,
> it seems to keep on retrying, spitting enormous amounts of messages to
> the kernel log in g_vfs_done (and during this the console is stuck).
> This is the same problems reported by testers some time ago.
>
> Maybe the solution is as simple as sticking a check for ENOSPC somewhere
> in UFS code to make it a special case, but I doubt it's that simple. The
> required behaviour is to, if this condition is reached, drop the current
> IO request (returning appropriate errno to the application) and stop
> retrying (or at least ignore further error messages) until the condition
> is cleared.

The following old patch may help. vfs retries too hard after write
errors. Retrying after EIO is bad enough (since most parts of the kernel
still expect the old treatment of not retrying), but retrying after a
non-recoverable error is just a bug.

%%%
Index: vfs_bio.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.436
diff -u -2 -r1.436 vfs_bio.c
--- vfs_bio.c	17 Jun 2004 17:16:49 -0000	1.436
+++ vfs_bio.c	17 Apr 2005 05:00:21 -0000
@@ -1222,19 +1312,16 @@
 	s = splbio();
 
-	if (bp->b_iocmd == BIO_WRITE &&
-	    (bp->b_ioflags & BIO_ERROR) &&
-	    !(bp->b_flags & B_INVAL)) {
+	if (bp->b_iocmd == BIO_WRITE && (bp->b_ioflags & BIO_ERROR) &&
+	    (bp->b_flags & B_INVAL) == 0 &&
+	    (bp->b_error == EIO || bp->b_error == 0)) {
 		/*
 		 * Failed write, redirty. Must clear BIO_ERROR to prevent
-		 * pages from being scrapped. If B_INVAL is set then
-		 * this case is not run and the next case is run to
-		 * destroy the buffer. B_INVAL can occur if the buffer
-		 * is outside the range supported by the underlying device.
+		 * pages from being scrapped.
 		 */
 		bp->b_ioflags &= ~BIO_ERROR;
 		bdirty(bp);
 	} else if ((bp->b_flags & (B_NOCACHE | B_INVAL)) ||
-	    (bp->b_ioflags & BIO_ERROR) ||
-	    bp->b_iocmd == BIO_DELETE || (bp->b_bufsize <= 0)) {
+	    bp->b_ioflags & BIO_ERROR ||
+	    bp->b_iocmd == BIO_DELETE || bp->b_bufsize <= 0) {
 		/*
 		 * Either a failed I/O or we were asked to free or not
%%%

The main point here is to only redirty if the error is EIO. Other changes:
- also redirty if the error is 0 (but BIO_ERROR is set).
  This case shouldn't happen, but if it does then there is no way to
  determine the error type, so play safe and retry.
- fix some style bugs.
- remove the comments about B_INVAL handling. This is probably wrong.
  However, at least with the check of b_error, the special handling of
  B_INVAL is probably unnecessary -- the code that set B_INVAL can just
  set b_error to something like EINVAL to avoid the redirtying, and this
  might already happen. Lower layers could also avoid the redirtying by
  setting B_INVAL, but I think they mostly aren't at a level that can
  know when to do this. I think B_INVAL is set mainly by nfs, and nfs
  can know when to do this better than most places because all its
  layers are combined.

This patch used to help in at least one case, where an error is returned
for writes beyond EOF in the case of non-virtual disks. Old versions of
FreeBSD return the bogus errno EINVAL for reads and writes strictly beyond
the end of the disk, so the above helps. The current version of FreeBSD
returns the even more bogus errno EIO for reads and writes strictly beyond
the end of the disk, so the above won't help (for non-virtual disks). Old
and current versions of FreeBSD return the bogus non-error of 0 for both
reads and writes exactly at the end of the disk. write(2) cannot return 0,
but does due to this error. dd treats this non-error as EOF and doesn't
retry. The correct behaviour is to return 0 (EOF/no error) for reads at
or beyond the end of the disk, and ENOSPC for writes at or beyond the end
of the disk, the same as happens for regular files (non-extendible ones
for the case of writes).

For virtual disks, there is some chance of getting the correct error
(ENOSPC) for writes beyond the end and for this error to be passed up to
the vfs layer, so the above would help.
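
To make the redirty rule in the patch concrete, here is a small userland
sketch of just the decision, not kernel code. Everything prefixed sk_ or
SK_ is a hypothetical stand-in invented for illustration (for struct buf
and for BIO_WRITE, BIO_ERROR and B_INVAL); only the shape of the condition
is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's BIO_WRITE, BIO_ERROR, B_INVAL. */
enum { SK_BIO_READ = 1, SK_BIO_WRITE = 2 };
enum { SK_BIO_ERROR = 0x01 };
enum { SK_B_INVAL = 0x01 };

/* Hypothetical cut-down buffer: only the fields the condition looks at. */
struct sk_buf {
	int iocmd;	/* stands in for b_iocmd */
	int ioflags;	/* stands in for b_ioflags */
	int flags;	/* stands in for b_flags */
	int error;	/* stands in for b_error */
};

/*
 * Mirrors the patched condition: redirty (i.e. retry later) only a failed
 * write that is not invalidated and whose error is EIO or unknown (0).
 */
bool
sk_should_redirty(const struct sk_buf *bp)
{
	return (bp->iocmd == SK_BIO_WRITE &&
	    (bp->ioflags & SK_BIO_ERROR) != 0 &&
	    (bp->flags & SK_B_INVAL) == 0 &&
	    (bp->error == EIO || bp->error == 0));
}

/* Usage example: walk through the cases discussed in this mail. */
void
sk_redirty_examples(void)
{
	struct sk_buf bp = { SK_BIO_WRITE, SK_BIO_ERROR, 0, EIO };

	assert(sk_should_redirty(&bp));		/* EIO: retried */
	bp.error = 0;
	assert(sk_should_redirty(&bp));		/* unknown error: play safe */
	bp.error = ENOSPC;
	assert(!sk_should_redirty(&bp));	/* out of space: not retried */
	bp.error = EFBIG;
	assert(!sk_should_redirty(&bp));	/* garbage block number case */
	bp.error = EIO;
	bp.flags = SK_B_INVAL;
	assert(!sk_should_redirty(&bp));	/* invalidated buffer */
	bp.flags = 0;
	bp.ioflags = 0;
	assert(!sk_should_redirty(&bp));	/* no error flagged at all */
}
```

With this rule a lower layer that reports ENOSPC (or EINVAL, or EFBIG)
gets its buffer discarded instead of requeued, while one that reports EIO
still sees the retries.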

I once debugged error handling near this for vnode-backed md disks:
o a bug in ffs (operating on a file system on the md disk) resulted in
  garbage block numbers (for the backing vnode)
o bounds checking in geom was apparently broken (probably due to an
  overflow bug in geom or md), so the I/O request got as far as ffs
o ffs (operating on the backing vnode) detected the garbage and returned
  EFBIG
o without the above patch, writing of blocks with garbage block numbers
  was retried endlessly.

The ffs bug is fixed now, and if the bounds checking in geom is fixed too
then the I/O request won't get as far as ffs -- g_io_check() will return
the bogus errno of EIO, and the above patch won't help.

Bruce