From: Maxim Sobolev
Organization: Vega International Capital
Date: Fri, 18 Oct 2002 20:45:25 +0300
To: hackers@FreeBSD.org, dillon@FreeBSD.org
Subject: Patch to allow a driver to report unrecoverable write errors to the buf layer
Message-ID: <3DB048B5.21097613@FreeBSD.org>

Hi folks,

I noticed that the FreeBSD buf/bio subsystem has one very annoying problem: once a write request has been injected into it and the write operation fails, there is seemingly no valid way to tell the layer to drop the buffer.
Instead, it retries the write over and over again until reboot, even though the originator of the request (usually the vfs layer) has already been notified of the failure and has propagated the error condition to the user-level program.

There is a very easy way to trigger the problem: insert a blank floppy into your drive, format it with newfs_msdos, mount it, remove the disk from the drive without unmounting, and do `touch /floppy/somefile'. You'll see that touch(1) fails with an Input/output error and the kernel reports a write failure on the console. However, after a couple of seconds you'll notice that the kernel tries to write exactly the same buffer again, and then again, ad infinitum. The same thing happens if you mount a write-protected floppy in read/write mode. Moreover, such a stale buffer prevents the fs from being unmounted (even forcibly), because before unmounting the kernel wants to ensure that all dirty buffers are flushed, so umount(8) blocks forever in the synchronization routine.

OK, you could say "well, don't do that!", and in this particular case I'd probably agree, but there are at least a few other situations in which such functionality would be very helpful. Consider a machine that has several disk drives mounted when suddenly one of the drives fails: it would be nice if the OS could at least try to survive. Another example is a RAID array that has been degraded into read-only mode by the failure of some stripes, so that any write operation would cause the above-mentioned buf stall. Also, in the era of plug-and-play hardware (USB, FireWire etc.), it is no longer safe to assume that a disk drive will stay connected until the OS lets it go.

The attached patch addresses the problem (for fd(4) only right now, but it should be trivial to extend to other drivers) by allowing any device driver to inform the buf layer that an unrecoverable error condition occurred during a write operation, so that a retry is pointless.
I would like to hear any comments or suggestions about my approach. Also, it would be very nice to devise some way to propagate such an error condition into the vfs layer, so that the fs driver could act upon it somehow (e.g. degrade the fs into read-only mode).

Thanks!

-Maxim

Attachment: buf.noretry.diff

Index: sys/bio.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/bio.h,v
retrieving revision 1.122
diff -d -u -r1.122 bio.h
--- sys/bio.h	9 Oct 2002 07:11:03 -0000	1.122
+++ sys/bio.h	18 Oct 2002 16:53:02 -0000
@@ -100,6 +100,15 @@
 /* bio_flags */
 #define BIO_ERROR	0x00000001
 #define BIO_DONE	0x00000004
+#define BIO_NORETRY	0x00000008	/* Don't attempt to retry failed */
+					/* operation. Should be set when */
+					/* the underlying driver detected */
+					/* some unrecoverable condition */
+					/* e.g. fatal hardware failure, */
+					/* forcefully ejected removable */
+					/* media, media that has been made */
+					/* write-protected, replaced with */
+					/* another media etc. */
 #define BIO_FLAG2	0x40000000	/* Available for local hacks */
 #define BIO_FLAG1	0x80000000	/* Available for local hacks */
 
Index: kern/vfs_bio.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.338
diff -d -u -r1.338 vfs_bio.c
--- kern/vfs_bio.c	28 Sep 2002 17:46:30 -0000	1.338
+++ kern/vfs_bio.c	18 Oct 2002 16:53:05 -0000
@@ -2915,6 +2915,8 @@
 		return (EINTR);
 	}
 	if (bp->b_ioflags & BIO_ERROR) {
+		if (bp->b_ioflags & BIO_NORETRY)
+			bp->b_flags |= B_INVAL;
 		return (bp->b_error ? bp->b_error : EIO);
 	} else {
 		return (0);
Index: isa/fd.c
===================================================================
RCS file: /home/ncvs/src/sys/isa/fd.c,v
retrieving revision 1.241
diff -d -u -r1.241 fd.c
--- isa/fd.c	2 Oct 2002 20:29:54 -0000	1.241
+++ isa/fd.c	18 Oct 2002 16:53:13 -0000
@@ -2530,6 +2530,8 @@
 		}
 		if ((fd->options & FDOPT_NOERROR) == 0) {
 			bp->bio_flags |= BIO_ERROR;
+			if (bp->bio_cmd == BIO_WRITE)
+				bp->bio_flags |= BIO_NORETRY;
 			bp->bio_error = EIO;
 			bp->bio_resid = bp->bio_bcount - fdc->fd->skip;
 		} else
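
For readers who want to follow the flag handling without a kernel tree at hand, the interaction between the fd.c and vfs_bio.c hunks can be modeled in userland roughly as below. This is only a sketch under stated assumptions, not kernel code: the BIO_* values are taken from the patch, while struct model_buf, the MODEL_* commands, the B_INVAL value, and all model_* functions are hypothetical stand-ins. In particular, model_should_retry() is a simplified guess at the "failed write gets redirtied unless the buffer is invalid" decision and is not copied from the kernel.

```c
#include <errno.h>

/* Flag values as they appear in the patch. */
#define BIO_ERROR	0x00000001
#define BIO_DONE	0x00000004
#define BIO_NORETRY	0x00000008

/* Illustrative value only; this model does not depend on the real one. */
#define B_INVAL		0x00002000

/* Hypothetical stand-ins for the kernel's command codes and struct buf. */
enum model_cmd { MODEL_READ, MODEL_WRITE };

struct model_buf {
	enum model_cmd	cmd;
	unsigned	ioflags;	/* plays the role of b_ioflags / bio_flags */
	unsigned	flags;		/* plays the role of b_flags */
	int		error;		/* plays the role of b_error / bio_error */
};

/*
 * Driver side, modeled on the fd.c hunk.  With patched == 0 the driver
 * behaves as before the patch and only reports the error.
 */
void
model_driver_fail(struct model_buf *bp, int patched)
{
	bp->ioflags |= BIO_ERROR;
	if (patched && bp->cmd == MODEL_WRITE)
		bp->ioflags |= BIO_NORETRY;
	bp->error = EIO;
}

/* Buffer-layer side, modeled on the vfs_bio.c hunk. */
int
model_bufwait(struct model_buf *bp)
{
	if (bp->ioflags & BIO_ERROR) {
		if (bp->ioflags & BIO_NORETRY)
			bp->flags |= B_INVAL;
		return (bp->error ? bp->error : EIO);
	}
	return (0);
}

/*
 * Assumed retry policy: a failed write is queued again unless the
 * buffer has been invalidated.
 */
int
model_should_retry(const struct model_buf *bp)
{
	return (bp->cmd == MODEL_WRITE &&
	    (bp->ioflags & BIO_ERROR) != 0 &&
	    (bp->flags & B_INVAL) == 0);
}
```

In this model the caller gets EIO from model_bufwait() in both cases; the only difference is that a write failed by the patched driver ends up with B_INVAL set, so model_should_retry() reports that it would be dropped instead of being queued again.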