Date: Fri, 18 Oct 2002 23:44:14 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Maxim Sobolev <sobomax@FreeBSD.ORG>, hackers@FreeBSD.ORG
Subject: Re: Patch to allow a driver to report unrecoverable write errors to the buf layer
Message-ID: <3DB0FF3E.E4096707@mindspring.com>
References: <3DB048B5.21097613@FreeBSD.org>
    <200210181807.g9II7cBY024485@apollo.backplane.com>
    <3DB0516F.9BE00F57@FreeBSD.org>
    <200210181835.g9IIZsBX061970@apollo.backplane.com>
    <20021019051202.GB14922@vega.vega.com>
    <200210190613.g9J6Debh023134@apollo.backplane.com>

Matthew Dillon wrote:
> :Hmm, the current approach doesn't look all that "right" to me, because we are
> :retrying operation even though the upper-layer code that initiated it was
> :already notified about the failure (e.g. received EIO), so that it should not
> :assume that the data was actually written successfully. Or I am missing
> :something?
>
>     Yah, most writes issued through the buffer cache are asynchronous or
>     delayed.  So the VFS layer that initiated the write is not necessarily
>     going to be notified of a failure.  Thus the failure notification does
>     not help us here.

First off, the failure notification needs to cascade all the
way up:

o	retry by disk electronics
o	retry by controller
o	retry by driver
o	retry by FS
o	retry by application

At any one of those layers, you could insert a "media perfection
layer"; for example, using "GEOM", you could insert BAD144 support
between the FS and the driver.

The argument about the failure not going to the request that
caused the failure is bogus; if we are actually talking about a
request to a file where the semantics of the write are such that
the request returns successfully before the write is guaranteed
to be successful, OK: then the failure is there for the next
operation.  And that's fine, and reasonable.

If people care about their data, they will use synchronous I/O,
or they will use an asynchronous I/O interface with explicit
completion notification (e.g. "aiowrite").

If they don't care about their data, then signalling a failure
preemptively on the next attempt -- e.g. by closing a descriptor
out from under them, or marking it read-only following a write
failure -- is the right thing to do.

In reality, people *do* care, even when they say they don't
care... or they would be opening /dev/null instead of a file,
to receive their writes.

All we are really arguing about here is delayed notification
because of intentional acceptance of bogus semantics surrounding
the commit to stable storage.
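
To make the "people who care about their data" case concrete, here is
a minimal userland sketch, assuming a POSIX system.  The function name
write_durably() is hypothetical and is not part of the patch under
discussion.  It shows one common way for an application to be the last
layer in the chain above: it treats a successful write() only as
"accepted by the buffer cache" and checks fsync() and close(), which is
where a delayed write error may be reported.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and do not report
 * success until fsync() and close() have both returned without error.
 * Returns 0 on success, or an errno value on failure.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, e;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (errno);

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			e = errno;
			close(fd);
			return (e);
		}
		p += n;
		len -= (size_t)n;
	}

	/*
	 * A successful write() may only mean the data was buffered.
	 * An error from the delayed write can show up here instead.
	 */
	if (fsync(fd) == -1) {
		e = errno;
		close(fd);
		return (e);
	}

	/* close() can also report a deferred error, so check it too. */
	if (close(fd) == -1)
		return (errno);

	return (0);
}
```

A caller that gets a nonzero return is the "retry by application"
layer: it can retry, write elsewhere, or tell the user.  Opening the
file with O_SYNC is an alternative that moves the check back onto each
write() call, at the cost of making every write wait for the device.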

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
