From: "Poul-Henning Kamp"
To: Scott Long
Cc: Andriy Gapon, FreeBSD FS, Warner Losh, freebsd-geom@freebsd.org
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Date: Sat, 25 Nov 2017 11:37:10 +0000
Message-ID: <30379.1511609830@critter.freebsd.dk>

--------
In message , Scott Long writes:
> Why is overloading EIO so bad? brelse() will call bdirty() when a BIO_WRITE
> command has failed with EIO. Calling bdirty() has the effect of retrying the I/O.
> This disregards the fact that disk drivers only return EIO when they've decided
> that the I/O cannot be retried. It has no termination condition for the retries, and
> will endlessly retry I/O in vain; I've seen this quite frequently.

The really annoying thing about this particular class of errors is that
if we propagated them up to the filesystems, very often things could be
relocated to different blocks and we would avoid the unnecessary
filesystem corruption.

The real fundamental deficiency is that we do not have a way to say
"give up if this bio cannot be completed in X time", which is what
people actually want.

That is surprisingly hard to provide. There are far too many corner
cases for me to enumerate them all, but let me give just one example:

Imagine you issue a deadlined write to a RAID5 volume. Three component
writes happen smoothly, but the last two miss the deadline, with no way
to predict how long it will take before they complete or fail.

* Does the bio write transaction fail?
* Does the bio write transaction time out?
* Do you attempt to complete the write to the RAID5?
* Where do you store a copy of the data if you do?
* What happens the next time a read is issued on this bio's extent?

Then for an encore, imagine it was a read bio: three DMAs go smoothly,
two are outstanding, and you don't know if or when they will complete
or fail.

* If you fail or time out the bio, how do you "taint" the space being
  read into while the two remaining DMAs are still outstanding?
* What if that space is mapped into userland?
* What if that space is being executed?
* What if one of the two outstanding DMAs later returns garbage?
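A toy C model of the RAID5 write case may help make the problem concrete. It is a sketch under stated assumptions, not GEOM code: struct component_io, classify_at_deadline() and the outcome names are hypothetical, and the completion times are simulated values rather than anything a driver actually reports. It shows that once the deadline passes with component writes still outstanding, the parent write can honestly be reported as neither complete nor failed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of one component I/O of a striped (RAID5-like)
 * write. finish_ms is a simulated completion time; error is 0 for
 * success or an errno-style value for failure.
 */
struct component_io {
	int finish_ms;
	int error;
};

enum deadline_outcome {
	OUTCOME_COMPLETE,	/* all components finished in time, no errors */
	OUTCOME_FAILED,		/* all components finished, at least one failed */
	OUTCOME_INDETERMINATE	/* some components still outstanding at deadline */
};

/*
 * Classify the parent write at deadline_ms. If pending is non-NULL,
 * store the number of component I/Os still in flight.
 */
enum deadline_outcome
classify_at_deadline(const struct component_io *io, size_t n,
    int deadline_ms, size_t *pending)
{
	size_t i, outstanding = 0;
	int failed = 0;

	assert(io != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (io[i].finish_ms > deadline_ms)
			outstanding++;	/* may still complete or fail later */
		else if (io[i].error != 0)
			failed = 1;
	}
	if (pending != NULL)
		*pending = outstanding;
	if (outstanding > 0)
		return (OUTCOME_INDETERMINATE);
	return (failed ? OUTCOME_FAILED : OUTCOME_COMPLETE);
}
```

For example, five components finishing at 10, 12, 15, 250 and 400 ms with a 100 ms deadline yield OUTCOME_INDETERMINATE with two writes pending, while the same components with a 500 ms deadline yield OUTCOME_COMPLETE. The indeterminate state is exactly where the questions above (fail, time out, finish the write, where to keep the data) have no obvious answer.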
My conclusion back when I did GEOM was that the only sane way to do
something like this is to have a special GEOM do it for you, one which
always allocates a temp space:

	allocate temp buffer
	if (write)
		copy write data to temp buffer
	issue bio downwards on temp buffer
	if timeout
		park temp buffer until biodone
		return (timeout)
	if (read)
		copy temp buffer to read space
	return (ok/error)

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
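The bounce-buffer pseudocode above can be turned into a small userland C sketch. This is an illustration under assumptions, not the GEOM or struct bio API: struct req, bounce_io(), sim_lower(), LOWER_TIMEDOUT, bounce_reap_parked() and the parked-buffer array are all hypothetical. The lower layer is a synchronous RAM simulation that simply reports a timeout instead of enforcing a real deadline, and a real implementation would free a parked buffer from its completion (biodone) path rather than from a reaper function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the real struct bio or GEOM API. */
enum req_cmd { REQ_READ, REQ_WRITE };

struct req {
	enum req_cmd cmd;
	char *data;	/* caller's buffer, possibly mapped into userland */
	size_t len;
};

/* Lower-layer result meaning "deadline passed, I/O still in flight". */
#define LOWER_TIMEDOUT (-1)
typedef int (*lower_io_fn)(enum req_cmd cmd, char *buf, size_t len);

/* Temp buffers that must stay allocated until their late I/O ends. */
#define MAX_PARKED 16
char *parked[MAX_PARKED];
size_t nparked;

/* Simulated device for the sketch: a small RAM "disk". */
char disk[64];
int disk_times_out;	/* when set, pretend the deadline expired */

int
sim_lower(enum req_cmd cmd, char *buf, size_t len)
{
	if (len > sizeof(disk))
		return (EIO);
	if (disk_times_out)
		return (LOWER_TIMEDOUT);	/* buf is still owned by the "DMA" */
	if (cmd == REQ_WRITE)
		memcpy(disk, buf, len);
	else
		memcpy(buf, disk, len);
	return (0);
}

/* The bounce logic: the caller's buffer never goes below this layer. */
int
bounce_io(struct req *r, lower_io_fn lower)
{
	char *tmp;
	int error;

	tmp = malloc(r->len);			/* allocate temp buffer */
	if (tmp == NULL)
		return (ENOMEM);
	if (r->cmd == REQ_WRITE)
		memcpy(tmp, r->data, r->len);	/* copy write data to temp */
	error = lower(r->cmd, tmp, r->len);	/* issue I/O on temp buffer */
	if (error == LOWER_TIMEDOUT) {
		/* Park tmp: a late DMA may still write into it. */
		assert(nparked < MAX_PARKED);
		parked[nparked++] = tmp;
		return (ETIMEDOUT);
	}
	if (r->cmd == REQ_READ && error == 0)
		memcpy(r->data, tmp, r->len);	/* copy temp to read space */
	free(tmp);
	return (error);
}

/* Stands in for the late completion (biodone) of every parked buffer. */
void
bounce_reap_parked(void)
{
	while (nparked > 0)
		free(parked[--nparked]);
}
```

The design point is that the caller's buffer is never handed to the lower layer, so a DMA that lands after a timeout can only touch the parked temp buffer, never memory that may by then be mapped into userland or executed. The sketch copies into the read space only on success, which is one reasonable reading of "copy temp buffer to read space".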