Date:      Fri, 24 Nov 2017 07:57:55 -0700
From:      Scott Long <scottl@samsco.org>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        Warner Losh <imp@bsdimp.com>, FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-geom@freebsd.org
Subject:   Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Message-ID:  <39E8D9C4-6BF3-4844-85AD-3568A6D16E64@samsco.org>
In-Reply-To: <64f37301-a3d8-5ac4-a25f-4f6e4254ffe9@FreeBSD.org>
References:  <391f2cc7-0036-06ec-b6c9-e56681114eeb@FreeBSD.org> <CANCZdfoE5UWMC6v4bbov6zizvcEMCbrSdGeJ019axCUfS_T_6w@mail.gmail.com> <64f37301-a3d8-5ac4-a25f-4f6e4254ffe9@FreeBSD.org>


> On Nov 24, 2017, at 6:34 AM, Andriy Gapon <avg@FreeBSD.org> wrote:
>
> On 24/11/2017 15:08, Warner Losh wrote:
>>
>> On Fri, Nov 24, 2017 at 3:30 AM, Andriy Gapon <avg@freebsd.org> wrote:
>>
>>    https://reviews.freebsd.org/D13224
>>
>>    Anyone interested is welcome to join the review.
>>
>> I think it's a really bad idea. It introduces a 'one-size-fits-all' notion of
>> QoS that seems misguided. It conflates a shorter timeout with don't retry. And
>> why is retrying bad? It seems more a notion of 'fail fast' or some other
>> concept. There are so many other ways you'd want to use it. And it uses the
>> same return code (EIO) to mean something new. It has generally meant 'The
>> lower layers have retried this, and it failed, do not submit it again as it
>> will not succeed'; now it would also mean 'I gave it a half-assed attempt,
>> and that failed, but resubmission might work'.
>> This breaks a number of assumptions in the BUF/BIO layer as well as parts of
>> CAM even more than they are broken now.
>>
>> So let's step back a bit: what problem is it trying to solve?
>
> A simple example.  I have a mirror, I issue a read to one of its members.
> Let's assume there is some trouble with that particular block on that
> particular disk.  The disk may spend a lot of time trying to read it and
> would still fail.  With the current defaults I would wait 5x that time to
> finally get the error back.
> Then I go to another mirror member and get my data from there.

There are many RAID stacks that already solve this problem by having a policy
of always reading all disk members for every transaction, and throwing away
the sub-transactions that arrive late.  It’s not a policy that is always
desired, but it serves a useful purpose for low-latency needs.
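As a rough illustration of that policy (a model only, not code from any existing RAID stack), the Python sketch below issues a read to every member at once, keeps the first successful reply, and drops the rest.  The `Member` type, its fields, and the millisecond figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical mirror member: how long one read takes and whether it succeeds."""
    name: str
    latency_ms: float
    ok: bool


def read_all_members(members: List[Member]) -> Optional[Member]:
    """Model of 'read every member, keep the first good answer'.

    All reads are issued at the same time, so the caller is unblocked by the
    fastest member that succeeds; slower or failed sub-transactions are dropped.
    Returns None when no member can supply the data.
    """
    good = [m for m in members if m.ok]
    if not good:
        return None
    return min(good, key=lambda m: m.latency_ms)


mirror = [
    Member("disk0", latency_ms=3000.0, ok=False),  # struggling with a bad block
    Member("disk1", latency_ms=8.0, ok=True),
]
winner = read_all_members(mirror)
print(winner.name, winner.latency_ms)
```

In this model the caller never waits on the struggling disk, but every read costs bandwidth on every member, which is one reason the policy is not always desired.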

> IMO, this is not optimal.  I'd rather pass BIO_NORETRY to the first read,
> get the error back sooner and try the other disk sooner.  Only if I know
> that there are no other copies to try, then I would use the normal read
> with all the retrying.
>
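To put numbers on the scenario quoted above, here is a back-of-the-envelope model in Python.  It only does the arithmetic: the millisecond values are made up, the attempt count of 5 is taken from the "5x" figure in the quoted text, and `time_to_data_ms` is a hypothetical helper, not a kernel interface.

```python
def time_to_data_ms(bad_attempt_ms: int, good_read_ms: int, attempts_on_bad: int) -> int:
    """Sequential mirror read: every attempt on the bad member fails,
    then the data is fetched from the other member."""
    return attempts_on_bad * bad_attempt_ms + good_read_ms


BAD_ATTEMPT_MS = 3000   # hypothetical: one failed attempt on the bad block
GOOD_READ_MS = 8        # hypothetical: a normal read on the healthy member
DEFAULT_ATTEMPTS = 5    # assumption based on the "5x" in the quoted example

with_retries = time_to_data_ms(BAD_ATTEMPT_MS, GOOD_READ_MS, DEFAULT_ATTEMPTS)
no_retry = time_to_data_ms(BAD_ATTEMPT_MS, GOOD_READ_MS, 1)
print(with_retries, no_retry)  # prints "15008 3008"
```

The model shows only the latency side of the proposal; it says nothing about what the error returned after a single attempt should mean to the caller.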

I agree with Warner that what you are proposing is not correct.  It weakens
the contract between the disk layer and the upper layers, making it less
clear who is responsible for retries and less clear what “EIO” means.  That
contract is already weak due to poor design decisions in VFS-BIO and GEOM,
and Warner and I are working on a plan to fix that.

Scott



