Date: Fri, 24 Nov 2017 19:20:51 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: Warner Losh <imp@bsdimp.com>
Cc: FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-geom@freebsd.org,
    Scott Long <scottl@samsco.org>
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Message-ID: <f18e2760-85b9-2b5e-4269-edfe5468f9db@FreeBSD.org>
In-Reply-To: <CANCZdfrBtYm_Jxcb6tXP+dtMq7dhRKmVOzvshG+yB++ARx1qOQ@mail.gmail.com>
References: <391f2cc7-0036-06ec-b6c9-e56681114eeb@FreeBSD.org>
            <CANCZdfoE5UWMC6v4bbov6zizvcEMCbrSdGeJ019axCUfS_T_6w@mail.gmail.com>
            <64f37301-a3d8-5ac4-a25f-4f6e4254ffe9@FreeBSD.org>
            <CANCZdfrBtYm_Jxcb6tXP+dtMq7dhRKmVOzvshG+yB++ARx1qOQ@mail.gmail.com>
On 24/11/2017 18:33, Warner Losh wrote:
> On Fri, Nov 24, 2017 at 6:34 AM, Andriy Gapon <avg@freebsd.org> wrote:
>> On 24/11/2017 15:08, Warner Losh wrote:
>>> On Fri, Nov 24, 2017 at 3:30 AM, Andriy Gapon <avg@freebsd.org> wrote:
>>>>
>>>> https://reviews.freebsd.org/D13224
>>>>
>>>> Anyone interested is welcome to join the review.
>>>
>>> I think it's a really bad idea.  It introduces a 'one-size-fits-all'
>>> notion of QoS that seems misguided.  It conflates a shorter timeout with
>>> "don't retry".  And why is retrying bad?  It seems more a notion of
>>> 'fail fast' or some other concept; there are so many other ways you'd
>>> want to use it.  And it uses the same return code (EIO) to mean
>>> something new: EIO has generally meant 'the lower layers have retried
>>> this and it failed, do not submit it again as it will not succeed', not
>>> 'I gave it a half-assed attempt, and that failed, but resubmission
>>> might work'.  This breaks a number of assumptions in the BUF/BIO layer,
>>> as well as parts of CAM, even more than they are broken now.
>>>
>>> So let's step back a bit: what problem is it trying to solve?
>>
>> A simple example.  I have a mirror and I issue a read to one of its
>> members.  Let's assume there is some trouble with that particular block
>> on that particular disk.  The disk may spend a lot of time trying to
>> read it and would still fail.  With the current defaults I would wait 5x
>> that time to finally get the error back.  Then I go to another mirror
>> member and get my data from there.
>> IMO, this is not optimal.  I'd rather pass BIO_NORETRY to the first
>> read, get the error back sooner and try the other disk sooner.  Only if
>> I know that there are no other copies to try would I use the normal read
>> with all the retrying.
>
> It sounds like you are optimizing the wrong thing and taking an overly
> simplistic view of quality of service.
> First, failing blocks on a disk is fairly rare.  Do you really want to
> optimize for that case?

If it can be done without any harm to the sunny-day scenario, then why not?
I think that 'robustness' is the word here, not 'optimization'.

> Second, you're really saying 'if you can't read it fast, fail', since we
> only control the software side of read retry.

Am I?  That's not what I wanted to say, really.  I just wanted to say: if
this I/O fails, don't retry it, leave it to me.  This is very simple,
simplistic as you say, but I like simple.
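To make "leave it to me" a bit more concrete, the consumer-side logic I have
in mind is roughly the following.  This is only a simplified sketch, not the
actual vdev_geom change in D13224: the mirror structure and the child_read()
helper are invented for illustration, and BIO_NORETRY here stands for the
proposed flag.

struct mirror_child;			/* one disk of the mirror */

struct mirror {
	struct mirror_child	**children;
	int			  nchildren;
};

/*
 * Hypothetical helper: read len bytes at offset from one mirror member,
 * passing flags (e.g. the proposed BIO_NORETRY) down to the bio it issues.
 */
int child_read(struct mirror_child *child, off_t offset, void *buf,
    size_t len, int flags);

static int
mirror_read(struct mirror *mp, off_t offset, void *buf, size_t len)
{
	int error = EIO;
	int i;

	/*
	 * First pass: fail fast.  If a member cannot read the block
	 * quickly, take the error instead of waiting out its internal
	 * retries; the next member most likely has a good copy.
	 */
	for (i = 0; i < mp->nchildren; i++) {
		error = child_read(mp->children[i], offset, buf, len,
		    BIO_NORETRY);
		if (error == 0)
			return (0);
	}

	/*
	 * Second pass: every copy failed the quick attempt, so now the
	 * full retry/recovery machinery is worth the wait.
	 */
	for (i = 0; i < mp->nchildren; i++) {
		error = child_read(mp->children[i], offset, buf, len, 0);
		if (error == 0)
			return (0);
	}
	return (error);
}

Nothing is lost in the failure case, only the order of the attempts changes:
the expensive retries are deferred until it is known that no quick copy is
available.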
> There are new op codes being proposed that say 'read or fail within X ms',
> which is really what you want: if it's taking too long on disk A, you want
> to move to disk B.  The notion here was we'd return EAGAIN (or some other
> error) if it failed after X ms, and maybe do some emulation in software for
> drives that don't support this.  You'd tweak this number to control
> performance.  You're likely to get a much bigger performance win all the
> time by scheduling I/O to drives that have the best recent latency.

ZFS already makes some latency-based decisions.  The things that you
describe are very interesting, but they are for the future.

> Third, do you have numbers that show this is actually a win?

I do not have any numbers right now.  What kind of numbers would you like?
What kind of scenarios?

> This is a terrible thing from an architectural view.

You have said this several times, but unfortunately you haven't explained it
yet.

> Absent numbers that show it's a big win, I'm very hesitant to say OK.
>
> Fourth, there's a large number of places in the stack today that need to
> communicate that their I/O is more urgent, and we don't have any good way
> to communicate even that simple concept down the stack.

That's unfortunate, but my proposal has quite little to do with I/O
scheduling, priorities, etc.

> Finally, the only places that ZFS uses the TRYHARDER flag are for things
> like the super block, if I'm reading the code right.  It doesn't do it for
> normal I/O.

Right.  But for normal I/O there is ZIO_FLAG_IO_RETRY, which is honored in
the same way as ZIO_FLAG_TRYHARD.

> There's no code to cope with what would happen if all the copies of a
> block couldn't be read with the NORETRY flag.  One of them might contain
> the data.

ZFS is not that fragile :)  see ZIO_FLAG_IO_RETRY above.
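To illustrate, the producer-side mapping I have in mind is roughly the
following.  Again a simplified sketch from memory, not the exact D13224
diff: zio->io_flags, ZIO_FLAG_IO_RETRY, ZIO_FLAG_TRYHARD, ZIO_TYPE_READ and
bio_flags all exist today, while BIO_NORETRY and the helper name are mine.

static void
vdev_geom_set_retry_policy(zio_t *zio, struct bio *bp)
{
	/*
	 * Only the first, ordinary attempt at a read is marked "fail
	 * fast".  A zio that ZFS is already retrying (ZIO_FLAG_IO_RETRY)
	 * or wants serviced at any cost (ZIO_FLAG_TRYHARD) gets the full
	 * retry treatment in the lower layers, exactly as today.
	 */
	if (zio->io_type == ZIO_TYPE_READ &&
	    (zio->io_flags & (ZIO_FLAG_IO_RETRY | ZIO_FLAG_TRYHARD)) == 0)
		bp->bio_flags |= BIO_NORETRY;
}

So a fast failure is never the end of the story: the mirror or raidz code
goes to the other copies, and if all of them fail the zio is retried with
ZIO_FLAG_IO_RETRY set, and that retry is not marked BIO_NORETRY.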
-- 
Andriy Gapon