Date:      Tue, 12 Dec 2017 18:08:54 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Poul-Henning Kamp <phk@phk.freebsd.dk>, Scott Long <scottl@samsco.org>
Cc:        FreeBSD FS <freebsd-fs@FreeBSD.org>, Warner Losh <imp@bsdimp.com>, freebsd-geom@FreeBSD.org
Subject:   Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Message-ID:  <38078290-ce16-d3a6-2256-c9b7fec17e72@FreeBSD.org>
In-Reply-To: <30379.1511609830@critter.freebsd.dk>
References:  <391f2cc7-0036-06ec-b6c9-e56681114eeb@FreeBSD.org> <CANCZdfoE5UWMC6v4bbov6zizvcEMCbrSdGeJ019axCUfS_T_6w@mail.gmail.com> <64f37301-a3d8-5ac4-a25f-4f6e4254ffe9@FreeBSD.org> <39E8D9C4-6BF3-4844-85AD-3568A6D16E64@samsco.org> <c9a96004-9998-c96d-efd7-d7e510c3c460@FreeBSD.org> <DC23D104-F5F3-4844-8638-4644DC9DD411@samsco.org> <30379.1511609830@critter.freebsd.dk>

On 25/11/2017 13:37, Poul-Henning Kamp wrote:
> The real fundamental deficiency is that we do not have a way to say "give up
> if this bio cannot be completed in X time" which is what people actually want.

Indeed.
I think that was also what Warner tried to help me understand: it is not about
an absolute retry count, but about a time budget for a request.

> That is surprisingly hard to provide, there are far too many
> corner-cases for me to enumerate them all, but let me just give one
> example:

This is true, and it is a good example.
I think that we might first want to handle the simpler cases, such as deciding
whether to retry a request when we get a transient error.
Dealing with a request that just doesn't come back is the much harder piece, of
course.
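
To make the simpler case concrete, here is a rough sketch of a time-budget
retry decision. This is only an illustration under my own assumptions: the
struct, its fields and should_retry() are hypothetical names and do not exist
in GEOM or CAM, and the per-attempt cost estimate is assumed to be supplied by
the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request budget; not an existing GEOM/CAM structure. */
struct req_budget {
	uint64_t deadline_ms;	/* absolute time by which the request must finish */
	uint64_t attempt_ms;	/* assumed cost estimate for one more attempt */
};

/*
 * Decide whether to retry after a transient error.
 * Retry only if one more attempt is expected to fit before the deadline,
 * regardless of how many attempts have already been made.
 */
static bool
should_retry(const struct req_budget *b, uint64_t now_ms)
{
	assert(b != NULL);
	if (now_ms >= b->deadline_ms)
		return (false);		/* budget already spent */
	return (b->deadline_ms - now_ms >= b->attempt_ms);
}
```

With a 1000 ms deadline and a 300 ms attempt estimate, a transient error at
700 ms would still be retried, while one at 701 ms would be failed upwards.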

> Imagine you issue a deadlined write to a RAID5 thing.  Three component
> writes happen smoothly, but the last two fail the deadline, with
> no way to predict how long it will take before they complete
> or fail.
> 
> * Does the bio write transaction fail ?
> 
> * Does the bio write transaction time out ?
> 
> * Do you attempt to complete the write to the RAID5 ?
> 
> * Where do you store a copy of the data if you do ?
> 
> * What happens next time a read happens on this bio's extent ?
> 
> Then for an encore, imagine it was a read bio: Three DMAs go smoothly,
> two are outstanding and you don't know if/when they will complete/fail.
> 
> * If you fail or time out the bio, how do you "taint" the space
>   being read into while the two remaining DMAs are outstanding?
> 
> * What if that space is mapped into userland ?
> 
> * What if that space is being executed ?
> 
> * What if one of the two outstanding DMAs later returns garbage ?
> 
> My conclusion back when I did GEOM, was that the only way to
> do something like this sanely, is to have a special GEOM do it
> for you, which always allocates a temp-space:
> 
> 	allocate temp buffer
> 	if (write)
> 		copy write data to temp buffer
> 	issue bio downwards on temp buffer
> 	if timeout
> 		park temp buffer until biodone
> 		return(timeout)
> 	if (read)
> 		copy temp buffer to read space
> 	return (ok/error)
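
For illustration, the scheme above could look roughly like this in user-space
C. This is a sketch under my own assumptions, not GEOM code: struct bounce_req
and the bounce_*() functions are hypothetical, the deadline and the late
completion are modelled as explicit calls rather than real callouts, and the
lower layer is assumed to touch only the temp buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum xfer_status { XFER_OK, XFER_ERROR, XFER_TIMEOUT };

/* Hypothetical in-flight request that owns a private temp buffer. */
struct bounce_req {
	char	*tmp;		/* the only memory the lower layer may touch */
	size_t	 len;
	char	*user;		/* caller's read buffer, NULL for a write */
	bool	 abandoned;	/* set once the deadline has passed */
};

/* Allocate the temp buffer; for a write, copy the data into it. */
static struct bounce_req *
bounce_start(char *read_dst, const char *write_src, size_t len)
{
	struct bounce_req *r;

	/* Exactly one of read_dst/write_src must be given. */
	assert(len > 0 && (read_dst == NULL) != (write_src == NULL));
	r = malloc(sizeof(*r));
	if (r == NULL)
		return (NULL);
	r->tmp = calloc(1, len);
	if (r->tmp == NULL) {
		free(r);
		return (NULL);
	}
	r->len = len;
	r->user = read_dst;
	r->abandoned = false;
	if (write_src != NULL)
		memcpy(r->tmp, write_src, len);
	return (r);
}

/* Deadline passed: report timeout upwards, keep the temp buffer parked. */
static enum xfer_status
bounce_timeout(struct bounce_req *r)
{
	r->abandoned = true;
	return (XFER_TIMEOUT);
}

/* Lower layer finally completed ("biodone"); this releases the request. */
static enum xfer_status
bounce_done(struct bounce_req *r, bool io_error)
{
	enum xfer_status st;

	if (r->abandoned)
		st = XFER_TIMEOUT;	/* caller's buffer is never touched */
	else if (io_error)
		st = XFER_ERROR;
	else {
		if (r->user != NULL)
			memcpy(r->user, r->tmp, r->len);
		st = XFER_OK;
	}
	free(r->tmp);
	free(r);
	return (st);
}
```

The point of the extra copy is that a late or garbage DMA can only land in
r->tmp, so the caller's memory is safe to reuse as soon as the timeout is
reported. The cost is one allocation and one copy per request.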


-- 
Andriy Gapon


