From: Andriy Gapon
Date: Tue, 12 Dec 2017 18:08:54 +0200
To: Poul-Henning Kamp, Scott Long
Cc: FreeBSD FS, Warner Losh, freebsd-geom@FreeBSD.org
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Message-ID: <38078290-ce16-d3a6-2256-c9b7fec17e72@FreeBSD.org>
In-Reply-To: <30379.1511609830@critter.freebsd.dk>
References: <391f2cc7-0036-06ec-b6c9-e56681114eeb@FreeBSD.org> <64f37301-a3d8-5ac4-a25f-4f6e4254ffe9@FreeBSD.org> <39E8D9C4-6BF3-4844-85AD-3568A6D16E64@samsco.org> <30379.1511609830@critter.freebsd.dk>

On 25/11/2017 13:37, Poul-Henning Kamp wrote:
> The real fundamental deficiency is that we do not have a way to say "give up
> if this bio cannot be completed in X time" which is what people actually want.

Indeed. And I think that was also what Warner tried to help me understand:
it is not about an absolute retry count, but about a time budget for a request.

> That is surprisingly hard to provide, there are far too many
> corner-cases for me to enumerate them all, but let me just give one
> example:

This is true, and it is a good example. I think we might want to first try
to handle the simpler cases, such as deciding whether to retry a request when
we get a transient error. Dealing with a request that just doesn't come back
is the much harder piece, of course.
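
To make the simpler case concrete, here is a rough userland sketch, not
kernel code, of retrying a transiently failing request only while a time
budget remains. All names in it (struct request, struct fake_dev,
submit_once, submit_with_budget) are hypothetical and exist only for this
illustration. The clock is simulated, and EIO is simply assumed to be a
transient error; telling transient from permanent errors is not addressed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: an absolute deadline plus a BIO_NORETRY-like flag. */
struct request {
    uint64_t deadline_ms;   /* give up once the clock reaches this value */
    bool     noretry;       /* caller wants the first error reported as-is */
    int      attempts;      /* attempts made so far */
};

/* Simulated device with its own clock; fails the first fail_count attempts. */
struct fake_dev {
    uint64_t now_ms;        /* simulated current time */
    uint64_t cost_ms;       /* simulated duration of one attempt */
    int      fail_count;    /* remaining attempts that will fail */
};

/* One attempt: advances the simulated clock, returns 0 or EIO. */
int submit_once(struct fake_dev *dev, struct request *req)
{
    dev->now_ms += dev->cost_ms;
    req->attempts++;
    if (dev->fail_count > 0) {
        dev->fail_count--;
        return EIO;         /* assumed transient for this sketch */
    }
    return 0;
}

/* Retry on error until success, the no-retry flag, or the deadline. */
int submit_with_budget(struct fake_dev *dev, struct request *req)
{
    assert(dev != NULL && req != NULL);
    for (;;) {
        int error = submit_once(dev, req);
        if (error == 0)
            return 0;
        if (req->noretry)
            return error;       /* no retries requested */
        if (dev->now_ms >= req->deadline_ms)
            return ETIMEDOUT;   /* time budget exhausted */
    }
}
```

With a 10 ms attempt cost and a device that fails twice, a request with a
100 ms deadline succeeds on the third attempt, the same request with the
no-retry flag set returns EIO after one attempt, and a request with a 25 ms
deadline against a device that keeps failing returns ETIMEDOUT after the
third attempt. Note that this only bounds the number of retries by time; it
does nothing for a request that never completes at all.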

> Imagine you issue a deadlined write to a RAID5 thing.  Three component
> writes happen smoothly, but the last two fail the deadline, with
> no way to predict how long it will take before they complete
> or fail.
>
> * Does the bio write transaction fail ?
>
> * Does the bio write transaction time out ?
>
> * Do you attempt to complete the write to the RAID5 ?
>
> * Where do you store a copy of the data if you do ?
>
> * What happens next time a read happens on this bio's extent ?
>
> Then for an encore, imagine it was a read bio: Three DMAs go smoothly,
> two are outstanding and you don't know if/when they will complete/fail.
>
> * If you fail or time out the bio, how do you "taint" the space
>   being read into while the two remaining DMAs are outstanding?
>
> * What if that space is mapped into userland ?
>
> * What if that space is being executed ?
>
> * What if one of the two outstanding DMAs later returns garbage ?
>
> My conclusion back when I did GEOM, was that the only way to
> do something like this sanely, is to have a special GEOM do it
> for you, which always allocates a temp-space:
>
>     allocate temp buffer
>     if (write)
>         copy write data to temp buffer
>     issue bio downwards on temp buffer
>     if timeout
>         park temp buffer until biodone
>         return(timeout)
>     if (read)
>         copy temp buffer to read space
>     return (ok/error)

-- 
Andriy Gapon