From owner-freebsd-geom@freebsd.org Sun Dec 10 12:16:07 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-geom@FreeBSD.org
Subject: [Bug 200214] Asks for passphrase twice when booting into encrypted ZFS single disk install
Date: Sun, 10 Dec 2017 12:16:07 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: bin
X-Bugzilla-Version: 10.1-STABLE
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: mark.zsombok@gmail.com
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: freebsd-geom@FreeBSD.org
X-Bugzilla-Changed-Fields: cc
List-Id: GEOM-specific discussions and implementations

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200214

MZsombok changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark.zsombok@gmail.com

--- Comment #2 from MZsombok ---
Greetings,

I have had this issue quite a few times, but only decided to look further into it now.

To me, the cause appears to be (and this is just an assumption) how GELI handles the keyboard layout during boot. This was after a guided installation; I faintly remember not having this issue after a completely manual installation.

I tested with the Hungarian layout (QWERTZ):

1. Boot into the install medium.
2. Select the Hungarian 102-key layout.
3. Start the installation with "Guided ZFS" and choose to encrypt the disk.
4. Provide a passphrase that contains a "y".
5. Complete the installation.
6. Reboot.
7. The boot loader asks for the passphrase; enter it assuming the Hungarian 102-key layout is in use.
8. The passphrase is accepted and the boot process starts.
9. Halfway through, GELI requests the passphrase again, but this time the keyboard layout is not the selected Hungarian 102-key one; it appears to be US English. "y" and "z" are swapped and must be taken into account when entering the passphrase again.
10. The boot process completes and the system is ready for action.

On a second installation, where every key used in the passphrase is laid out the same way in both US English and HU-102:

- The installation succeeds.
- The boot loader requests the passphrase; enter it.
- The passphrase is not asked for a second time, and boot completes.

I hope this helps.
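
The reporter's hypothesis can be illustrated with a toy model. This is an assumption-laden simplification, not GELI or loader code: it models only the fact that the Y and Z keys are physically swapped between the Hungarian 102-key (QWERTZ) and US (QWERTY) layouts, and ignores every other keymap difference. The helper names are hypothetical.

```python
# Toy model: on the Hungarian 102-key (QWERTZ) and US (QWERTY) layouts the
# Y and Z keys are physically swapped. All other keymap differences
# (punctuation, accented letters) are deliberately ignored.
HU_TO_US_SAME_KEY = str.maketrans("yzYZ", "zyZY")


def typed_under_us_layout(passphrase_on_hu: str) -> str:
    """Return what a US keymap produces when the user presses the same
    physical keys that spell `passphrase_on_hu` on the Hungarian layout."""
    return passphrase_on_hu.translate(HU_TO_US_SAME_KEY)


def prompt_mismatch(passphrase_on_hu: str) -> bool:
    """True if typing the passphrase from habit at a US-keymap prompt
    produces a different string, i.e. that prompt would reject it."""
    return typed_under_us_layout(passphrase_on_hu) != passphrase_on_hu


print(typed_under_us_layout("mytopsecret"))  # mztopsecret
print(prompt_mismatch("mytopsecret"))        # True
print(prompt_mismatch("topsecret"))          # False
```

Under this model, a passphrase containing "y" or "z" only matches at a US-keymap prompt if the user compensates for the swap, which is consistent with step 9 and with the second installation described in the comment.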

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-geom@freebsd.org Tue Dec 12 16:09:09 2017
From: Andriy Gapon
To: Poul-Henning Kamp, Scott Long
Cc: FreeBSD FS, Warner Losh, freebsd-geom@FreeBSD.org
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Date: Tue, 12 Dec 2017 18:08:54 +0200
Message-ID: <38078290-ce16-d3a6-2256-c9b7fec17e72@FreeBSD.org>
In-Reply-To: <30379.1511609830@critter.freebsd.dk>

On 25/11/2017 13:37, Poul-Henning Kamp wrote:
> The real fundamental deficiency is that we do not have a way to say "give up
> if this bio cannot be completed in X time" which is what people actually want.

Indeed. And I think that was also what Warner tried to help me understand: it is not about an absolute retry count, but about a time budget for a request.

> That is surprisingly hard to provide, there are far too many
> corner-cases for me to enumerate them all, but let me just give one
> example:

This is true, and this is a good example.
I think that we might want to try first to handle simpler cases, like deciding whether to retry a request if we get a transient error.
Dealing with a request that just doesn't come back is the much harder piece, of course.
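
A minimal sketch of the difference between a retry count and a time budget, assuming a hypothetical `TransientError` and an injectable clock so the example is deterministic. It is not FreeBSD code and, as noted in the message, it does nothing about a request that never comes back.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical: a failure that might succeed if the request is retried."""


def attempt_with_budget(op: Callable[[], bytes], budget_s: float,
                        clock: Callable[[], float] = time.monotonic) -> Optional[bytes]:
    """Retry `op` on transient errors until it succeeds or the time budget is
    spent. Returns None once the budget is exhausted, so the caller can try
    another copy of the data. A request that never returns is not handled."""
    deadline = clock() + budget_s
    while True:
        try:
            return op()
        except TransientError:
            if clock() >= deadline:
                return None  # budget spent: give up instead of retrying again


class FakeDisk:
    """Deterministic stand-in: each attempt costs `cost_s` of simulated time
    and the first `failures` attempts raise TransientError."""

    def __init__(self, failures: int, cost_s: float) -> None:
        self.failures = failures
        self.cost_s = cost_s
        self.now = 0.0

    def clock(self) -> float:
        return self.now

    def read(self) -> bytes:
        self.now += self.cost_s
        if self.failures > 0:
            self.failures -= 1
            raise TransientError()
        return b"data"


bad = FakeDisk(failures=3, cost_s=0.5)
print(attempt_with_budget(bad.read, budget_s=1.0, clock=bad.clock))      # None
flaky = FakeDisk(failures=1, cost_s=0.5)
print(attempt_with_budget(flaky.read, budget_s=1.0, clock=flaky.clock))  # b'data'
```

With a fixed retry count the worst case is (retries + 1) times the per-attempt time; with a budget it is roughly the budget plus one attempt, because an attempt already in progress is allowed to finish.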

> Imagine you issue a deadlined write to a RAID5 thing. Three component
> writes happen smoothly, but the last two fail the deadline, with
> no way to predict how long it will take before they complete
> or fail.
>
> * Does the bio write transaction fail?
>
> * Does the bio write transaction time out?
>
> * Do you attempt to complete the write to the RAID5?
>
> * Where do you store a copy of the data if you do?
>
> * What happens next time a read happens on this bio's extent?
>
> Then for an encore, imagine it was a read bio: Three DMAs go smoothly,
> two are outstanding and you don't know if/when they will complete/fail.
>
> * If you fail or time out the bio, how do you "taint" the space
> being read into while the two remaining DMAs are outstanding?
>
> * What if that space is mapped into userland?
>
> * What if that space is being executed?
>
> * What if one of the two outstanding DMAs later returns garbage?
>
> My conclusion back when I did GEOM, was that the only way to
> do something like this sanely, is to have a special GEOM do it
> for you, which always allocates a temp-space:
>
>     allocate temp buffer
>     if (write)
>         copy write data to temp buffer
>     issue bio downwards on temp buffer
>     if timeout
>         park temp buffer until biodone
>         return(timeout)
>     if (read)
>         copy temp buffer to read space
>     return (ok/error)

-- 
Andriy Gapon

From owner-freebsd-geom@freebsd.org Tue Dec 12 16:20:13 2017
From: Andriy Gapon
To: Scott Long
Cc: Warner Losh, FreeBSD FS, freebsd-geom@freebsd.org
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Date: Tue, 12 Dec 2017 18:12:28 +0200

On 25/11/2017 12:54, Scott Long wrote:
> Why is overloading EIO so bad? brelse() will call bdirty() when a BIO_WRITE
> command has failed with EIO. Calling bdirty() has the effect of retrying the I/O.
> This disregards the fact that disk drivers only return EIO when they’ve decided
> that the I/O cannot be retried. It has no termination condition for the retries, and
> will endlessly retry I/O in vain; I’ve seen this quite frequently. It also disregards
> the fact that I/O marked as B_PAGING can’t be retried in this fashion, and will
> trigger a panic. Because we pretend that EIO can be retried, we are left with
> a system that is very fragile when I/O actually does fail. Instead of adding
> more special cases and blurred lines, I want to go back to enforcing strict
> contracts between the layers and force the core parts of the system to respect
> those contracts and handle errors properly, instead of just retrying and
> hoping for the best.

I agree with your intention.
But let's not project what I consider to be a bug in the buffer cache code onto all consumers of the bio / GEOM interface.
In fact, I am very surprised that there is any code that treats EIO as retriable. I don't know of any such code other than specialized disk recovery tools.

-- 
Andriy Gapon
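
To make the failure mode Scott describes concrete, here is a toy contrast between a writer that treats EIO as "retry later" (the brelse()/bdirty() behaviour as he characterizes it, not the actual buffer cache code) and one that honours the contract that EIO is final. Which errnos count as retriable is an assumption made purely for illustration, and the B_PAGING panic is not modelled.

```python
import errno
from typing import Callable, Iterable, Tuple

# Assumption for illustration only: errors a lower layer might report when
# the request may succeed later. EIO is deliberately NOT in this set.
RETRIABLE = {errno.EAGAIN, errno.EBUSY}


def write_like_bdirty(issue_write: Callable[[], int],
                      give_up_after: int = 1000) -> Tuple[int, int]:
    """Model of the behaviour Scott describes: every EIO re-dirties the
    buffer, so the write is simply reissued. The described path has no
    limit; `give_up_after` exists only so this demo terminates."""
    attempts = 0
    while attempts < give_up_after:
        attempts += 1
        err = issue_write()
        if err != errno.EIO:
            return err, attempts
    return errno.EIO, attempts


def write_with_contract(issue_write: Callable[[], int],
                        max_retries: int = 3) -> Tuple[int, int]:
    """EIO means the driver already gave up, so report it immediately; only
    explicitly retriable errors are retried, and only `max_retries` times."""
    attempts = 0
    while True:
        attempts += 1
        err = issue_write()
        if err == 0 or err not in RETRIABLE or attempts > max_retries:
            return err, attempts


def dead_sector() -> int:
    return errno.EIO


def scripted(errors: Iterable[int]) -> Callable[[], int]:
    """Return the given errors in order, then succeed (0) forever."""
    it = iter(errors)
    return lambda: next(it, 0)


for policy in (write_like_bdirty, write_with_contract):
    err, n = policy(dead_sector)
    print(policy.__name__, errno.errorcode[err], n)
# write_like_bdirty EIO 1000
# write_with_contract EIO 1
print(write_with_contract(scripted([errno.EBUSY, errno.EBUSY])))  # (0, 3)
```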

From owner-freebsd-geom@freebsd.org Tue Dec 12 16:26:41 2017
From: Andriy Gapon
To: Warner Losh
Cc: Scott Long, FreeBSD FS, freebsd-geom@freebsd.org
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Date: Tue, 12 Dec 2017 18:26:30 +0200
Message-ID: <8fde1a9e-ea32-af9e-de7f-30e7fe1738cd@FreeBSD.org>

On 25/11/2017 19:57, Warner Losh wrote:
> Let's walk through this. You see that it takes a long time to fail an I/O.
> Perfectly reasonable observation. There are two reasons for this. One is that the
> disks take a while to make an attempt to get the data. The second is that the
> system has a global policy that's biased towards 'recover the data' over 'fail
> fast'. These can be fixed by reducing the timeouts, or lowering the read-retry
> count for a given drive or globally as a policy decision made by the system
> administrator.
>
> It may be perfectly reasonable to ask the lower layers to 'fail fast' and have
> either a hard or a soft deadline on the I/O for a subset of I/O. A hard deadline
> would return ETIMEDOUT or something when it's passed and cancel the I/O. This
> gives better determinism in the system, but some systems can't cancel just 1 I/O
> (like SATA drives), so we have to flush the whole queue. If we get a lot of
> these, performance suffers. However, for some class of drives, you know that if
> it doesn't succeed in 1s after you submit it to the drive, it's unlikely to
> complete successfully and it's worth the performance hit on a drive that's
> already acting up.
>
> You could have a soft timeout, which says 'don't do any additional action after
> X time has elapsed and you get word about this I/O'. This is similar to the hard
> timeout, but just stops retrying after the deadline has passed. This scenario is
> better on the other users of the drive, assuming that the read-recovery
> operations aren't starving them. It's also easier to implement, but has worse
> worst case performance characteristics.
>
> You aren't asking to limit retries. You're really asking the I/O subsystem to
> limit, where it can, the amount of time on an I/O so you can try another one.
> Your means to doing this is to tell it not to retry. That's the wrong means.
> It shouldn't be listed in the API that it's a 'NO RETRY' request. It should be a
> QoS request flag: fail fast.

I completely agree. 'NO RETRY' was a bad name, and now I see it with painful clarity.
Just to clarify, I agree not only on the name, but also on everything else you said above.

> Part of why I'm being so difficult is that you don't understand this and are
> proposing a horrible API. It should have a different name.

I completely agree.
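
Warner's hard and soft deadlines can be compared with a small model in which each drive-level attempt has a known duration. This is an illustrative sketch, not CAM code; the `Attempt` type and both functions are hypothetical, and cancelling an in-flight attempt is reduced to "complete at the deadline", even though on SATA it may mean flushing the whole queue.

```python
import errno
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Attempt:
    duration_s: float   # time the drive spends on this try
    ok: bool            # whether this try recovers the data


def hard_deadline(attempts: List[Attempt], deadline_s: float) -> Tuple[int, float]:
    """An attempt that would still be running at the deadline is cancelled
    and the bio completes with ETIMEDOUT exactly at the deadline."""
    t = 0.0
    for a in attempts:
        if t + a.duration_s > deadline_s:
            return errno.ETIMEDOUT, deadline_s
        t += a.duration_s
        if a.ok:
            return 0, t
    return errno.EIO, t


def soft_deadline(attempts: List[Attempt], deadline_s: float) -> Tuple[int, float]:
    """Nothing is cancelled: the attempt in flight reports back, but no new
    retry is started once the deadline has passed."""
    t = 0.0
    for a in attempts:
        if t >= deadline_s:
            return errno.ETIMEDOUT, t
        t += a.duration_s
        if a.ok:
            return 0, t
    return errno.EIO, t


tries = [Attempt(2.0, False), Attempt(2.0, False), Attempt(2.0, True)]
for policy in (hard_deadline, soft_deadline):
    err, t = policy(tries, deadline_s=3.0)
    print(policy.__name__, errno.errorcode[err], t)
# hard_deadline ETIMEDOUT 3.0
# soft_deadline ETIMEDOUT 4.0
```

With 2-second attempts and a 3-second deadline, the hard deadline completes exactly on time, while the soft deadline lets the second attempt finish and completes a second late: easier to implement, worse worst case.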
> The other reason is
> that I absolutely do not want to overload EIO. You must return a different
> error back up the stack. You've shown no interest in this in the past, which is also a
> needless argument. We've given good reasons, and you've pooh-poohed them with bad
> arguments.

I still honestly don't understand this. I think that bio_error and bio_flags are sufficient to properly interpret the "fail-fast EIO". And I never intended for that error to be propagated by any means other than bio_error.

> Also, this isn't the data I asked for. I know things can fail slowly. I was
> asking for how it would improve systems running like this. As in "I implemented
> it, and was able to fail over to this other drive faster" or something like
> that. Actual drive failure scenarios vary widely, and optimizing for this one
> failure is unwise. It may be the right optimization, but it may not. There's
> lots of tricky edges in this space.

Well, I implemented my quick hack (as you absolutely correctly characterized it) in response to something that I observed in the past and that hasn't happened to me since. But, realistically, I do not expect to be able to reproduce and test every tricky failure scenario.
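
The disagreement over the "fail-fast EIO" can be shown with a toy model. `BIO_NORETRY` is the flag name from this thread, but its numeric value here is hypothetical, as are the other names; `Bio` is not the kernel's struct bio. The layer that set the flag can combine bio_error with bio_flags, but any layer that only receives the errno sees a plain EIO; a distinct errno such as ETIMEDOUT keeps both views consistent.

```python
import errno
from dataclasses import dataclass

BIO_NORETRY = 0x1     # flag name from this thread; the value is hypothetical
EARLY = "gave up early; full recovery might still succeed"


@dataclass
class Bio:            # toy stand-in for struct bio, not the kernel layout
    flags: int = 0
    error: int = 0


def complete_failed_bio(bio: Bio, distinct_errno: bool) -> None:
    """Driver side, after a media error: with the fail-fast flag set the
    normal recovery was cut short; otherwise retries were exhausted."""
    short_circuited = bool(bio.flags & BIO_NORETRY)
    bio.error = errno.ETIMEDOUT if (short_circuited and distinct_errno) else errno.EIO


def errno_only_view(error: int) -> str:
    """What a layer concludes when only the errno reaches it."""
    return {0: "ok", errno.EIO: "unrecoverable", errno.ETIMEDOUT: EARLY}.get(error, "other")


def flag_aware_view(bio: Bio) -> str:
    """What the layer that set the flag can conclude from bio_error + bio_flags."""
    if bio.error == errno.EIO and bio.flags & BIO_NORETRY:
        return EARLY
    return errno_only_view(bio.error)


for distinct in (False, True):
    b = Bio(flags=BIO_NORETRY)
    complete_failed_bio(b, distinct_errno=distinct)
    print(distinct, flag_aware_view(b) == errno_only_view(b.error))
# False False
# True True
```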

-- 
Andriy Gapon

From owner-freebsd-geom@freebsd.org Tue Dec 12 16:36:59 2017
From: Andriy Gapon
To: Warner Losh
Cc: FreeBSD FS, freebsd-geom@freebsd.org, Scott Long
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Date: Tue, 12 Dec 2017 18:36:49 +0200
Message-ID: <38122ea9-ab20-f18a-90a2-04d2681e2ac9@FreeBSD.org>

On 26/11/2017 00:17, Warner Losh wrote:
> On Sat, Nov 25, 2017 at 10:40 AM, Andriy Gapon wrote:
>
>> Before anything else, I would like to say that I got an impression that we speak
>> from such different angles that we either don't understand each other's words or,
>> even worse, misinterpret them.
>
> I understand what you are suggesting. Don't take my disagreement with your
> proposal as willful misinterpretation. You are proposing something that's a
> quick hack.

Very true.

> Maybe a useful one, but it's still problematic because it has the
> upper layers telling the lower layers what to do (don't do your retry), rather
> than what service to provide (I prefer a fast error exit over every effort to
> recover the data).

Also true.

> And it also does it by overloading the meaning of EIO, which
> has real problems which you've not been open to listening to, I assume due to your
> narrow use case apparently blinding you to the bigger picture issues with that
> route.

Quite likely.

> However, there's a way forward which I think will solve these objections.
> First, designate that I/O that fails due to short-circuiting the normal recovery
> process return ETIMEDOUT. The I/O stack currently doesn't use this at all (it
> was introduced for the network side of things). This is a general catch-all for
> an I/O that we complete before the lower layers have given it the maximum amount
> of effort to recover the data, at the user's request. Next, don't use a flag.
> Instead add a 32-bit field called bio_qos for quality of service hints and
> another 32-bit field for bio_qos_param. This allows us to pass down specific
> quality of service desires from the filesystem to the lower layers. The
> parameter will be unused in your proposal. BIO_QOS_FAIL_EARLY may be a good name
> for a value to set it to (at the moment, just use 1). We'll assign the other QOS
> values later for other things. It would allow us to implement the other sorts of
> QoS things I talked about as well.

That's a very interesting and workable suggestion.
I will try to work on it.

> As for B_FAILFAST, it's quite unlike what you're proposing, except in one
> incidental detail. It's a complicated state machine that the sd driver in
> Solaris implemented. It's an entire protocol. When the device gets errors, it
> goes into this failfast state machine. The state machine makes a determination
> that the errors are indicators the device is GONE, at least for the moment, and
> it will fail I/Os in various ways from there. Any new I/Os that are submitted
> will be failed (there's conditional behavior here: depending on a global setting
> it's either all I/O or just B_FAILFAST I/O).

Yeah, I realized that B_FAILFAST was quite different from the first impression that I got from its name.
Thank you for doing and sharing your analysis of how it actually works.

> ZFS appears to set this bit for its
> discovery code only, when a device not being there would significantly delay
> things.

I think that ZFS sets the bit for all 'first-attempt' I/O.
It's the various retries / recovery where this bit is not set.

> Anyway, when the device returns (basically an I/O gets through or maybe
> some other event happens), the driver exits this mode and returns to normal
> operation. It appears to be designed not for the use case that you described,
> but rather for a drive that's failing all over the place so that any pending
> I/Os get out of the way quickly. Your use case is only superficially similar to
> that use case, so the Solaris / Illumos experiences are mildly interesting, but
> due to the differences not a strong argument for doing this. This facility in
> Illumos is interesting, but would require significantly more retooling of the
> lower I/O layers in FreeBSD to implement fully. Plus Illumos (or maybe just
> Solaris) has a daemon that looks at failures to manage them at a higher level, which
> might make for a better user experience for FreeBSD, so that's something that
> needs to be weighed as well.

Okay.

> We've known for some time that HDD retry algorithms take a long time. Same is
> true of some SSD or NVMe algorithms, but not all. The other objection I have to
> 'noretry' naming is that it bakes the current observed HDD behavior and
> recovery into the API. This is undesirable as other storage technologies have
> retry mechanisms that happen quite quickly (and sometimes in the drive itself).
> The cutoff between fast and slow recovery is device specific, as are the methods
> used. For example, there are new proposals out in NVMe (and maybe T10/T13 land) to
> have new types of READ commands that specify the quality of service you expect,
> including providing some sort of deadline hint to clip how much effort is
> expended in trying to recover the data. It would be nice to design a mechanism
> that allows us to start using these commands when drives are available with
> them, and possibly using timeouts to allow for a faster abort. Most of your HDD
> I/O will complete within maybe ~150ms, with a long tail out to maybe as long as
> ~400ms. It might be desirable to set a policy that says 'don't let any I/Os
> remain in the device longer than a second' and use this mechanism to enforce
> that. Or don't let any I/Os last more than 20x the most recent median I/O time.
> A single bit is insufficiently expressive to allow these sorts of things, which
> is another reason for my objection to your proposal. With the QOS fields being
> independent, the clone routines just copy them and make no value judgement on
> them.

I now agree with this.
Thank you for the detailed explanation.

> So, those are my problems with your proposal, and also some hopefully useful
> ways to move forward. I've chatted with others for years about introducing QoS
> things into the I/O stack, so I know most of the above won't be too contentious
> (though ETIMEDOUT I haven't socialized, so that may be an area of concern for
> people).

Thank you!
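
A sketch of the interface Warner proposes, written as a Python model rather than kernel C. The field names bio_qos / bio_qos_param, the value 1 for BIO_QOS_FAIL_EARLY and the use of ETIMEDOUT come from the thread; everything else (the `Bio` class, `clone`, `io_deadline_s`, `complete`) is hypothetical. The deadline policy implements the two examples given: at most one second, or at most 20 times the recent median latency, whichever is smaller.

```python
import errno
import statistics
from dataclasses import dataclass, replace
from typing import Sequence

BIO_QOS_NONE = 0
BIO_QOS_FAIL_EARLY = 1           # "at the moment, just use 1"
_U32_MAX = 0xFFFFFFFF


@dataclass(frozen=True)
class Bio:
    offset: int
    length: int
    qos: int = BIO_QOS_NONE      # models the proposed 32-bit bio_qos
    qos_param: int = 0           # models bio_qos_param (unused for FAIL_EARLY)

    def __post_init__(self) -> None:
        for value in (self.qos, self.qos_param):
            if not 0 <= value <= _U32_MAX:
                raise ValueError("QoS fields are 32-bit")


def clone(parent: Bio, offset: int, length: int) -> Bio:
    """Like a GEOM clone: QoS hints are copied verbatim, never interpreted."""
    return replace(parent, offset=offset, length=length)


def io_deadline_s(recent_latencies_s: Sequence[float], cap_s: float = 1.0,
                  factor: float = 20.0) -> float:
    """Example policy from the thread: no I/O stays in the device longer than
    a second, or longer than 20x the recent median, whichever is smaller."""
    if not recent_latencies_s:
        return cap_s
    return min(cap_s, factor * statistics.median(recent_latencies_s))


def complete(bio: Bio, recovery_s: float, recovered: bool, deadline_s: float) -> int:
    """Driver completion: if full recovery of a FAIL_EARLY bio would run past
    the deadline, it is cut short and completes with ETIMEDOUT; other bios
    keep today's semantics (0 or EIO)."""
    if bio.qos == BIO_QOS_FAIL_EARLY and recovery_s > deadline_s:
        return errno.ETIMEDOUT
    return 0 if recovered else errno.EIO


parent = Bio(offset=0, length=131072, qos=BIO_QOS_FAIL_EARLY)
child = clone(parent, offset=65536, length=65536)
deadline = io_deadline_s([0.008, 0.010, 0.012])                # 20 x 10 ms = 0.2 s
print(errno.errorcode[complete(child, 2.0, False, deadline)])  # ETIMEDOUT
print(complete(Bio(0, 4096), 2.0, True, deadline))             # 0
```

Note that `clone` copies both QoS fields without looking at them, mirroring the point that the clone routines should make no judgement about QoS values; new values can then be added later without touching every transforming GEOM class.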

-- 
Andriy Gapon

From owner-freebsd-geom@freebsd.org Thu Dec 14 09:41:01 2017
From: Andriy Gapon
To: Scott Long
Cc: Warner Losh, FreeBSD FS, freebsd-geom@freebsd.org
Subject: Re: add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom
Date: Thu, 14 Dec 2017 11:40:49 +0200
Message-ID: <59654c50-b24e-ac87-2154-681cc57627de@FreeBSD.org>
In-Reply-To: <783CA790-C0D3-442D-A5A2-4CB424FCBD62@samsco.org>

This email is rather large, with a lot of context. Replying just to a single piece here.

On 27/11/2017 17:29, Scott Long wrote:
>
>> On Nov 25, 2017, at 10:36 AM, Andriy Gapon wrote:
>> Let's assume that I am talking about the case of not being able to read an HDD
>> sector that has gone bad.
>> Here is a real world example:
>>
>> Jun 16 10:40:18 trant kernel: ahcich0: NCQ error, slot = 20, port = -1
>> Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 00 58 62 40 2c 00 00 08 00 00
>> Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
>> Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
>> Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): RES: 41 40 68 58 62 40 2c 00 00 00 00
>> Jun 16 10:40:18 trant kernel: (ada0:ahcich0:0:0:0): Retrying command
>> Jun 16 10:40:20 trant kernel: ahcich0: NCQ error, slot = 22, port = -1
> ...
>> I do not see anything wrong in what CAM / ahci / ata_da did here.
>> They did what I would expect them to do. They tried very hard to get data that
>> I told them I need.
>
> Two things I see here. The first is that the drive is trying for 2 seconds to get good
> data off of the media. The second is that it’s failing and reporting the error as
> uncorrectable. I think that retries at the OS/driver
> layer are therefore useless; the drive is already doing a bunch of its own retries and
> is failing, and is telling you that it’s failing. In the past, the conventional wisdom has
> been to do retries, because 30 years ago drives had minimal firmware and didn’t do
> a good job at managing data integrity. Now they do an extensive amount of
> analysis-driven error correction and retries, so I think it’s time to change the
> conventional wisdom. I’d propose that for direct-attach SSDs and HDDs we treat this
> error as non-retriable. Normally this would be a one-line change, but I think it needs
> to be nuanced to distinguish between optical drives, simple flash media drives, and
> regular HDDs and SSDs.
>
> An interim solution would be to just set the kern.cam.ada.retry_count to 0.

I went through some ada errors that I have in my logs, and I think that there can be a difference between HDDs and SSDs too.
I thought that the HDD internal retry mechanism would be thorough enough but, to my surprise, I see that the majority of read failures are recovered by the first retry. Sometimes it's the second retry that succeeds; in all other cases the standard four retries do not help.
So, it may be too early to set ada.retry_count to 0 for all types of supported disks.

-- 
Andriy Gapon
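
Scott's suggestion and Andriy's caution can be combined into a per-device-class policy. The sketch below is hypothetical (the `DevClass` enum and `retries_for_unc` are not CAM interfaces); it only shows how the OS-level retry count for an uncorrectable (UNC) read error could depend on the device class instead of a single global knob such as kern.cam.ada.retry_count. The default of 4 reflects "the standard four retries" mentioned in the message.

```python
from enum import Enum
from typing import FrozenSet


class DevClass(Enum):
    HDD = "hdd"
    SSD = "ssd"
    OPTICAL = "optical"
    SIMPLE_FLASH = "simple flash media"


# Classes whose firmware is trusted to have retried internally before
# reporting UNC (Scott's proposal: direct-attach HDDs and SSDs).
PROPOSED_TRUSTED: FrozenSet[DevClass] = frozenset({DevClass.HDD, DevClass.SSD})


def retries_for_unc(dev: DevClass, default_retries: int = 4,
                    trusted: FrozenSet[DevClass] = PROPOSED_TRUSTED) -> int:
    """OS-level retries to spend on an uncorrectable read error: none for
    trusted classes, the usual count for everything else."""
    return 0 if dev in trusted else default_retries


print(retries_for_unc(DevClass.SSD))        # 0
print(retries_for_unc(DevClass.OPTICAL))    # 4
# Andriy's log survey suggests keeping OS retries for HDDs for now:
print(retries_for_unc(DevClass.HDD, trusted=frozenset({DevClass.SSD})))  # 4
```

Keeping the trusted set configurable reflects the disagreement: Scott's proposal puts HDDs in it, while Andriy's observation that most of his HDD read failures were recovered by the first retry argues for leaving them out until more data is available.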