From: Eugene Grosbein
To: Garrett Wollman, freebsd-stable@freebsd.org, Alexander Motin
Subject: Re: ZFS root mount regression
Date: Sat, 20 Jul 2019 17:41:39 +0700
Message-ID: <73cddcd9-97f0-e73f-da9d-2a454fd3ea1a@grosbein.net>
In-Reply-To: <23858.2573.932364.128957@khavrinen.csail.mit.edu>
References: <23858.2573.932364.128957@khavrinen.csail.mit.edu>

CC'ing Alexander Motin, who committed the change.

20.07.2019 1:21, Garrett Wollman wrote:

> I recently upgraded several file servers from 11.2 to 11.3. All of
> them boot from a ZFS pool called "tank" (the data is in a different
> pool). In a couple of instances (which caused me to have to take a
> late-evening 140-mile drive to the remote data center where they are
> located), the servers crashed at the root mount phase. In one case,
> it bailed out with error 5 (I believe that's [EIO]) to the usual
> mountroot prompt. In the second case, the kernel panicked instead.
>
> The root cause (no pun intended) on both servers was a disk which was
> supplied by the vendor with a label on it that claimed to be part of
> the "tank" pool, and for some reason the 11.3 kernel was trying to
> mount that (faulted) pool rather than the real one. The disks and
> pool configuration were unchanged from 11.2 (and probably 11.1 as
> well) so I am puzzled.
>
> Other than laboriously running "zpool labelclear -f /dev/somedisk" for
> every piece of media that comes into my hands, is there anything else
> I could have done to avoid this?

Both the 11.3-RELEASE announcement and the Release Notes mention this:

> The ZFS filesystem has been updated to implement parallel mounting.

I strongly suggest reading the release documentation, at least when you
run into trouble after an upgrade. Better still, read it *before* updating.

My guess is that this parallelism introduced a race in your case.
Unfortunately, the way to fall back to sequential mounting seems to be
undocumented. libzfs checks whether the ZFS_SERIAL_MOUNT environment
variable exists; any value will do. I'm not sure how you would set it
for mounting root. Maybe it is picked up from kenv, so try adding this
to /boot/loader.conf:

ZFS_SERIAL_MOUNT=1

Alexander should know more about this.

And of course, attaching an unrelated device whose label conflicts with
the root pool is asking for trouble. Re-label it ASAP.