Date:      Sun, 21 Jul 2019 17:41:59 -0400
From:      Alexander Motin <mav@FreeBSD.org>
To:        Eugene Grosbein <eugen@grosbein.net>, Garrett Wollman <wollman@csail.mit.edu>, freebsd-stable@freebsd.org
Subject:   Re: ZFS root mount regression
Message-ID:  <841d26dd-7433-2e6d-9011-76ed7ad3d5d2@FreeBSD.org>
In-Reply-To: <73cddcd9-97f0-e73f-da9d-2a454fd3ea1a@grosbein.net>
References:  <23858.2573.932364.128957@khavrinen.csail.mit.edu> <73cddcd9-97f0-e73f-da9d-2a454fd3ea1a@grosbein.net>

Hi,

I am not sure how the original description leads to the conclusion that
the problem is related to parallel mounting.  From my point of view it
sounds like the problem is that root pool mounting happens based on the
pool name, not on the pool GUID, which needs to be passed from the loader.
We have seen problems like that ourselves too when boot pool names collide.
So I doubt it is a new problem; just nobody has gotten around to fixing it
yet.
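
To illustrate the difference, here is a minimal sketch in Python.  It is
not the kernel's actual root mount logic; the PoolLabel type, the device
names, the GUID values and the probe order are all made up for the
example.  It only shows why a lookup by name becomes ambiguous when two
labels claim the same pool name, while a lookup by GUID does not.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PoolLabel:
    """Hypothetical stand-in for what a ZFS label says about a disk."""
    name: str      # pool name recorded in the label
    guid: int      # pool GUID recorded in the label
    device: str    # device the label was found on (made-up names below)


def pick_by_name(labels: Iterable[PoolLabel], name: str) -> Optional[PoolLabel]:
    """Return the first label with a matching pool name.

    If two different pools share a name, the result depends only on the
    order in which the labels happen to be seen.
    """
    for label in labels:
        if label.name == name:
            return label
    return None


def pick_by_guid(labels: Iterable[PoolLabel], guid: int) -> Optional[PoolLabel]:
    """Return the label whose pool GUID matches, ignoring the pool name."""
    for label in labels:
        if label.guid == guid:
            return label
    return None


# Assumed scenario: a stray disk with a stale "tank" label is probed first.
labels = [
    PoolLabel(name="tank", guid=0x1111, device="da7"),   # stale foreign label
    PoolLabel(name="tank", guid=0x2222, device="ada0"),  # real boot pool
]
BOOT_POOL_GUID = 0x2222  # what the loader would have to pass along

by_name = pick_by_name(labels, "tank")
by_guid = pick_by_guid(labels, BOOT_POOL_GUID)
```

In this made-up scenario the name lookup lands on the stale label on da7,
while the GUID lookup finds the real pool on ada0 regardless of probe
order.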

On 20.07.2019 06:41, Eugene Grosbein wrote:
> CC'ing Alexander Motin who committed the change.
> 
> 20.07.2019 1:21, Garrett Wollman wrote:
> 
>> I recently upgraded several file servers from 11.2 to 11.3.  All of
>> them boot from a ZFS pool called "tank" (the data is in a different
>> pool).  In a couple of instances (which caused me to have to take a
>> late-evening 140-mile drive to the remote data center where they are
>> located), the servers crashed at the root mount phase.  In one case,
>> it bailed out with error 5 (I believe that's [EIO]) to the usual
>> mountroot prompt.  In the second case, the kernel panicked instead.
>>
>> The root cause (no pun intended) on both servers was a disk which was
>> supplied by the vendor with a label on it that claimed to be part of
>> the "tank" pool, and for some reason the 11.3 kernel was trying to
>> mount that (faulted) pool rather than the real one.  The disks and
>> pool configuration were unchanged from 11.2 (and probably 11.1 as
>> well) so I am puzzled.
>>
>> Other than laboriously running "zpool labelclear -f /dev/somedisk" for
>> every piece of media that comes into my hands, is there anything else
>> I could have done to avoid this?
> 
> Both the 11.3-RELEASE announcement and the Release Notes mention this:
> 
>> The ZFS filesystem has been updated to implement parallel mounting.
> 
> I strongly suggest reading the release documentation when you run into
> trouble after an upgrade, at least. Or better, read it *before* updating.
> 
> I guess this parallelism created some race in your case.
> 
> Unfortunately, a way to fall back to sequential mounting seems undocumented.
> libzfs checks whether the ZFS_SERIAL_MOUNT environment variable exists, with any value.
> I'm not sure how you would set it for mounting root; maybe it will use kenv,
> so try adding to /boot/loader.conf:
> 
> ZFS_SERIAL_MOUNT=1
> 
> Alexander should have more knowledge on this.
> 
> And of course, attaching an unrelated device whose label conflicts
> with the root pool is asking for trouble. Re-label it ASAP.
> 
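
For reference, the "exists, with any value" semantics described in the
quoted text can be sketched as below.  This is a Python illustration of a
presence-only environment check, not the libzfs code itself, and the
function name is made up.  It says nothing about whether a
/boot/loader.conf setting ever reaches that check during root mount,
which is the open question in the quoted message.

```python
import os
from typing import Mapping


def serial_mount_requested(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: true if ZFS_SERIAL_MOUNT is set at all.

    Only the presence of the variable matters; its value is ignored, so
    even an empty string counts as "set".
    """
    return "ZFS_SERIAL_MOUNT" in environ
```

With these semantics, ZFS_SERIAL_MOUNT=1, ZFS_SERIAL_MOUNT=0 and an empty
ZFS_SERIAL_MOUNT= would all behave the same way.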

-- 
Alexander Motin


