From owner-freebsd-fs@FreeBSD.ORG Sun Nov 3 16:57:10 2013
From: Artem Belevich
To: Andriy Gapon
Cc: stable@freebsd.org, fs@freebsd.org
Date: Sun, 3 Nov 2013 08:57:09 -0800
Subject: Re: Can't mount root from raidz2 after r255763 in stable/9
In-Reply-To: <5276030E.5040100@FreeBSD.org>

TL;DR version -- Solved. The failure was caused by zombie ZFS volume
labels left over from the disks' previous life in another pool. For some
reason the kernel now picks up labels from the raw devices first and
tries to boot from a pool that no longer exists. Nuking the old labels
with dd solved my boot issues.

On Sun, Nov 3, 2013 at 1:02 AM, Andriy Gapon wrote:
> on 03/11/2013 05:22 Artem Belevich said the following:
>> Hi,
>>
>> I have a box with root mounted from an 8-disk raidz2 ZFS volume.
>> After a recent buildworld I ran into an issue where the kernel fails
>> to mount root with error 6.
>> r255763 on stable/9 is the first revision that fails to mount root on
>> my box. The preceding r255749 boots fine.
>>
>> Commit r255763 (http://svnweb.freebsd.org/base?view=revision&revision=255763)
>> MFCs a bunch of changes from 10, but I don't see anything that
>> obviously impacts ZFS.
>
> Indeed.
>
>> Attempting to boot with vfs.zfs.debug=1 shows that the order in which
>> GEOM providers are probed by ZFS has apparently changed. Kernels that
>> boot show "guid match for provider /dev/gpt/" while failing kernels
>> show "guid match for provider /dev/daX" -- the raw disks, which are
>> *not* the right GEOM providers for my pool slices. Beats me why ZFS
>> picks the raw disks over the GPT partitions it should have.
>
> Perhaps the kernel gpart code fails to recognize the partitions and
> thus ZFS can't see them?
>
>> Pool configuration:
>> #zpool status z0
>>   pool: z0
>>  state: ONLINE
>>   scan: scrub repaired 0 in 8h57m with 0 errors on Sat Oct 19 20:23:52 2013
>> config:
>>
>>         NAME                   STATE     READ WRITE CKSUM
>>         z0                     ONLINE       0     0     0
>>           raidz2-0             ONLINE       0     0     0
>>             gpt/da0p4-z0       ONLINE       0     0     0
>>             gpt/da1p4-z0       ONLINE       0     0     0
>>             gpt/da2p4-z0       ONLINE       0     0     0
>>             gpt/da3p4-z0       ONLINE       0     0     0
>>             gpt/da4p4-z0       ONLINE       0     0     0
>>             gpt/da5p4-z0       ONLINE       0     0     0
>>             gpt/da6p4-z0       ONLINE       0     0     0
>>             gpt/da7p4-z0       ONLINE       0     0     0
>>         logs
>>           mirror-1             ONLINE       0     0     0
>>             gpt/ssd-zil-z0     ONLINE       0     0     0
>>             gpt/ssd1-zil-z0    ONLINE       0     0     0
>>         cache
>>           gpt/ssd1-l2arc-z0    ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> Here are screen captures from a failed boot:
>> https://plus.google.com/photos/+ArtemBelevich/albums/5941857781891332785
>
> I don't have permission to view this album.

Argh. Copy-paste error. Try these:
https://plus.google.com/photos/101142993171487001774/albums/5941857781891332785?authkey=CPm-4YnarsXhKg
https://plus.google.com/photos/+ArtemBelevich/albums/5941857781891332785?authkey=CPm-4YnarsXhKg

>
>> And here's the boot log from a successful boot on the same system:
>> http://pastebin.com/XCwebsh7
>>
>> Removing ZIL and L2ARC makes no difference -- r255763 still fails to
>> mount root.
>>
>> I'm thoroughly baffled. Is there something wrong with the pool -- some
>> junk metadata somewhere on the disk that now screws with the root
>> mounting? A changed order in GEOM provider enumeration? Something
>> else? Any suggestions on what I can do to debug this further?
>
> gpart.

Long version of the story:

It was stale metadata after all. 'zdb -l /dev/daN' showed that one of
the four pool labels from an old pool was still present on every drive
in the current pool. Long ago the drives were temporarily used as raw
drives in a ZFS pool on a test box. Then I destroyed that pool, sliced
the drives into partitions with GPT, and used one of the partitions to
build the current pool. Apparently not all of the old pool labels were
overwritten by the new pool, but that went unnoticed until now because
the new pool happened to be detected first. Now the detection order has
changed (I'm still not sure how or why), which resurrected the old,
non-existent pool and caused the boot failures.

After finding the location of the stale volume labels on the disks and
nuking them with dd, the boot issues went away. The scary part was that
the stale label sat *inside* the current pool's slice, so I had to
overwrite data within the current pool. I figured that since the old
label was still intact, the current pool hadn't written anything there
yet, and therefore it should be safe to overwrite it. I did it on one
drive first; had I been wrong, ZFS should still have been able to
rebuild the pool. Luckily no vital data was hurt in the process, and
zfs scrub reported zero errors. After nuking the old labels on the
other drives, the boot issues were gone for good.

Even though my problem has been dealt with, I still wonder whether pool
detection should be more robust. I've been lucky that it was the kernel
that changed pool detection and not the boot loader -- that would have
made troubleshooting even more interesting. Would it make sense to
prefer partitions over whole drives? Or perhaps to prefer pools with
all of their labels intact over devices that only have a small fraction
of valid labels?

--Artem
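
A quick note on the vfs.zfs.debug knob used for the diagnosis above: it
is a loader tunable, so (assuming a stock FreeBSD loader) it can be set
for a single boot from the loader prompt or made persistent in
/boot/loader.conf, roughly like this:

    # One-off, at the loader "OK" prompt before booting:
    set vfs.zfs.debug=1
    boot

    # Or persistently, in /boot/loader.conf:
    vfs.zfs.debug="1"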
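
For anyone who runs into the same zombie-label problem, here is a rough
sketch of the kind of inspection and cleanup described in the long
version above. It assumes the stale label is a trailing label of the old
whole-disk vdev (ZFS keeps four 256 KiB labels per vdev: two at the
start and two in the last 512 KiB); the device names, the example
mediasize, and the computed offset are illustrative assumptions, not a
recipe. The zdb steps are read-only; the dd step writes to a raw disk
underneath a live pool, so verify that the offset overlaps neither the
live pool's own labels at the end of daNp4 nor the backup GPT in the
last sectors of the disk, and do one drive at a time so the raidz2
redundancy can absorb a mistake.

    # Read-only: compare the labels visible on the raw disk vs. the partition.
    # A label on the bare daN device that names the long-gone pool is the
    # stale one; the live pool's labels should show up only on gpt/daNp4-z0.
    zdb -l /dev/da0
    zdb -l /dev/gpt/da0p4-z0

    # Find the disk size; the old whole-disk vdev kept its trailing pair of
    # labels in the last 512 KiB of the raw disk.
    diskinfo -v da0          # note "mediasize in bytes"

    # Illustrative write step: zero the 256 KiB region where the old vdev's
    # third label (at mediasize - 512 KiB) would have lived.  Writing to a
    # raw disk with open partitions may also require kern.geom.debugflags=16.
    MEDIASIZE=2000398934016  # example value taken from diskinfo output
    dd if=/dev/zero of=/dev/da0 bs=512 count=512 \
        seek=$(( (MEDIASIZE - 524288) / 512 ))

After clearing the label on one drive, re-running zdb -l /dev/da0 and a
zpool scrub (as described above) confirms that nothing the live pool
needed was touched before moving on to the remaining drives.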