Date:      Tue, 10 Apr 2018 08:27:56 -0400
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        Allan Jude <allanjude@freebsd.org>, freebsd-current@freebsd.org
Subject:   Re: Re: Odd ZFS boot module issue on r332158
Message-ID:  <5316e5ea-17a2-2f23-3c88-1671f41b5642@cs.duke.edu>
In-Reply-To: <935ad20e-017c-5c34-61b4-9db58788a663@freebsd.org>
References:  <b772ee51-b27c-6591-c925-b4abd19678e8@cs.duke.edu> <935ad20e-017c-5c34-61b4-9db58788a663@freebsd.org>

On 04/09/18 23:33, Allan Jude wrote:
> On 2018-04-09 19:11, Andrew Gallatin wrote:
>> I updated my main amd64 workstation to r332158 from something much
>> earlier (mid Jan).
>>
>> Upon reboot, all seemed well.  However, I later realized that the vmm.ko
>> module was not loaded at boot, because bhyve PCI passthru did not
>> work.  My loader.conf looks like (I'm passing a USB interface through):
>>
>> #######
>> vmm_load="YES"
>> opensolaris_load="YES"
>> zfs_load="YES"
>> nvidia_load="YES"
>> nvidia-modeset_load="YES"
>>
>> # Tune ZFS Arc Size - Change to adjust memory used for disk cache
>> vfs.zfs.arc_max="4096M"
>> hint.xhci.2.disabled="1"
>> pptdevs="8/0/0"
>> hw.dmar.enable="0"
>> cuse_load="YES"
>> #######
>>
>> The problem seems "random".  I rebooted into single-user to
>> see if somehow, vmm.ko was loaded at boot and something
>> was unloading vmm.ko.  However, on this boot it was loaded.  I then
>> ^D'ed and continued to multi-user, where X failed to start because
>> this time, the nvidia modules were not loaded.  (but nvidia had
>> been loaded on the 1st boot).
>>
>> So it *seems* like different modules are randomly not loaded by the
>> loader, at boot.   The ZFS config is:
>>
>> config:
>>
>>          NAME        STATE     READ WRITE CKSUM
>>          tank        ONLINE       0     0     0
>>            mirror-0  ONLINE       0     0     0
>>              ada0p2  ONLINE       0     0     0
>>              da3p2   ONLINE       0     0     0
>>            mirror-1  ONLINE       0     0     0
>>              ada1p2  ONLINE       0     0     0
>>              da0p2   ONLINE       0     0     0
>>          cache
>>            da2s1d    ONLINE       0     0     0
>>
>> The data drives in the pool are all exactly like this:
>>
>> =>        34  9767541101  ada0  GPT  (4.5T)
>>            34           6        - free -  (3.0K)
>>            40      204800     1  efi  (100M)
>>        204840  9763209216     2  freebsd-zfs  (4.5T)
>>    9763414056     4096000     3  freebsd-swap  (2.0G)
>>    9767510056       31079        - free -  (15M)
>>
>>
>> There is about 1.44T used in the pool.  I have no idea
>> how ZFS mirrors work, but I'm wondering if somehow this
>> is a 2T problem, and there are issues with blocks on
>> different sides of the mirror being across the 2T boundary.
>>
>> Sorry to be so vague.. but this is the one machine I *don't* have
>> a serial console on, so I don't have good logs.
>>
>> Drew
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> 
> What makes you think it is related to ZFS?
> 
> Are there any error messages when the nvidia module did not load?
> 

I think it is related to ZFS simply because I'm booting from ZFS and
it is not working reliably.  Our systems at work, booting from UFS on
roughly the same svn rev seem to still load modules reliably from
the loader.  I know there has been a lot of work on the loader
recently, and in a UEFI + UFS context, I've seen it fail to boot
the right partition, etc.  However, I've never seen it fail to load
just some modules.  The one difference between what I run at home
and what we run at work is ZFS vs UFS.

Given that it is a glass console, I have no confidence in my ability
to log error messages.   However, I could have sworn that I saw
something like "io error" when it failed to load vmm.ko
(I actually rebooted several times when I was diagnosing it..
at first I thought xhci was holding on to the pass-thru device)

I vaguely remembered reading something about this recently.
I just tracked it down to the "ZFS i/o error in recent 12.0"
thread from last month, and this message in particular:

https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068890.html

I'm booting via UEFI into a ZFS system with a FS that
extends across 2TB..
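
As a rough sanity check on the "extends across 2TB" statement, here is a
minimal sketch that redoes the arithmetic from the gpart output quoted
above.  It assumes 512-byte sectors (consistent with gpart reporting
9767541101 sectors as 4.5T) and treats "2TB" as 2 TiB, i.e. LBA 2^32.
The function names are made up for illustration.  It only shows that the
freebsd-zfs partition has blocks on both sides of that boundary; it does
not show where any given file's blocks live, and it is not a diagnosis
of the loader.

```python
# Assumption: 512-byte sectors, inferred from the gpart sizes quoted above.
SECTOR_BYTES = 512
TWO_TIB_BYTES = 2 * 1024**4
BOUNDARY_LBA = TWO_TIB_BYTES // SECTOR_BYTES  # first LBA at or past 2 TiB

# freebsd-zfs partition (index 2) from the quoted gpart output.
ZFS_START_LBA = 204840
ZFS_SIZE_SECTORS = 9763209216


def crosses_boundary(start_lba, size_sectors, boundary_lba=BOUNDARY_LBA):
    """True if the partition has sectors on both sides of boundary_lba."""
    end_lba = start_lba + size_sectors  # one past the last sector
    return start_lba < boundary_lba < end_lba


def fraction_below(start_lba, size_sectors, boundary_lba=BOUNDARY_LBA):
    """Fraction of the partition's sectors that sit below boundary_lba."""
    end_lba = start_lba + size_sectors
    below = max(0, min(end_lba, boundary_lba) - start_lba)
    return below / size_sectors


if __name__ == "__main__":
    print("boundary LBA:", BOUNDARY_LBA)
    print("partition end LBA:", ZFS_START_LBA + ZFS_SIZE_SECTORS)
    print("crosses 2 TiB:", crosses_boundary(ZFS_START_LBA, ZFS_SIZE_SECTORS))
    print("fraction below 2 TiB: %.2f"
          % fraction_below(ZFS_START_LBA, ZFS_SIZE_SECTORS))
```

So a bit less than half of each data partition sits below the boundary,
which means a block of a module could land on either side of it.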

Is there something like tools/diag/prtblknos for ZFS?

Drew



