Date:      Tue, 10 Apr 2018 17:54:10 +0300
From:      Toomas Soome <tsoome@me.com>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        Allan Jude <allanjude@freebsd.org>, freebsd-current@freebsd.org
Subject:   Re: Odd ZFS boot module issue on r332158
Message-ID:  <3DC3DAEB-A627-4488-873E-0AB6EA124D3F@me.com>
In-Reply-To: <5316e5ea-17a2-2f23-3c88-1671f41b5642@cs.duke.edu>
References:  <b772ee51-b27c-6591-c925-b4abd19678e8@cs.duke.edu> <935ad20e-017c-5c34-61b4-9db58788a663@freebsd.org> <5316e5ea-17a2-2f23-3c88-1671f41b5642@cs.duke.edu>



> On 10 Apr 2018, at 15:27, Andrew Gallatin <gallatin@cs.duke.edu> wrote:
> 
> On 04/09/18 23:33, Allan Jude wrote:
>> On 2018-04-09 19:11, Andrew Gallatin wrote:
>>> I updated my main amd64 workstation to r332158 from something much
>>> earlier (mid Jan).
>>> 
>>> Upon reboot, all seemed well.  However, I later realized that the vmm.ko
>>> module was not loaded at boot, because bhyve PCI passthru did not
>>> work.  My loader.conf looks like (I'm passing a USB interface through):
>>> 
>>> #######
>>> vmm_load="YES"
>>> opensolaris_load="YES"
>>> zfs_load="YES"
>>> nvidia_load="YES"
>>> nvidia-modeset_load="YES"
>>> 
>>> # Tune ZFS Arc Size - Change to adjust memory used for disk cache
>>> vfs.zfs.arc_max="4096M"
>>> hint.xhci.2.disabled="1"
>>> pptdevs="8/0/0"
>>> hw.dmar.enable="0"
>>> cuse_load="YES"
>>> #######
>>> 
>>> The problem seems "random".  I rebooted into single-user to
>>> see if somehow, vmm.ko was loaded at boot and something
>>> was unloading vmm.ko.  However, on this boot it was loaded.  I then
>>> ^D'ed and continued to multi-user, where X failed to start because
>>> this time, the nvidia modules were not loaded.  (but nvidia had
>>> been loaded on the 1st boot).
>>> 
>>> So it *seems* like different modules are randomly not loaded by the
>>> loader, at boot.   The ZFS config is:
>>> 
>>> config:
>>> 
>>>         NAME        STATE     READ WRITE CKSUM
>>>         tank        ONLINE       0     0     0
>>>           mirror-0  ONLINE       0     0     0
>>>             ada0p2  ONLINE       0     0     0
>>>             da3p2   ONLINE       0     0     0
>>>           mirror-1  ONLINE       0     0     0
>>>             ada1p2  ONLINE       0     0     0
>>>             da0p2   ONLINE       0     0     0
>>>         cache
>>>           da2s1d    ONLINE       0     0     0
>>> 
>>> The data drives in the pool are all exactly like this:
>>> 
>>> =>        34  9767541101  ada0  GPT  (4.5T)
>>>           34           6        - free -  (3.0K)
>>>           40      204800     1  efi  (100M)
>>>       204840  9763209216     2  freebsd-zfs  (4.5T)
>>>   9763414056     4096000     3  freebsd-swap  (2.0G)
>>>   9767510056       31079        - free -  (15M)
>>> 
>>> 
>>> There is about 1.44T used in the pool.  I have no idea
>>> how ZFS mirrors work, but I'm wondering if somehow this
>>> is a 2T problem, and there are issues with blocks on
>>> different sides of the mirror being across the 2T boundary.
>>> 
>>> Sorry to be so vague.. but this is the one machine I *don't* have
>>> a serial console on, so I don't have good logs.
>>> 
>>> Drew
>>> 
>> What makes you think it is related to ZFS?
>> Are there any error messages when the nvidia module did not load?
> 
> I think it is related to ZFS simply because I'm booting from ZFS and
> it is not working reliably.  Our systems at work, booting from UFS on
> roughly the same svn rev seem to still load modules reliably from
> the loader.  I know there has been a lot of work on the loader
> recently, and in a UEFI + UFS context, I've seen it fail to boot
> the right partition, etc.  However, I've never seen it fail to load
> just some modules.  The one difference between what I run at home
> and what we run at work is ZFS vs UFS.
> 
> Given that it is a glass console, I have no confidence in my ability
> to log error messages.   However, I could have sworn that I saw
> something like "io error" when it failed to load vmm.ko
> (I actually rebooted several times when I was diagnosing it..
> at first I thought xhci was holding on to the pass-thru device)
> 
> I vaguely remembered reading something about this recently.
> I just tracked it down to the "ZFS i/o error in recent 12.0"
> thread from last month, and this message in particular:
> 
> https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068890.html
> 
> I'm booting via UEFI into a ZFS system with a FS that
> extends across 2TB..
> 
> Is there something like tools/diag/prtblknos for ZFS?
> 

Run zpool scrub first. However, if you were able to load that module manually from the OS, there is no reason to suspect ZFS corruption.
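
For example (a minimal sketch, assuming the pool name "tank" from the config above and vmm as the module in question):

  # zpool scrub tank
  # zpool status -v tank     (wait for the scrub to finish, then check for errors)
  # kldload vmm              (confirm the module loads fine from the running OS)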

But if you really are getting I/O errors, I would actually suspect that the firmware is buggy and cannot read past 2TB, so the obvious second suggestion is to check for a firmware update. The ZFS reader code does try all block copies before giving up on a block, so the third thing you can test is:

1. reboot
2. press Esc when the boot menu is up to get to the OK prompt
3. enter:  start

This will load the configured files and you will see the error messages. Also, once you have the kernel loaded, you can try to load the modules manually with the load command.
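
Something along these lines at the OK prompt (just a sketch; the module path is the usual /boot/kernel location):

  OK start
  OK lsmod                       (list the files the loader actually managed to load)
  OK load /boot/kernel/vmm.ko    (retry one of the modules that failed)
  OK boot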

If still nothing, the only way to ensure your data is below the 2TB line is to create a separate partition for a boot pool, or to use smaller disks for the OS.
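
Roughly like this (only a sketch; the size, label and disk are made up here and you would need free space on the disk for the new partition):

  # gpart add -t freebsd-zfs -s 16G -l boot0 ada0    (small partition near the start of the disk)
  # zpool create bootpool gpt/boot0                  (dedicated pool holding /boot, kernel and modules)

with the big data pool staying on the large partitions as it is now.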

rgds,
toomas


