Date: Tue, 10 Apr 2018 17:54:10 +0300
From: Toomas Soome <tsoome@me.com>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: Allan Jude <allanjude@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: Odd ZFS boot module issue on r332158
Message-ID: <3DC3DAEB-A627-4488-873E-0AB6EA124D3F@me.com>
In-Reply-To: <5316e5ea-17a2-2f23-3c88-1671f41b5642@cs.duke.edu>
References: <b772ee51-b27c-6591-c925-b4abd19678e8@cs.duke.edu> <935ad20e-017c-5c34-61b4-9db58788a663@freebsd.org> <5316e5ea-17a2-2f23-3c88-1671f41b5642@cs.duke.edu>
> On 10 Apr 2018, at 15:27, Andrew Gallatin <gallatin@cs.duke.edu> wrote:
>
> On 04/09/18 23:33, Allan Jude wrote:
>> On 2018-04-09 19:11, Andrew Gallatin wrote:
>>> I updated my main amd64 workstation to r332158 from something much
>>> earlier (mid Jan).
>>>
>>> Upon reboot, all seemed well. However, I later realized that the vmm.ko
>>> module was not loaded at boot, because bhyve PCI passthru did not
>>> work. My loader.conf looks like (I'm passing a USB interface through):
>>>
>>> #######
>>> vmm_load="YES"
>>> opensolaris_load="YES"
>>> zfs_load="YES"
>>> nvidia_load="YES"
>>> nvidia-modeset_load="YES"
>>>
>>> # Tune ZFS Arc Size - Change to adjust memory used for disk cache
>>> vfs.zfs.arc_max="4096M"
>>> hint.xhci.2.disabled="1"
>>> pptdevs="8/0/0"
>>> hw.dmar.enable="0"
>>> cuse_load="YES"
>>> #######
>>>
>>> The problem seems "random". I rebooted into single-user to
>>> see if somehow, vmm.ko was loaded at boot and something
>>> was unloading vmm.ko. However, on this boot it was loaded. I then
>>> ^D'ed and continued to multi-user, where X failed to start because
>>> this time, the nvidia modules were not loaded. (But nvidia had
>>> been loaded on the 1st boot.)
>>>
>>> So it *seems* like different modules are randomly not loaded by the
>>> loader, at boot. The ZFS config is:
>>>
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         tank        ONLINE       0     0     0
>>>           mirror-0  ONLINE       0     0     0
>>>             ada0p2  ONLINE       0     0     0
>>>             da3p2   ONLINE       0     0     0
>>>           mirror-1  ONLINE       0     0     0
>>>             ada1p2  ONLINE       0     0     0
>>>             da0p2   ONLINE       0     0     0
>>>         cache
>>>           da2s1d    ONLINE       0     0     0
>>>
>>> The data drives in the pool are all exactly like this:
>>>
>>> =>        34  9767541101  ada0  GPT  (4.5T)
>>>           34           6        - free -  (3.0K)
>>>           40      204800     1  efi  (100M)
>>>       204840  9763209216     2  freebsd-zfs  (4.5T)
>>>   9763414056     4096000     3  freebsd-swap  (2.0G)
>>>   9767510056       31079        - free -  (15M)
>>>
>>> There is about 1.44T used in the pool.
>>> I have no idea
>>> how ZFS mirrors work, but I'm wondering if somehow this
>>> is a 2T problem, and there are issues with blocks on
>>> different sides of the mirror being across the 2T boundary.
>>>
>>> Sorry to be so vague... but this is the one machine I *don't* have
>>> a serial console on, so I don't have good logs.
>>>
>>> Drew
>>>
>>> _______________________________________________
>>> freebsd-current@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
>> What makes you think it is related to ZFS?
>> Are there any error messages when the nvidia module did not load?
>
> I think it is related to ZFS simply because I'm booting from ZFS and
> it is not working reliably. Our systems at work, booting from UFS on
> roughly the same svn rev, seem to still load modules reliably from
> the loader. I know there has been a lot of work on the loader
> recently, and in a UEFI + UFS context I've seen it fail to boot
> the right partition, etc. However, I've never seen it fail to load
> just some modules. The one difference between what I run at home
> and what we run at work is ZFS vs UFS.
>
> Given that it is a glass console, I have no confidence in my ability
> to log error messages. However, I could have sworn that I saw
> something like "io error" when it failed to load vmm.ko.
> (I actually rebooted several times when I was diagnosing it...
> at first I thought xhci was holding on to the pass-thru device.)
>
> I vaguely remembered reading something about this recently.
> I just tracked it down to the "ZFS i/o error in recent 12.0"
> thread from last month, and this message in particular:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068890.html
>
> I'm booting via UEFI into a ZFS system with a FS that
> extends across 2TB...
>
> Is there something like tools/diag/prtblknos for ZFS?
Run zpool scrub first; however, if you were able to load that module manually from the OS, there is no reason to suspect ZFS corruption.

But if you really are getting I/O errors, I would actually suspect that the firmware is buggy and cannot read past 2TB, so the obvious second suggestion is to check for a firmware update. The ZFS reader code does try all block copies before giving up on a block, so the third option you can test is:

1. reboot
2. press Esc when the boot menu is up to get to the OK prompt
3. enter: start

This will load the configured files, and you will get the error messages. Also, once you have the kernel loaded, you can try to load modules manually with the load command.

If still nothing, the only way to ensure your data is below the 2TB line is to create a separate partition for a boot pool, or use smaller disks for the OS.

rgds,
toomas
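PS: for what it's worth, the quoted gpart output already shows the freebsd-zfs partition crossing the 2TB line. This is just back-of-the-envelope arithmetic with the start sector and size from that output (512-byte sectors assumed, and "2TB" taken as 2 TiB, i.e. the 32-bit LBA limit some firmware trips over):

```sh
# Start and size of the freebsd-zfs partition, in sectors,
# taken from the quoted "gpart show" output (512-byte sectors).
start=204840
size=9763209216
end=$((start + size))

# 2 TiB expressed in 512-byte sectors: 2 * 2^40 / 512 = 4294967296
limit=$((2 * 1024 * 1024 * 1024 * 1024 / 512))

if [ "$end" -gt "$limit" ]; then
    echo "freebsd-zfs ends at sector $end, past the 2TiB mark (sector $limit)"
else
    echo "freebsd-zfs stays below the 2TiB mark"
fi
```

So any block the boot loader needs that happens to land in the upper part of that partition would be unreadable by firmware that stops at 2TB, which would also fit the "random" symptom: which modules fail depends on where each file's blocks happen to sit.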