Subject: Re: Odd ZFS boot module issue on r332158
From: Toomas Soome
Date: Tue, 10 Apr 2018 17:54:10 +0300
To: Andrew Gallatin
Cc: Allan Jude, freebsd-current@freebsd.org
Message-id: <3DC3DAEB-A627-4488-873E-0AB6EA124D3F@me.com>

> On 10 Apr 2018, at 15:27, Andrew Gallatin wrote:
>
> On 04/09/18 23:33, Allan Jude wrote:
>> On 2018-04-09 19:11, Andrew Gallatin wrote:
>>> I updated my main amd64 workstation to r332158 from something much
>>> earlier (mid Jan).
>>>
>>> Upon reboot, all seemed well. However, I later realized that the vmm.ko
>>> module was not loaded at boot, because bhyve PCI passthru did not
>>> work. My loader.conf looks like this (I'm passing a USB interface through):
>>>
>>> #######
>>> vmm_load="YES"
>>> opensolaris_load="YES"
>>> zfs_load="YES"
>>> nvidia_load="YES"
>>> nvidia-modeset_load="YES"
>>>
>>> # Tune ZFS Arc Size - Change to adjust memory used for disk cache
>>> vfs.zfs.arc_max="4096M"
>>> hint.xhci.2.disabled="1"
>>> pptdevs="8/0/0"
>>> hw.dmar.enable="0"
>>> cuse_load="YES"
>>> #######
>>>
>>> The problem seems "random". I rebooted into single-user to
>>> see if somehow vmm.ko was loaded at boot and something
>>> was unloading vmm.ko. However, on this boot it was loaded.
>>> I then ^D'ed and continued to multi-user, where X failed to start because
>>> this time the nvidia modules were not loaded (but nvidia had
>>> been loaded on the 1st boot).
>>>
>>> So it *seems* like different modules are randomly not loaded by the
>>> loader at boot. The ZFS config is:
>>>
>>> config:
>>>
>>>         NAME          STATE     READ WRITE CKSUM
>>>         tank          ONLINE       0     0     0
>>>           mirror-0    ONLINE       0     0     0
>>>             ada0p2    ONLINE       0     0     0
>>>             da3p2     ONLINE       0     0     0
>>>           mirror-1    ONLINE       0     0     0
>>>             ada1p2    ONLINE       0     0     0
>>>             da0p2     ONLINE       0     0     0
>>>         cache
>>>           da2s1d      ONLINE       0     0     0
>>>
>>> The data drives in the pool are all exactly like this:
>>>
>>> =>        34  9767541101  ada0  GPT  (4.5T)
>>>           34           6        - free -  (3.0K)
>>>           40      204800     1  efi  (100M)
>>>       204840  9763209216     2  freebsd-zfs  (4.5T)
>>>   9763414056     4096000     3  freebsd-swap  (2.0G)
>>>   9767510056       31079        - free -  (15M)
>>>
>>> There is about 1.44T used in the pool. I have no idea
>>> how ZFS mirrors work, but I'm wondering if somehow this
>>> is a 2T problem, and there are issues with blocks on
>>> different sides of the mirror being across the 2T boundary.
>>>
>>> Sorry to be so vague... but this is the one machine I *don't* have
>>> a serial console on, so I don't have good logs.
>>>
>>> Drew
>>>
>> What makes you think it is related to ZFS?
>> Are there any error messages when the nvidia module did not load?
>
> I think it is related to ZFS simply because I'm booting from ZFS and
> it is not working reliably. Our systems at work, booting from UFS on
> roughly the same svn rev, seem to still load modules reliably from
> the loader. I know there has been a lot of work on the loader
> recently, and in a UEFI + UFS context, I've seen it fail to boot
> the right partition, etc.
> However, I've never seen it fail to load
> just some modules. The one difference between what I run at home
> and what we run at work is ZFS vs UFS.
>
> Given that it is a glass console, I have no confidence in my ability
> to log error messages. However, I could have sworn that I saw
> something like "io error" when it failed to load vmm.ko
> (I actually rebooted several times when I was diagnosing it;
> at first I thought xhci was holding on to the pass-thru device).
>
> I vaguely remembered reading something about this recently.
> I just tracked it down to the "ZFS i/o error in recent 12.0"
> thread from last month, and this message in particular:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068890.html
>
> I'm booting via UEFI into a ZFS system with a FS that
> extends across 2TB.
>
> Is there something like tools/diag/prtblknos for ZFS?
>

Run zpool scrub first. However, if you were able to load that module
manually from the OS, there is no reason to suspect ZFS corruption.

But if you really are getting I/O errors, I would actually suspect that
the firmware is buggy and cannot really read past 2TB, so the obvious
second suggestion is to check for a firmware update. The ZFS reader code
does try all block copies before giving up on a block, so the third
option you can test is:

1. reboot
2. press Esc when the boot menu is up to get to the OK prompt
3. enter: start

This will load the configured files and you will get the error messages.
Also, once you have the kernel loaded, you can try to load modules
manually with the load command.

If that still shows nothing, the only way to ensure your data is below
the 2TB line is to create a separate partition for the boot pool or use
smaller disks for the OS.

rgds,
toomas
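To put numbers on the 2TB question: with 512-byte sectors, firmware that only handles 32-bit LBAs cannot read at or past LBA 2^32 (2 TiB). The sketch below applies that to the ada0 layout quoted above. It is no substitute for a prtblknos-style tool, and it rests on an assumption not stated in the thread: that ZFS reserves 4 MiB of labels and boot area at the front of each leaf vdev, so a DVA offset of N bytes sits at partition byte N + 4 MiB. The helper names are hypothetical; the DVA string format is the <vdev:offset:asize> hex triple that zdb typically prints for block pointers.

```python
SECTOR_SIZE = 512            # gpart reports 512-byte sectors for ada0
LBA32_LIMIT = 1 << 32        # first LBA a 32-bit-LBA firmware cannot address (2 TiB)
LABEL_START = 4 * 1024 ** 2  # assumption: ZFS front labels + boot area (4 MiB)

# freebsd-zfs partition (index 2) from the gpart output quoted above
PART_START = 204840
PART_SIZE = 9763209216


def fraction_beyond_limit(start=PART_START, size=PART_SIZE):
    """Fraction of the partition's sectors at or past LBA 2**32."""
    beyond = max(0, start + size - max(start, LBA32_LIMIT))
    return beyond / size


def parse_dva(dva):
    """Split a zdb-style DVA string 'vdev:offset:asize' (hex) into ints."""
    vdev, offset, asize = (int(part, 16) for part in dva.strip("<>").split(":"))
    return vdev, offset, asize


def dva_to_lba(offset, start=PART_START):
    """Absolute disk LBA of a DVA offset on a leaf of this layout (hypothetical)."""
    return start + (offset + LABEL_START) // SECTOR_SIZE


def past_limit(lba):
    """True if a 32-bit-LBA firmware could not read this sector."""
    return lba >= LBA32_LIMIT


def first_offset_past_limit(start=PART_START):
    """Smallest DVA offset whose first sector lands at or past LBA 2**32."""
    return (LBA32_LIMIT - start) * SECTOR_SIZE - LABEL_START


if __name__ == "__main__":
    print(f"{fraction_beyond_limit():.1%} of the freebsd-zfs partition is past 2 TiB")
    print(f"DVA offsets >= {first_offset_past_limit():#x} map past the boundary")
```

Under these assumptions about 56% of the freebsd-zfs partition lies past the 2 TiB line, so a pool with 1.44T used can plausibly have some blocks up there.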
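Toomas notes that the ZFS reader tries every copy of a block before giving up. The toy model below is not the loader's code: it assumes a hypothetical firmware that fails any read at or past LBA 2^32, and a made-up read_block helper that walks a list of (disk, LBA) copies. It illustrates why that fallback may not rescue a block on this pool: both halves of a ZFS mirror store a block at the same offset, and with identically partitioned disks that means the same LBA, so every mirror copy fails the same way.

```python
LBA32_LIMIT = 1 << 32  # hypothetical firmware limit: 32-bit LBAs only


class ReadError(OSError):
    """Stand-in for the I/O error the firmware reports."""


def firmware_read(disk, lba):
    """Hypothetical buggy firmware read: fails at or past LBA 2**32."""
    if lba >= LBA32_LIMIT:
        raise ReadError(f"{disk}: I/O error at LBA {lba}")
    return b"sector data"  # placeholder payload


def read_block(copies, read=firmware_read, checksum_ok=lambda data: True):
    """Return the first copy that reads cleanly and verifies; else fail.

    `copies` lists (disk, lba) pairs, e.g. mirror children or ditto copies.
    """
    failures = []
    for disk, lba in copies:
        try:
            data = read(disk, lba)
        except ReadError as err:
            failures.append(str(err))
            continue  # fall back to the next copy
        if checksum_ok(data):
            return data
        failures.append(f"{disk}: checksum mismatch at LBA {lba}")
    raise ReadError("all copies failed: " + "; ".join(failures))
```

For example, a block at LBA 5,000,000,000 on mirror-0 is listed as ("ada0", 5_000_000_000) and ("da3", 5_000_000_000); both reads fail, so read_block raises. Only a copy that happens to sit below the limit, such as a second ditto copy at a lower offset, would get through.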
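For step 3 and the manual loads Toomas mentions, it helps to know exactly which modules loader.conf asks for, so each one can be loaded by hand at the OK prompt and a failure shows its own error. The sketch below (helper names hypothetical) pulls the <name>_load="YES" entries out of a loader.conf and turns them into load commands. It only handles simple KEY="VALUE" lines plus the <name>_name override; the real loader understands more settings (for example <name>_type and <name>_flags), and a name that fails may need its full path under /boot/kernel or /boot/modules.

```python
import re

# KEY="VALUE" or KEY=VALUE; comments are stripped before matching.
ASSIGN_RE = re.compile(r'^\s*([A-Za-z0-9_.\-]+)\s*=\s*"?([^"]*)"?\s*$')


def parse_loader_conf(text):
    """Return loader.conf assignments as an ordered dict (last value wins)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # drop comments
        m = ASSIGN_RE.match(line)
        if m:
            settings[m.group(1)] = m.group(2).strip()
    return settings


def modules_to_load(text):
    """Module names enabled via <name>_load="YES", in loader.conf order."""
    settings = parse_loader_conf(text)
    names = []
    for key, value in settings.items():
        # assumption: "YES" is matched case-insensitively
        if key.endswith("_load") and value.upper() == "YES":
            base = key[: -len("_load")]
            # <base>_name, when present, overrides what actually gets loaded
            names.append(settings.get(base + "_name", base))
    return names


def ok_prompt_commands(text):
    """One 'load' command per enabled module, to type at the OK prompt."""
    return [f"load {name}" for name in modules_to_load(text)]
```

Run against the loader.conf quoted above, this yields load commands for vmm, opensolaris, zfs, nvidia, nvidia-modeset and cuse, in that order; typing them one at a time after `start` (or after loading just the kernel) shows which module hits the I/O error.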