From owner-freebsd-current@freebsd.org Fri Jun 29 06:22:13 2018
From: Toomas Soome <tsoome@me.com>
Subject: Re: ZFS: I/O error - blocks larger than 16777216 are not supported
Date: Fri, 29 Jun 2018 09:21:45 +0300
To: KIRIYAMA Kazuhiko
Cc: Allan Jude, freebsd-current@freebsd.org
List-Id: Discussions about the use of FreeBSD-current

> On 29 Jun 2018, at 05:47, KIRIYAMA Kazuhiko wrote:
>
> At Tue, 26 Jun 2018 09:48:10 +0300,
> Toomas Soome wrote:
>>
>>> On 26 Jun 2018, at 05:08, KIRIYAMA Kazuhiko wrote:
>>>
>>> At Thu, 21 Jun 2018 10:48:28 +0300,
>>> Toomas Soome wrote:
>>>>
>>>>> On 21 Jun 2018, at 09:00, KIRIYAMA Kazuhiko wrote:
>>>>>
>>>>> At Wed, 20 Jun 2018 23:34:48 -0400,
>>>>> Allan Jude wrote:
>>>>>>
>>>>>> On 2018-06-20 21:36, KIRIYAMA Kazuhiko wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I reported a ZFS boot failure problem earlier [1], and found
>>>>>>> that the issue comes from the RAID configuration [2]. So I
>>>>>>> rebuilt with RAID5 and re-installed 12.0-CURRENT
>>>>>>> (r333982), but it failed to boot with:
>>>>>>>
>>>>>>> ZFS: i/o error - all block copies unavailable
>>>>>>> ZFS: can't read MOS of pool zroot
>>>>>>> gptzfsboot: failed to mount default pool zroot
>>>>>>>
>>>>>>> FreeBSD/x86 boot
>>>>>>> ZFS: I/O error - blocks larger than 16777216 are not supported
>>>>>>> ZFS: can't find dataset u
>>>>>>> Default: zroot/<0x0>:
>>>>>>>
>>>>>>> In this case the reason is "blocks larger than 16777216 are
>>>>>>> not supported", and I guess this means that datasets with a
>>>>>>> recordsize greater than 8GB are NOT supported by the
>>>>>>> FreeBSD boot loader (zpool-features(7)). Is that true?
>>>>>>>
>>>>>>> My zpool features are as follows:
>>>>>>>
>>>>>>> # kldload zfs
>>>>>>> # zpool import
>>>>>>>    pool: zroot
>>>>>>>      id: 13407092850382881815
>>>>>>>   state: ONLINE
>>>>>>>  status: The pool was last accessed by another system.
>>>>>>>  action: The pool can be imported using its name or numeric identifier and
>>>>>>>          the '-f' flag.
>>>>>>>     see: http://illumos.org/msg/ZFS-8000-EY
>>>>>>>  config:
>>>>>>>
>>>>>>>         zroot      ONLINE
>>>>>>>           mfid0p3  ONLINE
>>>>>>> # zpool import -fR /mnt zroot
>>>>>>> # zpool list
>>>>>>> NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>>>>>>> zroot  19.9T   129G  19.7T         -     0%     0%  1.00x  ONLINE  /mnt
>>>>>>> # zpool get all zroot
>>>>>>> NAME   PROPERTY                                   VALUE                 SOURCE
>>>>>>> zroot  size                                       19.9T                 -
>>>>>>> zroot  capacity                                   0%                    -
>>>>>>> zroot  altroot                                    /mnt                  local
>>>>>>> zroot  health                                     ONLINE                -
>>>>>>> zroot  guid                                       13407092850382881815  default
>>>>>>> zroot  version                                    -                     default
>>>>>>> zroot  bootfs                                     zroot/ROOT/default    local
>>>>>>> zroot  delegation                                 on                    default
>>>>>>> zroot  autoreplace                                off                   default
>>>>>>> zroot  cachefile                                  none                  local
>>>>>>> zroot  failmode                                   wait                  default
>>>>>>> zroot  listsnapshots                              off                   default
>>>>>>> zroot  autoexpand                                 off                   default
>>>>>>> zroot  dedupditto                                 0                     default
>>>>>>> zroot  dedupratio                                 1.00x                 -
>>>>>>> zroot  free                                       19.7T                 -
>>>>>>> zroot  allocated                                  129G                  -
>>>>>>> zroot  readonly                                   off                   -
>>>>>>> zroot  comment                                    -                     default
>>>>>>> zroot  expandsize                                 -                     -
>>>>>>> zroot  freeing                                    0                     default
>>>>>>> zroot  fragmentation                              0%                    -
>>>>>>> zroot  leaked                                     0                     default
>>>>>>> zroot  feature@async_destroy                      enabled               local
>>>>>>> zroot  feature@empty_bpobj                        active                local
>>>>>>> zroot  feature@lz4_compress                       active                local
>>>>>>> zroot  feature@multi_vdev_crash_dump              enabled               local
>>>>>>> zroot  feature@spacemap_histogram                 active                local
>>>>>>> zroot  feature@enabled_txg                        active                local
>>>>>>> zroot  feature@hole_birth                         active                local
>>>>>>> zroot  feature@extensible_dataset                 enabled               local
>>>>>>> zroot  feature@embedded_data                      active                local
>>>>>>> zroot  feature@bookmarks                          enabled               local
>>>>>>> zroot  feature@filesystem_limits                  enabled               local
>>>>>>> zroot  feature@large_blocks                       enabled               local
>>>>>>> zroot  feature@sha512                             enabled               local
>>>>>>> zroot  feature@skein                              enabled               local
>>>>>>> zroot  unsupported@com.delphix:device_removal     inactive              local
>>>>>>> zroot  unsupported@com.delphix:obsolete_counts    inactive              local
>>>>>>> zroot  unsupported@com.delphix:zpool_checkpoint   inactive              local
>>>>>>> #
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> [1] https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068886.html
>>>>>>> [2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=151910
>>>>>>>
>>>>>>> ---
>>>>>>> KIRIYAMA Kazuhiko
>>>>>>>
>>>>>>
>>>>>> I am guessing it means something is corrupt, as 16MB is the maximum size
>>>>>> of a record in ZFS. Also, the 'large_blocks' feature is 'enabled', not
>>>>>> 'active', so this suggests you do not have any records larger than 128KB
>>>>>> on your pool.
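Allan's 16MB figure is the hard upper bound on any ZFS block (16777216 bytes, matching the error message), so an on-disk size above it can only come from a block pointer that was itself misread. A minimal sketch of that kind of bounds check follows (illustrative C, not the loader's actual code, which lives in sys/cddl/boot/zfs/zfsimpl.c; check_block_size is a hypothetical name):

    /*
     * Illustrative only: a "block" larger than SPA_MAXBLOCKSIZE can
     * never have been written by ZFS, so a size above it means the
     * block pointer was decoded from garbage data.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define SPA_MAXBLOCKSIZE (16U * 1024 * 1024)	/* 16777216 */

    static int
    check_block_size(uint64_t lsize)
    {
    	if (lsize > SPA_MAXBLOCKSIZE) {
    		printf("ZFS: I/O error - blocks larger than %u "
    		    "are not supported\n", SPA_MAXBLOCKSIZE);
    		return (-1);
    	}
    	return (0);
    }

    int
    main(void)
    {
    	/* a corrupt block pointer decodes to a nonsense logical size */
    	return (check_block_size(1ULL << 34) ? 1 : 0);
    }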
>>>>>
>>>>> As I mentioned above, [2] says ZFS on RAID disks has
>>>>> serious bugs except for mirror. Anyway, I have given up on using ZFS
>>>>> on RAID{5,6}* until Bug 151910 [2] is fixed.
>>>>>
>>>>
>>>> if you boot from a usb stick (or cd), press esc at the boot loader menu and enter lsdev -v. what sector and disk sizes are reported?
>>>
>>> OK lsdev -v
>>> disk devices:
>>>   disk0:   BIOS drive C (31588352 X 512)
>>>     disk0p1: FreeBSD boot  512KB
>>>     disk0p2: FreeBSD UFS  13GB
>>>     disk0p3: FreeBSD swap  771MB
>>>   disk1:   BIOS drive D (4294967295 X 512)
>>>     disk1p1: FreeBSD boot  512KB
>>>     disk1p2: FreeBSD swap  128GB
>>>     disk1p3: FreeBSD ZFS  19TB
>>> OK
>>>
>>> Does this mean the whole disk size that I can use is
>>> 2TB (4294967295 X 512)?
>>
>> Yes, or to be exact, that is the disk size reported by the INT13
>> interface; and since below you get the same value from UEFI, the limit
>> seems to be set by the RAID controller itself. In this case it means
>> that the best way to address the issue is to create one smaller lun for
>> the boot disk (zroot pool) and a larger one for data. Or of course you
>> can have a separate FreeBSD ZFS partition for zroot; just make sure it
>> fits inside the first 2TB.
>>
>> Of course there may be an option for a RAID firmware update, or
>> configuration settings for the lun, or using JBOD mode (if supported by
>> the card). JBOD would be the best, because in the current setup the pool
>> is vulnerable to silent data corruption (checksum errors) and has no way
>> to recover (this is the reason why RAID setups are not preferred with
>> ZFS).
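A quick back-of-the-envelope check of the limit Toomas describes (illustrative C, not from the thread): a 32-bit sector count tops out at exactly the value lsdev printed, i.e. just under 2 TiB:

    /*
     * Why "4294967295 X 512" means a 2TB ceiling: the sector count is
     * capped at UINT32_MAX, so everything past 2 TiB on the 19.9T lun
     * is invisible to the BIOS boot path.
     */
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
    	uint64_t sectors = UINT32_MAX;		/* 4294967295, as lsdev reported */
    	uint64_t bytes = sectors * 512ULL;	/* 2199023255040 */

    	printf("%ju bytes = %.2f TiB\n", (uintmax_t)bytes,
    	    (double)bytes / (1ULL << 40));	/* prints ~2.00 TiB */
    	return (0);
    }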
>
> My RAID card is an AVAGO MegaRAID (SAS-MFI BIOS version
> 6.36.00.0), and I found that it supports JBOD mode. So I changed
> from RAID mode to JBOD mode and made each disk a JBOD. Then I
> rebooted and checked with 'lsdev -v' at the loader prompt: every disk is
> recognized as a single device 'mfidx' (x=0,1,...,11). Anyway, I
> re-installed as ZFS RAIDZ3 with UEFI boot. The result is fine!
>
> Each disk was recognized up to 2TB as a ZFS file system, and I
> built a zpool (zroot) as raidz3 from those disks:
>
> OK lsdev -v
> PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/VenHw(CF31FAC5-C24E-11D2-85F3-00A0C93EC93B,80)
>   disk0: 3907029168 X 512 blocks
>     disk0p1: EFI  200MB
>     disk0p2: FreeBSD swap  8192MB
>     disk0p3: FreeBSD ZFS  1854GB
> PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/VenHw(CF31FAC5-C24E-11D2-85F3-00A0C93EC93B,81)
>   disk1: 3907029168 X 512 blocks
>     disk1p1: EFI  200MB
>     disk1p2: FreeBSD swap  8192MB
>     disk1p3: FreeBSD ZFS  1854GB
> PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/VenHw(CF31FAC5-C24E-11D2-85F3-00A0C93EC93B,82)
>   disk2: 3907029168 X 512 blocks
>     disk2p1: EFI  200MB
>     disk2p2: FreeBSD swap  8192MB
>     disk2p3: FreeBSD ZFS  1854GB
> :
> PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/VenHw(CF31FAC5-C24E-11D2-85F3-00A0C93EC93B,8B)
>   disk11: 3907029168 X 512 blocks
>     disk11p1: EFI  200MB
>     disk11p2: FreeBSD swap  8192MB
>     disk11p3: FreeBSD ZFS  1854GB
> net devices:
> zfs devices:
>   pool: zroot
> bootfs: zroot/ROOT/default
> config:
>
>         NAME       STATE
>         zroot      ONLINE
>           raidz3   ONLINE
>           mfid0p3  ONLINE
>           mfid1p3  ONLINE
>           mfid2p3  ONLINE
>           mfid3p3  ONLINE
>           mfid4p3  ONLINE
>           mfid5p3  ONLINE
>           mfid6p3  ONLINE
>           mfid7p3  ONLINE
>           mfid8p3  ONLINE
>           mfid9p3  ONLINE
>           mfid10p3 ONLINE
>           mfid11p3 ONLINE
> OK
>
> The ZFS file system built on FreeBSD 12.0-CURRENT (r335317)
> is as follows:
>
> # gpart show mfid0
> =>         40  3907029088  mfid0  GPT  (1.8T)
>            40      409600      1  efi  (200M)
>        409640        2008         - free -  (1.0M)
>        411648    16777216      2  freebsd-swap  (8.0G)
>      17188864  3889840128      3  freebsd-zfs  (1.8T)
>    3907028992         136         - free -  (68K)
>
> # zpool status
>   pool: zroot
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         zroot         ONLINE       0     0     0
>           raidz3-0    ONLINE       0     0     0
>             mfid0p3   ONLINE       0     0     0
>             mfid1p3   ONLINE       0     0     0
>             mfid2p3   ONLINE       0     0     0
>             mfid3p3   ONLINE       0     0     0
>             mfid4p3   ONLINE       0     0     0
>             mfid5p3   ONLINE       0     0     0
>             mfid6p3   ONLINE       0     0     0
>             mfid7p3   ONLINE       0     0     0
>             mfid8p3   ONLINE       0     0     0
>             mfid9p3   ONLINE       0     0     0
>             mfid10p3  ONLINE       0     0     0
>             mfid11p3  ONLINE       0     0     0
>
> errors: No known data errors
> # zpool list
> NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> zroot  21.6T  2.55G  21.6T        -         -     0%     0%  1.00x  ONLINE  -
> # zpool get all zroot
> NAME   PROPERTY                       VALUE                SOURCE
> zroot  size                           21.6T                -
> zroot  capacity                       0%                   -
> zroot  altroot                        -                    default
> zroot  health                         ONLINE               -
> zroot  guid                           2002381236893751526  default
> zroot  version                        -                    default
> zroot  bootfs                         zroot/ROOT/default   local
> zroot  delegation                     on                   default
> zroot  autoreplace                    off                  default
> zroot  cachefile                      -                    default
> zroot  failmode                       wait                 default
> zroot  listsnapshots                  off                  default
> zroot  autoexpand                     off                  default
> zroot  dedupditto                     0                    default
> zroot  dedupratio                     1.00x                -
> zroot  free                           21.6T                -
> zroot  allocated                      2.55G                -
> zroot  readonly                       off                  -
> zroot  comment                        -                    default
> zroot  expandsize                     -                    -
> zroot  freeing                        0                    default
> zroot  fragmentation                  0%                   -
> zroot  leaked                         0                    default
> zroot  bootsize                       -                    default
> zroot  checkpoint                     -                    -
> zroot  feature@async_destroy          enabled              local
> zroot  feature@empty_bpobj            active               local
> zroot  feature@lz4_compress           active               local
> zroot  feature@multi_vdev_crash_dump  enabled              local
> zroot  feature@spacemap_histogram     active               local
> zroot  feature@enabled_txg            active               local
> zroot  feature@hole_birth             active               local
> zroot  feature@extensible_dataset     enabled              local
> zroot  feature@embedded_data          active               local
> zroot  feature@bookmarks              enabled              local
> zroot  feature@filesystem_limits      enabled              local
> zroot  feature@large_blocks           enabled              local
> zroot  feature@sha512                 enabled              local
> zroot  feature@skein                  enabled              local
> zroot  feature@device_removal         enabled              local
> zroot  feature@obsolete_counts        enabled              local
> zroot  feature@zpool_checkpoint       enabled              local
> # uname -a
> FreeBSD vm.openedu.org 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r335317: Mon Jun 18 16:21:17 UTC 2018 root@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
> # df -h
> Filesystem            Size    Used   Avail  Capacity  Mounted on
> zroot/ROOT/default     15T    191M     15T      0%    /
> devfs                 1.0K    1.0K      0B    100%    /dev
> zroot/.dake            15T    256K     15T      0%    /.dake
> zroot/ds               15T    279K     15T      0%    /ds
> zroot/ds/backup        15T    256K     15T      0%    /ds/backup
> zroot/ds/distfiles     15T    256K     15T      0%    /ds/distfiles
> zroot/ds/obj           15T    256K     15T      0%    /ds/obj
> zroot/ds/packages      15T    256K     15T      0%    /ds/packages
> zroot/ds/ports         15T    256K     15T      0%    /ds/ports
> zroot/ds/src           15T    256K     15T      0%    /ds/src
> zroot/tmp              15T    302K     15T      0%    /tmp
> zroot/usr              15T    1.6G     15T      0%    /usr
> zroot/usr/home         15T    372K     15T      0%    /usr/home
> zroot/usr/local        15T    256K     15T      0%    /usr/local
> zroot/var              15T    395K     15T      0%    /var
> zroot/var/audit        15T    256K     15T      0%    /var/audit
> zroot/var/crash        15T    256K     15T      0%    /var/crash
> zroot/var/db           15T    9.2M     15T      0%    /var/db
> zroot/var/empty        15T    256K     15T      0%    /var/empty
> zroot/var/log          15T    337K     15T      0%    /var/log
> zroot/var/mail         15T    256K     15T      0%    /var/mail
> zroot/var/ports        15T    256K     15T      0%    /var/ports
> zroot/var/run          15T    442K     15T      0%    /var/run
> zroot/var/tmp          15T    256K     15T      0%    /var/tmp
> zroot/vm               15T    256K     15T      0%    /vm
> zroot                  15T    256K     15T      0%    /zroot
> #
>
> Thanks for the kind advice!
>
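A sanity check on the numbers above (illustrative C, not from the thread): zpool list reports the raw size of all twelve ~1.8 TiB partitions, while df sees only what is left after raidz3 parity, which is why 21.6T shrinks to roughly 15T:

    /*
     * raidz3 arithmetic for the pool above: 12 disks, 3 disks' worth
     * of parity; df additionally loses allocation overhead, padding,
     * and metadata, which accounts for the rest of the gap to 15T.
     */
    #include <stdio.h>

    int
    main(void)
    {
    	double disk_tib = 3889840128.0 * 512 / (1ULL << 40); /* p3: ~1.81 TiB */
    	int ndisks = 12, nparity = 3;

    	printf("raw:    %.1f TiB (what zpool list shows)\n",
    	    disk_tib * ndisks);
    	printf("usable: %.1f TiB at most (what df approaches)\n",
    	    disk_tib * (ndisks - nparity));
    	return (0);
    }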
Glad to hear! There is still an issue - you got the error about a too-large
block, meaning that somehow the read result is not validated, or something
else is going on - it is good that dnode_read() has this check, but I think
we should not have gotten that far at all…

rgds,
toomas
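The ordering Toomas is pointing at - validate a block before parsing anything in it - in a minimal, self-contained sketch (illustrative C with a toy checksum; the struct and function names are hypothetical, not the loader's real API):

    /*
     * If the checksum is verified first, later sanity checks such as
     * the 16 MiB block-size test should never see garbage; reaching
     * that test with a bogus size suggests a read slipped through
     * unvalidated.
     */
    #include <errno.h>
    #include <stdint.h>

    struct toy_block {
    	uint64_t payload;	/* stand-in for a dnode's contents */
    	uint64_t cksum;		/* stored checksum over the payload */
    };

    /* toy stand-in for fletcher4/sha256 (real ZFS checksums the whole block) */
    static uint64_t
    toy_cksum(uint64_t v)
    {
    	return (v * 2654435761ULL + 1);
    }

    static int
    read_validated(const struct toy_block *b, uint64_t *out)
    {
    	if (toy_cksum(b->payload) != b->cksum)
    		return (EIO);	/* reject before any field is trusted */
    	*out = b->payload;	/* only validated data reaches the parser */
    	return (0);
    }

    int
    main(void)
    {
    	struct toy_block b = { 42, 0 };
    	uint64_t v;

    	b.cksum = toy_cksum(b.payload);
    	return (read_validated(&b, &v));	/* 0 on success */
    }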