From: Toomas Soome
To: Lawrence Stewart
Cc: Andriy Gapon, freebsd-fs@freebsd.org, Toomas Soome, allanjude@freebsd.org
Subject: Re: svn commit: r308089 - in head
Date: Tue, 07 Mar 2017 10:48:16 +0200
Message-id: <814E1C65-23E3-42A1-8093-8008DF188506@me.com>
In-reply-to: <9f0b2f93-04b8-b90b-3cb5-13b8539b9171@freebsd.org>
References: <201610291409.u9TE9WXJ020650@repo.freebsd.org> <9f0b2f93-04b8-b90b-3cb5-13b8539b9171@freebsd.org>

> On 7 March 2017, at 10:25, Lawrence Stewart wrote:
>
> On 07/03/2017 18:04, Toomas Soome wrote:
>>
>>> On 7 March 2017, at 7:25, Lawrence Stewart wrote:
>>>
>>> Hi Andriy,
>>>
>>> On 30/10/2016 01:09, Andriy Gapon wrote:
>>>> Author: avg
>>>> Date: Sat Oct 29 14:09:32 2016
>>>> New Revision: 308089
>>>> URL: https://svnweb.freebsd.org/changeset/base/308089
>>>>
>>>> Log:
>>>>   zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot
>>>>
>>>>   (gpt)zfsboot will read one-time boot directives from a special ZFS pool
>>>>   area. The area was previously described as "Boot Block Header", but
>>>>   currently it is known as Pad2, marked as reserved and is zeroed out on
>>>>   pool creation. The new code interprets data in this area, if any, using
>>>>   the same format as boot.config. The area is immediately wiped out.
>>>>   Failure to parse the directives results in a reboot right after the
>>>>   cleanup. Otherwise the boot sequence proceeds as usual.
>>>>
>>>>   zfsbootcfg writes zfsboot arguments specified on its command line to the
>>>>   Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and
>>>>   vfs.zfs.boot.primary_vdev kenv variables that are set by loader during
>>>>   boot. Please see the manual page for more.
>>>>
>>>>   Thanks to all who reviewed, contributed and made suggestions! There are
>>>>   many potential improvements to the feature, please see the review for
>>>>   details.
>>>>
>>>>   Reviewed by: wblock (docs)
>>>>   Discussed with: jhb, tsoome
>>>>   MFC after: 3 weeks
>>>>   Relnotes: yes
>>>>   Differential Revision: https://reviews.freebsd.org/D7612
>>>>
>>>> Added:
>>>>   head/sbin/zfsbootcfg/
>>>>   head/sbin/zfsbootcfg/Makefile (contents, props changed)
>>>>   head/sbin/zfsbootcfg/zfsbootcfg.8 (contents, props changed)
>>>>   head/sbin/zfsbootcfg/zfsbootcfg.c (contents, props changed)
>>>> Modified:
>>>>   head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
>>>>   head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c
>>>>   head/sbin/Makefile
>>>>   head/sys/boot/i386/common/drv.c
>>>>   head/sys/boot/i386/common/drv.h
>>>>   head/sys/boot/i386/gptzfsboot/Makefile
>>>>   head/sys/boot/i386/zfsboot/Makefile
>>>>   head/sys/boot/i386/zfsboot/zfsboot.c
>>>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h
>>>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
>>>>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>   head/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
>>>>
>>> [snip]
>>>> @@ -634,7 +712,39 @@ main(void)
>>>>      primary_spa = spa;
>>>>      primary_vdev = spa_get_primary_vdev(spa);
>>>>
>>>> -    if (zfs_spa_init(spa) != 0 || zfs_mount(spa, 0, &zfsmount) != 0) {
>>>> +    nextboot = 0;
>>>> +    rc = vdev_read_pad2(primary_vdev, cmd, sizeof(cmd));
>>>> +    if (vdev_clear_pad2(primary_vdev))
>>>> +        printf("failed to clear pad2 area of primary vdev\n");
>>>> +    if (rc == 0) {
>>>> +        if (*cmd) {
>>>> +            /*
>>>> +             * We could find an old-style ZFS Boot Block header here.
>>>> +             * Simply ignore it.
>>>> +             */
>>>> +            if (*(uint64_t *)cmd != 0x2f5b007b10c) {
>>>> +                /*
>>>> +                 * Note that parse() is destructive to cmd[] and we also want
>>>> +                 * to honor RBX_QUIET option that could be present in cmd[].
>>>> +                 */
>>>> +                nextboot = 1;
>>>> +                memcpy(cmddup, cmd, sizeof(cmd));
>>>> +                if (parse()) {
>>>> +                    printf("failed to parse pad2 area of primary vdev\n");
>>>> +                    reboot();
>>>> +                }
>>>> +                if (!OPT_CHECK(RBX_QUIET))
>>>> +                    printf("zfs nextboot: %s\n", cmddup);
>>>> +            }
>>>> +            /* Do not process this command twice */
>>>> +            *cmd = 0;
>>>> +        }
>>>> +    } else
>>>> +        printf("failed to read pad2 area of primary vdev\n");
>>>> +
>>>
>>> I've just taken Allan Jude's & co-conspirators' work for a spin that
>>> allows gptzfsboot to boot from a geli + ZFS partition. Everything is
>>> working amazingly well, but I see the above "failed to read pad2 area of
>>> primary vdev" message on every boot.
>>>
>>> It doesn't appear to cause any problems per se and the system
>>> boots/works fine. I assume that message is printed to signal an
>>> unexpected situation though, so figured I'd get in touch to get your
>>> thoughts.
>>>
>>> I installed the KVM-based virtual machine system manually from the live
>>> shell of:
>>>
>>> FreeBSD-12.0-CURRENT-amd64-20170301-r314495-disc1.iso
>>>
>>> The partitioning is very simple:
>>>
>>> gpart create -s gpt /dev/vtbd0
>>> gpart add -t freebsd-boot -a 8 -b 40 -s 512k vtbd0
>>> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0
>>> gpart add -t freebsd-zfs -b 2088 vtbd0
>>>
>>> root# gpart show
>>> =>      40  83886000  vtbd0  GPT  (40G)
>>>         40      1024      1  freebsd-boot  (512K)
>>>       1064      1024         - free -  (512K)
>>>       2088  83883952      2  freebsd-zfs  (40G)
>>>
>>> geli was inited/attached to vtbd0p2 and the zpool was created with command:
>>>
>>> zpool create -o altroot=/tmp/zroot -o cachefile=/tmp/zpool.cache -O
>>> checksum=skein -O compression=lz4 vtbd0p2.eli
>>>
>>> i.e. the entire pool including bootfs is using skein for checksumming
>>> and lz4 for compression.
>>>
>>> I hit another boot bug using skein previously which Toomas (CCed) fixed,
>>> and am wondering if this issue might also be related to the skein
>>> implementation.
>>>
>>> I haven't tested if the zfsbootcfg functionality works for fear that the
>>> printf is indicating a low level problem with the zpool. I can test
>>> potentially destructive things and break the pool though if that would
>>> be helpful.
>>>
>>> Any thoughts?
>>>
>>> Cheers,
>>> Lawrence
>>
>>
>> The problem with having a pool on a geli-encrypted partition is that all reads done on such a partition have to go through a geli-aware read() function, and the same is true for writes (which is important for the nextboot feature). For gptzfsboot/zfsboot this means that disk reads and writes would need to go through the geli-aware functions; we cannot issue "pure" disk I/O directly.
>
> [+Allan]
>
> Presumably that functionality exists given that the geli support Allan
> added to gptzfsboot is able to read loader and loader is able to read
> everything in /boot from the geli-encrypted ZFS pool?

The problem is deeper. The idea behind nextboot is to provide recovery from a failed boot, so if you set a nextboot dataset and attempt to boot from it, you need to do two things: 1. detect the nextboot config, so that you are actually able to use it, and 2. reset it as early as possible, because later you may not have a chance.

So gptzfsboot has to read the config to know where it has to load zfsloader from, and gptzfsboot has to reset the config, so that if anything goes wrong, the next boot falls back to the "normal" boot. This means that either gptzfsboot has to know how to deal with geli in the context of handling nextboot, or, with geli, you simply cannot use the nextboot config.

A similar issue arises with using the boot block area in the zfs pool label: to be able to store and use gptzfsboot in the pool label boot area, either boot1 has to know how to read geli, or geli must be able to leave the bootblock area unencrypted, or we simply cannot use that area [with geli].

All in all, it is another example of the chicken and egg issue :)

rgds,
toomas
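
The ordering constraint described above (read the one-time config, clear it immediately, only then act on it) can be sketched in C. This is a minimal simulation under stated assumptions, not the gptzfsboot code: pad2_area, pad2_write(), pad2_take() and boot_once() are hypothetical names, the in-memory buffer merely stands in for the on-disk Pad2 area, its size is illustrative, and any directive string passed in is made up. On a geli-encrypted provider, both the read and the clear inside pad2_take() would have to go through geli-aware I/O, which is the missing piece discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAD2_SIZE 64	/* illustrative only, not the real on-disk size */

/* Hypothetical stand-in for the Pad2 area of the primary vdev. */
static char pad2_area[PAD2_SIZE];

/* Conceptually what zfsbootcfg does: store a one-time directive. */
static void
pad2_write(const char *directive)
{
	assert(strlen(directive) < PAD2_SIZE);
	memset(pad2_area, 0, sizeof(pad2_area));
	strcpy(pad2_area, directive);
}

/*
 * Read-then-clear: copy the directive out and wipe the area before
 * anything acts on it.  Returns 1 if a directive was present, else 0.
 */
static int
pad2_take(char *out, size_t outlen)
{
	assert(outlen >= PAD2_SIZE);
	memcpy(out, pad2_area, PAD2_SIZE);
	memset(pad2_area, 0, sizeof(pad2_area));	/* reset as early as possible */
	return (out[0] != '\0');
}

/*
 * One simulated boot attempt.  nextboot_works models whether booting
 * the one-time target succeeds.  Returns the target that was booted,
 * or NULL if this attempt failed.
 */
static const char *
boot_once(int nextboot_works, char *buf, size_t buflen)
{
	if (pad2_take(buf, buflen))
		return (nextboot_works ? buf : NULL);
	return ("default");	/* no directive: normal boot */
}
```

Because the area is wiped before the directive is used, a failed one-time boot is followed by a normal boot on the next attempt: calling boot_once(0, ...) after pad2_write() returns NULL, and the following boot_once() call returns "default".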