Date: Wed, 8 Mar 2017 13:50:18 +0800
From: Julian Elischer <julian@freebsd.org>
To: Toomas Soome <tsoome@me.com>, Lawrence Stewart <lstewart@freebsd.org>
Cc: freebsd-fs@freebsd.org, Toomas Soome <tsoome@freebsd.org>, allanjude@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject: Re: svn commit: r308089 - in head
Message-ID: <4a498b08-7417-e7b1-2e5d-b0dbe5f3c49a@freebsd.org>
In-Reply-To: <814E1C65-23E3-42A1-8093-8008DF188506@me.com>
References: <201610291409.u9TE9WXJ020650@repo.freebsd.org> <c4cc03d0-d26e-f7c0-8399-d65f2aa0c5ef@freebsd.org> <CCB18F77-A9C3-4D22-82A3-9DD84DF783F9@me.com> <9f0b2f93-04b8-b90b-3cb5-13b8539b9171@freebsd.org> <814E1C65-23E3-42A1-8093-8008DF188506@me.com>
On 7/3/17 4:48 pm, Toomas Soome wrote:
>> On 7 March 2017, at 10:25, Lawrence Stewart <lstewart@freebsd.org> wrote:
>>
>> On 07/03/2017 18:04, Toomas Soome wrote:
>>>> On 7 March 2017, at 7:25, Lawrence Stewart <lstewart@freebsd.org> wrote:
>>>>
>>>> Hi Andriy,
>>>>
>>>> On 30/10/2016 01:09, Andriy Gapon wrote:
>>>>> Author: avg
>>>>> Date: Sat Oct 29 14:09:32 2016
>>>>> New Revision: 308089
>>>>> URL: https://svnweb.freebsd.org/changeset/base/308089
>>>>>
>>>>> Log:
>>>>> zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot
>>>>>
>>>>> (gpt)zfsboot will read one-time boot directives from a special ZFS pool
>>>>> area. The area was previously described as "Boot Block Header", but
>>>>> currently it is known as Pad2, marked as reserved and is zeroed out on
>>>>> pool creation. The new code interprets data in this area, if any, using
>>>>> the same format as boot.config. The area is immediately wiped out.
>>>>> Failure to parse the directives results in a reboot right after the
>>>>> cleanup. Otherwise the boot sequence proceeds as usual.
>>>>>
>>>>> zfsbootcfg writes zfsboot arguments specified on its command line to the
>>>>> Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and
>>>>> vfs.zfs.boot.primary_vdev kenv variables that are set by loader during
>>>>> boot. Please see the manual page for more.
>>>>>
>>>>> Thanks to all who reviewed, contributed and made suggestions! There are
>>>>> many potential improvements to the feature, please see the review for
>>>>> details.
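The one-shot protocol the commit log describes (a tool writes arguments into Pad2; the boot block copies the area out, wipes it before doing anything else, then parses it and reboots on a parse failure) can be modelled in a few lines of C. This is a hedged sketch under stated assumptions, not the zfsboot code: `struct fake_vdev`, `pad2_write()`, `pad2_consume()`, `fake_parse()` and `PAD2_SIZE` are hypothetical names, Pad2 is an in-memory buffer rather than a region of a vdev label, and `fake_parse()` only stands in for the real boot.config parser.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical size chosen for illustration; not taken from the ZFS label layout. */
#define PAD2_SIZE 8192

/* Hypothetical in-memory stand-in for the Pad2 area of one vdev label. */
struct fake_vdev {
	char pad2[PAD2_SIZE];
};

/* Writer side (the zfsbootcfg role): store a NUL-terminated argument string. */
static int
pad2_write(struct fake_vdev *vd, const char *args)
{
	size_t len = strlen(args);

	if (len >= PAD2_SIZE)
		return (-1);		/* must leave room for the terminator */
	memset(vd->pad2, 0, PAD2_SIZE);
	memcpy(vd->pad2, args, len);
	return (0);
}

/* Hypothetical stand-in for the boot.config parser: accept printable ASCII only. */
static int
fake_parse(const char *cmd)
{
	for (; *cmd != '\0'; cmd++)
		if (!isprint((unsigned char)*cmd))
			return (-1);
	return (0);
}

/*
 * Reader side (the zfsboot role): copy the area out, wipe it immediately,
 * and only then parse.  Returns 1 if a one-time command was found and
 * parsed, 0 if the area was empty, -1 on a parse failure (the real boot
 * block reboots at that point, after the area has already been cleared).
 */
static int
pad2_consume(struct fake_vdev *vd, char *cmd, size_t cmdsize)
{
	size_t n;

	assert(cmdsize > 0);
	n = cmdsize < PAD2_SIZE ? cmdsize : PAD2_SIZE;
	memcpy(cmd, vd->pad2, n);
	cmd[n - 1] = '\0';		/* never trust the area to be terminated */
	memset(vd->pad2, 0, PAD2_SIZE);	/* one-shot: wipe before acting */
	if (cmd[0] == '\0')
		return (0);
	return (fake_parse(cmd) == 0 ? 1 : -1);
}
```

The design point worth noting is the ordering inside `pad2_consume()`: because the area is cleared before the command is used, a boot that fails later finds Pad2 empty on the next attempt and falls back to the normal configuration.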
>>>>>
>>>>> Reviewed by: wblock (docs)
>>>>> Discussed with: jhb, tsoome
>>>>> MFC after: 3 weeks
>>>>> Relnotes: yes
>>>>> Differential Revision: https://reviews.freebsd.org/D7612
>>>>>
>>>>> Added:
>>>>> head/sbin/zfsbootcfg/
>>>>> head/sbin/zfsbootcfg/Makefile (contents, props changed)
>>>>> head/sbin/zfsbootcfg/zfsbootcfg.8 (contents, props changed)
>>>>> head/sbin/zfsbootcfg/zfsbootcfg.c (contents, props changed)
>>>>> Modified:
>>>>> head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
>>>>> head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c
>>>>> head/sbin/Makefile
>>>>> head/sys/boot/i386/common/drv.c
>>>>> head/sys/boot/i386/common/drv.h
>>>>> head/sys/boot/i386/gptzfsboot/Makefile
>>>>> head/sys/boot/i386/zfsboot/Makefile
>>>>> head/sys/boot/i386/zfsboot/zfsboot.c
>>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h
>>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
>>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>> head/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
>>>>>
>>>> [snip]
>>>>> @@ -634,7 +712,39 @@ main(void)
>>>>>  	primary_spa = spa;
>>>>>  	primary_vdev = spa_get_primary_vdev(spa);
>>>>>
>>>>> -	if (zfs_spa_init(spa) != 0 || zfs_mount(spa, 0, &zfsmount) != 0) {
>>>>> +	nextboot = 0;
>>>>> +	rc = vdev_read_pad2(primary_vdev, cmd, sizeof(cmd));
>>>>> +	if (vdev_clear_pad2(primary_vdev))
>>>>> +		printf("failed to clear pad2 area of primary vdev\n");
>>>>> +	if (rc == 0) {
>>>>> +		if (*cmd) {
>>>>> +			/*
>>>>> +			 * We could find an old-style ZFS Boot Block header here.
>>>>> +			 * Simply ignore it.
>>>>> +			 */
>>>>> +			if (*(uint64_t *)cmd != 0x2f5b007b10c) {
>>>>> +				/*
>>>>> +				 * Note that parse() is destructive to cmd[] and we also want
>>>>> +				 * to honor RBX_QUIET option that could be present in cmd[].
>>>>> +				 */
>>>>> +				nextboot = 1;
>>>>> +				memcpy(cmddup, cmd, sizeof(cmd));
>>>>> +				if (parse()) {
>>>>> +					printf("failed to parse pad2 area of primary vdev\n");
>>>>> +					reboot();
>>>>> +				}
>>>>> +				if (!OPT_CHECK(RBX_QUIET))
>>>>> +					printf("zfs nextboot: %s\n", cmddup);
>>>>> +			}
>>>>> +			/* Do not process this command twice */
>>>>> +			*cmd = 0;
>>>>> +		}
>>>>> +	} else
>>>>> +		printf("failed to read pad2 area of primary vdev\n");
>>>>> +
>>>> I've just taken Allan Jude's & co-conspirators' work for a spin that
>>>> allows gptzfsboot to boot from a geli + ZFS partition. Everything is
>>>> working amazingly well, but I see the above "failed to read pad2 area of
>>>> primary vdev" message on every boot.
>>>>
>>>> It doesn't appear to cause any problems per se and the system
>>>> boots/works fine. I assume that message is printed to signal an
>>>> unexpected situation though, so figured I'd get in touch to get your
>>>> thoughts.
>>>>
>>>> I installed the KVM-based virtual machine system manually from the live
>>>> shell of:
>>>>
>>>> FreeBSD-12.0-CURRENT-amd64-20170301-r314495-disc1.iso
>>>>
>>>> The partitioning is very simple:
>>>>
>>>> gpart create -s gpt /dev/vtbd0
>>>> gpart add -t freebsd-boot -a 8 -b 40 -s 512k vtbd0
>>>> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0
>>>> gpart add -t freebsd-zfs -b 2088 vtbd0
>>>>
>>>> root# gpart show
>>>> =>      40  83886000  vtbd0  GPT  (40G)
>>>>         40      1024      1  freebsd-boot  (512K)
>>>>       1064      1024         - free -  (512K)
>>>>       2088  83883952      2  freebsd-zfs  (40G)
>>>>
>>>> geli was inited/attached to vtbd0p2 and the zpool was created with command:
>>>>
>>>> zpool create -o altroot=/tmp/zroot -o cachefile=/tmp/zpool.cache -O
>>>> checksum=skein -O compression=lz4 <pool> vtbd0p2.eli
>>>>
>>>> i.e. the entire pool including bootfs is using skein for checksumming
>>>> and lz4 for compression.
>>>>
>>>> I hit another boot bug using skein previously which Toomas (CCed) fixed,
>>>> and am wondering if this issue might also be related to the skein
>>>> implementation.
>>>>
>>>> I haven't tested if the zfsbootcfg functionality works for fear that the
>>>> printf is indicating a low level problem with the zpool. I can test
>>>> potentially destructive things and break the pool though if that would
>>>> be helpful.
>>>>
>>>> Any thoughts?
>>>>
>>>> Cheers,
>>>> Lawrence
>>>
>>> The problem with having pool on geli encrypted partition is that all the reads done on such partition have to go through geli aware read() function, and the same is true for writes (which is important for nextboot feature). So what it means for gptzfsboot/zfsboot is that we would need to have the disk reads/writes go through the geli aware functions and we can not issue “pure” disk io directly.
>> [+Allan]
>>
>> Presumably that functionality exists given that the geli support Allan
>> added to gptzfsboot is able to read loader and loader is able to read
>> everything in /boot from the geli-encrypted ZFS pool?
>
> The problem is deeper, the idea behind the nextboot is that it is attempting to provide recovery from failed boot, so if you set nextboot dataset, attempt to boot from it, you need to do 2 things: 1. detect the nextboot config, so you would actually be able to use it, and 2, you want to reset it as early as possible, because later you may not have a chance.
>
> So it means the gptzfsboot has to read out the config to know where from it has to load the zfsloader, and gptzfsboot has to reset the config, so that if anything will go wrong, on next boot the fallback or “normal” boot will be done. Which means that either gptzfsboot has to know how to deal with geli in context of handling nextboot, or with geli, you just can not use nextboot config.
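Toomas's point that Pad2 reads and writes must go through GELI-aware functions, rather than "pure" disk I/O, can be pictured as a per-vdev I/O vector that the boot code always calls through. This is a hedged sketch, not gptzfsboot's actual structure: `struct vdev_io`, `raw_read()`, `geli_read()`, `raw_write()`, `geli_write()` and `toy_crypt()` are hypothetical names, the "disk" is a RAM buffer, and `toy_crypt()` is a plain XOR that only stands in for encryption; it has nothing to do with GELI's real ciphers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NSECTORS 8

/* Hypothetical RAM buffer standing in for the raw provider. */
static uint8_t disk[NSECTORS * SECTOR_SIZE];

/* Toy symmetric "cipher" (XOR); illustration only, NOT what GELI does. */
static void
toy_crypt(uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] ^= 0x5a;
}

/* "Pure" disk I/O: bytes go to and from the provider unchanged. */
static int
raw_read(uint64_t off, void *buf, size_t len)
{
	if (off > sizeof(disk) || len > sizeof(disk) - off)
		return (-1);
	memcpy(buf, disk + off, len);
	return (0);
}

static int
raw_write(uint64_t off, const void *buf, size_t len)
{
	if (off > sizeof(disk) || len > sizeof(disk) - off)
		return (-1);
	memcpy(disk + off, buf, len);
	return (0);
}

/* Encryption-aware I/O layered on top of the raw path. */
static int
geli_read(uint64_t off, void *buf, size_t len)
{
	if (raw_read(off, buf, len) != 0)
		return (-1);
	toy_crypt(buf, len);		/* decrypt after reading */
	return (0);
}

static int
geli_write(uint64_t off, const void *buf, size_t len)
{
	static uint8_t tmp[sizeof(disk)];

	if (len > sizeof(tmp))
		return (-1);
	memcpy(tmp, buf, len);
	toy_crypt(tmp, len);		/* encrypt a copy, never the caller's buffer */
	return (raw_write(off, tmp, len));
}

/* Hypothetical I/O vector: callers pick one per vdev and never bypass it. */
struct vdev_io {
	int (*read)(uint64_t off, void *buf, size_t len);
	int (*write)(uint64_t off, const void *buf, size_t len);
};

static const struct vdev_io plain_io = { raw_read, raw_write };
static const struct vdev_io geli_io = { geli_read, geli_write };
```

Reading the same offset through `plain_io` after writing it through `geli_io` returns ciphertext. That is the situation Toomas describes: on an encrypted provider, the nextboot code can only see (and safely clear) the plaintext directives if both its read and its write go through the encryption layer.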
>
> The similar issue is with using boot block area in zfs pool label - to be able to store and use gptzfsboot in pool label boot area, the boot1 either has to know how to read the geli, or geli must be able not to encrypt the bootblock area, or we just can not use that area [with geli]. All in all, it is another example of the chicken and the egg issue:)

This is why the ORIGINAL nextboot in FreeBSD 3 (+ or -) wrote the data into block 1 of the drive, read it from boot0, and rewrote block 1 after zeroing out the entry, all using BIOS calls. The two principles were:

1/ read and remove ASAP;
2/ don't depend on the filesystem. It may be dead, and that is why we are redirecting somewhere else.

The current nextboot is not nearly as useful and needs to be replaced as soon as possible as a failed experiment.

Things we could do to improve nextboot functionality:

1/ declare a partition type freebsd-bootinfo that is just raw boot info;
2/ store the info in a known place in the freebsd-zfs partition (what Andriy is doing, I believe);
3/ store it at the end of the freebsd-boot partition.

It should be read by gptzfsboot and set into the environment (what comes earlier in a GPT system?). Originally I read it using BIOS calls from boot0, but that was of course a UFS system on a dedicated drive.

> rgds,
> toomas
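Option 3 (store the info at the end of the freebsd-boot partition) can be made concrete with the `gpart show` output from Lawrence's message: freebsd-boot starts at LBA 40 and is 1024 sectors long, so, assuming 512-byte sectors, its last sector is LBA 40 + 1024 - 1 = 1063, at byte offset 1063 * 512 = 544256. The helpers below are hypothetical and only do that arithmetic, plus the check that the gptzfsboot image would then have to leave the last sector free; they say nothing about how a boot block would actually read or zero that sector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed logical sector size, consistent with 512K == 1024 sectors above. */
#define SECTOR_SIZE 512u

/* Hypothetical helper: LBA of the last sector of a partition. */
static uint64_t
last_lba(uint64_t start_lba, uint64_t nsectors)
{
	assert(nsectors > 0);
	return (start_lba + nsectors - 1);
}

/* Hypothetical helper: byte offset of an LBA from the start of the disk. */
static uint64_t
lba_to_offset(uint64_t lba)
{
	return (lba * SECTOR_SIZE);
}

/*
 * Hypothetical check: does boot code of code_bytes bytes, written from the
 * start of the partition, leave the final sector free for boot info?
 */
static bool
bootcode_leaves_last_sector(uint64_t code_bytes, uint64_t nsectors)
{
	uint64_t code_sectors = (code_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;

	return (nsectors > 0 && code_sectors <= nsectors - 1);
}
```

A fixed raw location like this satisfies both of the principles above: it can be read and zeroed with plain sector I/O before any filesystem (or, with an unencrypted freebsd-boot partition, any GELI layer) is involved.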