Date:      Tue, 07 Mar 2017 10:48:16 +0200
From:      Toomas Soome <tsoome@me.com>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        Andriy Gapon <avg@freebsd.org>, freebsd-fs@freebsd.org, Toomas Soome <tsoome@freebsd.org>, allanjude@freebsd.org
Subject:   Re: svn commit: r308089 - in head
Message-ID:  <814E1C65-23E3-42A1-8093-8008DF188506@me.com>
In-Reply-To: <9f0b2f93-04b8-b90b-3cb5-13b8539b9171@freebsd.org>
References:  <201610291409.u9TE9WXJ020650@repo.freebsd.org> <c4cc03d0-d26e-f7c0-8399-d65f2aa0c5ef@freebsd.org> <CCB18F77-A9C3-4D22-82A3-9DD84DF783F9@me.com> <9f0b2f93-04b8-b90b-3cb5-13b8539b9171@freebsd.org>


> On 7 March 2017, at 10:25, Lawrence Stewart <lstewart@freebsd.org> wrote:
>
> On 07/03/2017 18:04, Toomas Soome wrote:
>>
>>> On 7 March 2017, at 7:25, Lawrence Stewart <lstewart@freebsd.org> wrote:
>>>
>>> Hi Andriy,
>>>
>>> On 30/10/2016 01:09, Andriy Gapon wrote:
>>>> Author: avg
>>>> Date: Sat Oct 29 14:09:32 2016
>>>> New Revision: 308089
>>>> URL: https://svnweb.freebsd.org/changeset/base/308089
>>>>
>>>> Log:
>>>> zfsbootcfg: a simple tool to set next boot (one time) options for zfsboot
>>>>
>>>> (gpt)zfsboot will read one-time boot directives from a special ZFS pool
>>>> area.  The area was previously described as "Boot Block Header", but
>>>> currently it is known as Pad2, marked as reserved and is zeroed out on
>>>> pool creation.  The new code interprets data in this area, if any, using
>>>> the same format as boot.config.  The area is immediately wiped out.
>>>> Failure to parse the directives results in a reboot right after the
>>>> cleanup.  Otherwise the boot sequence proceeds as usual.
>>>>
>>>> zfsbootcfg writes zfsboot arguments specified on its command line to the
>>>> Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and
>>>> vfs.zfs.boot.primary_vdev kenv variables that are set by loader during
>>>> boot.  Please see the manual page for more.
>>>>
>>>> Thanks to all who reviewed, contributed and made suggestions!  There are
>>>> many potential improvements to the feature, please see the review for
>>>> details.
>>>>
>>>> Reviewed by:	wblock (docs)
>>>> Discussed with:	jhb, tsoome
>>>> MFC after:	3 weeks
>>>> Relnotes:	yes
>>>> Differential Revision: https://reviews.freebsd.org/D7612
>>>>
>>>> Added:
>>>> head/sbin/zfsbootcfg/
>>>> head/sbin/zfsbootcfg/Makefile   (contents, props changed)
>>>> head/sbin/zfsbootcfg/zfsbootcfg.8   (contents, props changed)
>>>> head/sbin/zfsbootcfg/zfsbootcfg.c   (contents, props changed)
>>>> Modified:
>>>> head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs.h
>>>> head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_pool.c
>>>> head/sbin/Makefile
>>>> head/sys/boot/i386/common/drv.c
>>>> head/sys/boot/i386/common/drv.h
>>>> head/sys/boot/i386/gptzfsboot/Makefile
>>>> head/sys/boot/i386/zfsboot/Makefile
>>>> head/sys/boot/i386/zfsboot/zfsboot.c
>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev.h
>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
>>>> head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>> head/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
>>>>
>>> [snip]
>>>> @@ -634,7 +712,39 @@ main(void)
>>>>    primary_spa = spa;
>>>>    primary_vdev = spa_get_primary_vdev(spa);
>>>>
>>>> -    if (zfs_spa_init(spa) != 0 || zfs_mount(spa, 0, &zfsmount) != 0) {
>>>> +    nextboot = 0;
>>>> +    rc  = vdev_read_pad2(primary_vdev, cmd, sizeof(cmd));
>>>> +    if (vdev_clear_pad2(primary_vdev))
>>>> +	printf("failed to clear pad2 area of primary vdev\n");
>>>> +    if (rc == 0) {
>>>> +	if (*cmd) {
>>>> +	    /*
>>>> +	     * We could find an old-style ZFS Boot Block header here.
>>>> +	     * Simply ignore it.
>>>> +	     */
>>>> +	    if (*(uint64_t *)cmd != 0x2f5b007b10c) {
>>>> +		/*
>>>> +		 * Note that parse() is destructive to cmd[] and we also want
>>>> +		 * to honor RBX_QUIET option that could be present in cmd[].
>>>> +		 */
>>>> +		nextboot = 1;
>>>> +		memcpy(cmddup, cmd, sizeof(cmd));
>>>> +		if (parse()) {
>>>> +		    printf("failed to parse pad2 area of primary vdev\n");
>>>> +		    reboot();
>>>> +		}
>>>> +		if (!OPT_CHECK(RBX_QUIET))
>>>> +		    printf("zfs nextboot: %s\n", cmddup);
>>>> +	    }
>>>> +	    /* Do not process this command twice */
>>>> +	    *cmd = 0;
>>>> +	}
>>>> +    } else
>>>> +	printf("failed to read pad2 area of primary vdev\n");
>>>> +
>>>
>>> I've just taken Allan Jude's & co-conspirators' work for a spin that
>>> allows gptzfsboot to boot from a geli + ZFS partition. Everything is
>>> working amazingly well, but I see the above "failed to read pad2 area of
>>> primary vdev" message on every boot.
>>>
>>> It doesn't appear to cause any problems per se and the system
>>> boots/works fine. I assume that message is printed to signal an
>>> unexpected situation though, so figured I'd get in touch to get your
>>> thoughts.
>>>
>>> I installed the KVM-based virtual machine system manually from the live
>>> shell of:
>>>
>>> FreeBSD-12.0-CURRENT-amd64-20170301-r314495-disc1.iso
>>>
>>> The partitioning is very simple:
>>>
>>> gpart create -s gpt /dev/vtbd0
>>> gpart add -t freebsd-boot -a 8 -b 40 -s 512k vtbd0
>>> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0
>>> gpart add -t freebsd-zfs -b 2088 vtbd0
>>>
>>> root# gpart show
>>> =>      40  83886000  vtbd0  GPT  (40G)
>>> 	  40      1024      1  freebsd-boot  (512K)
>>> 	1064      1024         - free -  (512K)
>>> 	2088  83883952      2  freebsd-zfs  (40G)
>>>
>>> geli was inited/attached to vtbd0p2 and the zpool was created with command:
>>>
>>> zpool create -o altroot=/tmp/zroot -o cachefile=/tmp/zpool.cache -O
>>> checksum=skein -O compression=lz4 <pool> vtbd0p2.eli
>>>
>>> i.e. the entire pool including bootfs is using skein for checksumming
>>> and lz4 for compression.
>>>
>>> I hit another boot bug using skein previously which Toomas (CCed) fixed,
>>> and am wondering if this issue might also be related to the skein
>>> implementation.
>>>
>>> I haven't tested if the zfsbootcfg functionality works for fear that the
>>> printf is indicating a low level problem with the zpool. I can test
>>> potentially destructive things and break the pool though if that would
>>> be helpful.
>>>
>>> Any thoughts?
>>>
>>> Cheers,
>>> Lawrence
>>
>> The problem with having the pool on a geli-encrypted partition is that all the reads done on such a partition have to go through a geli-aware read() function, and the same is true for writes (which is important for the nextboot feature). So what it means for gptzfsboot/zfsboot is that we would need to have the disk reads/writes go through the geli-aware functions; we cannot issue "pure" disk I/O directly.
>
> [+Allan]
>
> Presumably that functionality exists given that the geli support Allan
> added to gptzfsboot is able to read loader and loader is able to read
> everything in /boot from the geli-encrypted ZFS pool?


The problem is deeper. The idea behind nextboot is to provide recovery from a failed boot, so if you set a nextboot dataset and attempt to boot from it, you need to do two things: 1. detect the nextboot config, so that you are actually able to use it, and 2. reset it as early as possible, because later you may not have a chance.
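
The two requirements above (use the one-time config, and reset it as early as possible) can be sketched in C as follows. This is a minimal model under stated assumptions, not the zfsboot code: `struct fake_vdev`, `set_nextboot()` and `consume_nextboot()` are hypothetical names, the pad2 area is an in-memory buffer rather than a disk region reached through vdev_read_pad2()/vdev_clear_pad2(), and the directive string used below is only illustrative.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the 8 KiB pad2 area of a vdev label.  In the
 * real boot block these bytes live on disk; here they are a plain buffer.
 */
#define PAD2_SIZE	8192

struct fake_vdev {
	char	pad2[PAD2_SIZE];
};

/* zfsbootcfg-like side: store a one-time directive (NUL-terminated). */
int
set_nextboot(struct fake_vdev *vd, const char *args)
{
	size_t len = strlen(args);

	if (len >= PAD2_SIZE)
		return (-1);		/* would not fit with its terminator */
	memset(vd->pad2, 0, sizeof(vd->pad2));
	memcpy(vd->pad2, args, len);
	return (0);
}

/*
 * Boot-block side: 1. read the directive so this boot can use it,
 * 2. wipe it immediately, before acting on it, so that a failure later in
 * this boot leaves nothing behind and the next boot is a normal one.
 * The copy is truncated to outlen - 1 bytes and always NUL-terminated.
 * Returns 1 if a directive was found, 0 otherwise.
 */
int
consume_nextboot(struct fake_vdev *vd, char *out, size_t outlen)
{
	size_t n;

	if (outlen == 0)
		return (0);
	n = outlen - 1 < PAD2_SIZE ? outlen - 1 : PAD2_SIZE;
	memcpy(out, vd->pad2, n);			/* step 1: read */
	out[n] = '\0';
	memset(vd->pad2, 0, sizeof(vd->pad2));		/* step 2: reset */
	return (out[0] != '\0');
}
```

After one successful `consume_nextboot()` call, a second call returns 0, which corresponds to the "fall back to the normal boot" case. The real code additionally reboots if the directive fails to parse, but only after the area has already been wiped, as described in the commit log.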

So gptzfsboot has to read the config to know where to load zfsloader from, and gptzfsboot has to reset the config, so that if anything goes wrong, the fallback or "normal" boot will be done on the next boot. This means that either gptzfsboot has to know how to deal with geli when handling nextboot, or, with geli, you simply cannot use the nextboot config.
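
To see why the reset has to go through the geli-aware path as well as the read, here is a toy model in which every vdev carries its own read/write callbacks. Everything in it is hypothetical: `struct vdev_io`, the `raw_*`/`geli_*` callbacks and the single-byte XOR "cipher" only stand in for geli, which really uses a block cipher such as AES-XTS with keys derived from the passphrase. The point it illustrates is that zeroing an area with raw disk I/O does not make it read back as zeros through geli.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR		512
#define NSECT		4
#define DISK_SIZE	(NSECT * SECTOR)

/* Hypothetical backing store standing in for the raw partition. */
struct disk {
	uint8_t	data[DISK_SIZE];
};

/* Per-vdev I/O callbacks, so callers never touch the disk directly. */
struct vdev_io {
	struct disk	*disk;
	uint8_t		key;	/* toy key, NOT how geli keys work */
	int	(*read)(struct vdev_io *, size_t, void *, size_t);
	int	(*write)(struct vdev_io *, size_t, const void *, size_t);
};

static int
in_range(size_t off, size_t len)
{
	return (off <= DISK_SIZE && len <= DISK_SIZE - off);
}

/* "Pure" disk I/O: bytes go to/from the partition unchanged. */
int
raw_read(struct vdev_io *io, size_t off, void *buf, size_t len)
{
	if (!in_range(off, len))
		return (-1);
	memcpy(buf, io->disk->data + off, len);
	return (0);
}

int
raw_write(struct vdev_io *io, size_t off, const void *buf, size_t len)
{
	if (!in_range(off, len))
		return (-1);
	memcpy(io->disk->data + off, buf, len);
	return (0);
}

/* geli-like I/O: "decrypt" after reading, "encrypt" before writing. */
int
geli_read(struct vdev_io *io, size_t off, void *buf, size_t len)
{
	uint8_t *p = buf;
	size_t i;

	if (raw_read(io, off, buf, len) != 0)
		return (-1);
	for (i = 0; i < len; i++)
		p[i] ^= io->key;
	return (0);
}

int
geli_write(struct vdev_io *io, size_t off, const void *buf, size_t len)
{
	const uint8_t *p = buf;
	size_t i;

	if (!in_range(off, len))
		return (-1);
	for (i = 0; i < len; i++)
		io->disk->data[off + i] = p[i] ^ io->key;
	return (0);
}

/* Zero up to one sector through whatever I/O path the vdev was given. */
int
clear_area(struct vdev_io *io, size_t off, size_t len)
{
	static const uint8_t zero[SECTOR];

	if (len > SECTOR)
		return (-1);
	return (io->write(io, off, zero, len));
}

/* 1 if up to one sector reads back as all zeros via io, 0 if not, -1 on error. */
int
area_is_clear(struct vdev_io *io, size_t off, size_t len)
{
	uint8_t buf[SECTOR];
	size_t i;

	if (len > SECTOR || io->read(io, off, buf, len) != 0)
		return (-1);
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return (0);
	return (1);
}
```

In this model, clearing through the geli callbacks and reading back through them yields zeros. Clearing the same sector with raw writes leaves bytes that "decrypt" to 0x5A when a geli key of 0x5A is used, so a later boot would see a non-empty, garbage directive instead of an empty one.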

A similar issue exists with using the boot block area in the ZFS pool label: to be able to store and use gptzfsboot in the pool label boot area, boot1 either has to know how to read geli, or geli must be able to leave the boot block area unencrypted, or we simply cannot use that area [with geli]. All in all, it is another example of the chicken-and-egg issue :)
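
Both areas sit at fixed offsets inside the device ZFS sees, which in Lawrence's setup is vtbd0p2.eli, so on the raw partition they only ever hold ciphertext. The sketch below computes those offsets. The constants and the label-offset formula are taken from a reading of the ZFS on-disk layout (vdev_label_t, VDEV_BOOT_OFFSET/VDEV_BOOT_SIZE and vdev_label_offset()) and should be double-checked against the source tree in use; `label_offset()` and `pad2_offset()` are hypothetical helpers.

```c
#include <stdint.h>

/* Layout constants (assumed; compare vdev_impl.h and sys/fs/zfs.h). */
#define VDEV_PAD_SIZE		(8ULL << 10)	/* vl_pad1 and vl_pad2: 8K each */
#define VDEV_LABEL_SIZE		(256ULL << 10)	/* sizeof (vdev_label_t) */
#define VDEV_LABELS		4		/* two at the front, two at the end */
#define VDEV_BOOT_OFFSET	(2 * VDEV_LABEL_SIZE)	/* after labels 0 and 1 */
#define VDEV_BOOT_SIZE		(7ULL << 19)	/* 3.5M boot block area */

/*
 * Offset of byte `off` of label l on a device of psize bytes, following
 * the same formula as vdev_label_offset().  Callers there are expected to
 * pass a size already rounded down to a multiple of the label size; this
 * sketch does the rounding itself.
 */
uint64_t
label_offset(uint64_t psize, int l, uint64_t off)
{
	psize &= ~(VDEV_LABEL_SIZE - 1);
	return (off + (uint64_t)l * VDEV_LABEL_SIZE +
	    (l < VDEV_LABELS / 2 ? 0 : psize - VDEV_LABELS * VDEV_LABEL_SIZE));
}

/* pad2 follows the 8K pad1 at the start of each label. */
uint64_t
pad2_offset(uint64_t psize, int l)
{
	return (label_offset(psize, l, VDEV_PAD_SIZE));
}
```

For labels 0 and 1 the results do not depend on the device size: pad2 starts at 8 KiB and 264 KiB (270336 bytes), and the boot block area covers 512 KiB up to 4 MiB. Any code that touches those bytes on a geli partition before geli is attached has to decrypt or encrypt them.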

rgds,
toomas


