Date:      Tue, 18 Apr 2023 18:59:03 +0200
From:      José Pérez <fbl@aoek.com>
To:        Pawel Jakub Dawidek <pjd@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID:  <4ab79579555b34317e9210d5e9f52832@mail.yourbox.net>
In-Reply-To: <c0301648-cb54-a0f8-3297-8c4a98ea9111@dawidek.net>
References:  <20230413071032.18BFF31F@slippy.cwsent.com> <20230413135635.6B62F354@slippy.cwsent.com> <c41f9ed6-e557-9255-5a46-1a22d4b32d66@dawidek.net> <319a267e-3f76-3647-954a-02178c260cea@dawidek.net> <b60807e9-f393-6e6d-3336-042652ddd03c@freebsd.org> <441db213-2abb-b37e-e5b3-481ed3e00f96@dawidek.net> <5ce72375-90db-6d30-9f3b-a741c320b1bf@freebsd.org> <99382FF7-765C-455F-A082-C47DB4D5E2C1@yahoo.com> <32cad878-726c-4562-0971-20d5049c28ad@freebsd.org> <ABC9F3DB-289E-455E-AF43-B3C13525CB2C@yahoo.com> <20230415115452.08911bb7@thor.intern.walstatt.dynvpn.de> <20230415143625.99388387@slippy.cwsent.com> <20230415175218.777d0a97@thor.intern.walstatt.dynvpn.de> <6792aded-6e2e-a118-259d-0df0f80c361c@smeets.xyz> <80ea8a67-9b64-c723-6d97-21cfa127ae43@dawidek.net> <b3d8b8f7a35312b1211b76b111c01242@mail.yourbox.net> <01430095-33a3-a949-3772-2ec90b4c3fe6@dawidek.net> <0164e42a-e7cd-a1e8-295c-21f414edf67b@dawidek.net> <a45ea1f22a59a88e65790b81ebce9c73@mail.yourbox.net> <c0301648-cb54-a0f8-3297-8c4a98ea9111@dawidek.net>

On 2023-04-17 21:59, Pawel Jakub Dawidek wrote:
> José,
> 
> I can only speak of block cloning in details, but I'll try to address
> everything.
> 
> The easiest way to avoid block_cloning-related corruption on the
> kernel after the last OpenZFS merge, but before e0bb199925 is to set
> the compress property to 'off' and the sync property to something
> other than 'disabled'. This will avoid the block_cloning-related
> corruption and zil_replaying() panic.
> 
> As for the other corruption, unfortunately I don't know the details,
> but my understanding is that it is happening under higher load. Not
> sure I'd trust a kernel built on a machine with this bug present. What
> I would do is to compile the kernel as of 068913e4ba somewhere else,
> boot the problematic machine in single-user mode and install the newly
> built kernel.
> 
> As far as I can tell, contrary to some initial reports, none of the
> problems introduced by the recent OpenZFS merge corrupt the pool
> metadata, only file's data. You can locate the files modified with the
> bogus kernel using find(1) with a proper modification time, but you
> have to decide what to do with them (either throw them away, restore
> them from backup or inspect them).

Sharing my experience on how to get out of the worst-case scenario with
a build machine that is affected by the bug.

CAVEAT: this is my experience, take it at your own risk. It worked for
me, but there is no guarantee that it will work for you. You may create
corrupted files and make your system harder to recover, or brick it
permanently. Don't blame me, you have been warned. YMMV.

Boot in single user mode and check if your pool has block cloning in 
use:
# zpool get feature@block_cloning zroot
NAME     PROPERTY               VALUE                  SOURCE
zroot    feature@block_cloning  active                 local

In this case it does, because the value is "active". If it is "enabled"
(the feature is available but has not been used), you do not need to do
anything.
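
If you want to script that check, here is a minimal sketch. The function
names block_cloning_state and needs_recovery are my own invention, and
the pool name is whatever you pass in; it only wraps the zpool get
command shown above, using -H and -o value to print the bare value.

```shell
# Print only the state of the block_cloning feature for the given pool,
# e.g. "active", "enabled" or "disabled".
# -H drops the header line, -o value prints just the VALUE column.
block_cloning_state() {
    zpool get -H -o value feature@block_cloning "$1"
}

# Succeed (exit status 0) only when the feature is "active", i.e. when
# the steps in this message apply to the pool.
needs_recovery() {
    [ "$(block_cloning_state "$1")" = "active" ]
}
```

For example, "needs_recovery zroot && echo affected" prints "affected"
only for a pool in the state shown above.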

1) While in single user mode, set the compression property to "off" on
every active ZFS dataset where it is not already "off", and set the sync
property to something other than "disabled" (e.g. "standard") wherever
it is "disabled".
2) Boot multiuser and update your current sources, e.g.
    git pull --rebase
3) Build and install a new kernel without putting too much load on the
machine (e.g. with -j 1):
    make -j 1 kernel
4) Reboot with the new kernel
5) Now reinstall the kernel with
    make installkernel
    This is needed because the kernel files installed in step 3 were
written by the old kernel and have to be replaced by copies written by
the new one.
6) Find out when the pool was upgraded (I used command history) and 
create a file with that date, in my case:
    touch -t 2304161957 /tmp/from
7) Find out when you booted the new kernel (I used fgrep Copyright 
/var/log/messages | tail -n 1) and create a file with that date, in my 
case:
    touch -t 2304172142 /tmp/to
8) Find the files/dirs created between the two dates:
    find / -newerBm /tmp/from -and -not -newerBm /tmp/to > /tmp/filelist.txt
9) Inspect /tmp/filelist.txt and save any important items. If the 
important files are not corrupted you can do:
    cp important_file new; mv new important_file
    NOTA BENE: "touch important_file" would not work, you do need to 
re-create the file.
10) Delete the remaining files/dirs in /tmp/filelist.txt. If you did 5) 
you will remove /boot/kernel.old files, but not /boot/kernel files.
11) Restore your compression and sync properties where appropriate.
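
To make steps 1) and 11) less error-prone, you can record the current
settings before changing them. A minimal sketch follows; the function
names save_props and restore_cmds are hypothetical. It assumes the
properties were set locally on each dataset: "-s local" skips inherited
values, so those are not captured. restore_cmds only prints the zfs set
commands, so that you can review them before running anything.

```shell
# Print one "dataset<TAB>property<TAB>value" line for every locally set
# compression or sync property at or below the given pool/dataset.
# -H: tab-separated output without header, -r: recurse into children,
# -s local: only values set on the dataset itself (not inherited ones).
save_props() {
    zfs get -H -r -s local -o name,property,value compression,sync "$1"
}

# Read lines in the save_props format from stdin and print the matching
# "zfs set" commands. Nothing is executed here.
restore_cmds() {
    while IFS="$(printf '\t')" read -r name prop value; do
        [ -n "$name" ] || continue    # skip empty lines
        printf 'zfs set %s=%s %s\n' "$prop" "$value" "$name"
    done
}
```

Before step 1) you would run something like
"save_props zroot > /root/props.txt" (the file must not live on a
filesystem you are going to throw away), and at step 11)
"restore_cmds < /root/props.txt" to see what has to be set back.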

BR,

-- 
José Pérez


