Date:      Mon, 17 Apr 2023 12:59:14 -0700
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        =?UTF-8?B?Sm9zw6kgUMOpcmV6?= <fbl@aoek.com>
Cc:        freebsd-current@freebsd.org
Subject:   Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID:  <c0301648-cb54-a0f8-3297-8c4a98ea9111@dawidek.net>
In-Reply-To: <a45ea1f22a59a88e65790b81ebce9c73@mail.yourbox.net>
References:  <20230413071032.18BFF31F@slippy.cwsent.com> <20230413135635.6B62F354@slippy.cwsent.com> <c41f9ed6-e557-9255-5a46-1a22d4b32d66@dawidek.net> <319a267e-3f76-3647-954a-02178c260cea@dawidek.net> <b60807e9-f393-6e6d-3336-042652ddd03c@freebsd.org> <441db213-2abb-b37e-e5b3-481ed3e00f96@dawidek.net> <5ce72375-90db-6d30-9f3b-a741c320b1bf@freebsd.org> <99382FF7-765C-455F-A082-C47DB4D5E2C1@yahoo.com> <32cad878-726c-4562-0971-20d5049c28ad@freebsd.org> <ABC9F3DB-289E-455E-AF43-B3C13525CB2C@yahoo.com> <20230415115452.08911bb7@thor.intern.walstatt.dynvpn.de> <20230415143625.99388387@slippy.cwsent.com> <20230415175218.777d0a97@thor.intern.walstatt.dynvpn.de> <6792aded-6e2e-a118-259d-0df0f80c361c@smeets.xyz> <80ea8a67-9b64-c723-6d97-21cfa127ae43@dawidek.net> <b3d8b8f7a35312b1211b76b111c01242@mail.yourbox.net> <01430095-33a3-a949-3772-2ec90b4c3fe6@dawidek.net> <0164e42a-e7cd-a1e8-295c-21f414edf67b@dawidek.net> <a45ea1f22a59a88e65790b81ebce9c73@mail.yourbox.net>

On 4/17/23 21:28, José Pérez wrote:
> Hi Pawel,
> thank you for your reply and for the fixes.
> 
> I think there is a 4th issue that needs to be addressed: how do we 
> recover from the worst-case scenario, which is a machine with a kernel 
> later than 2a58b312b62f and ZFS root upgraded with block cloning enabled?
> 
> In particular, is it safe to turn such a machine on in the first place, 
> and what are the risks involved in doing so? Any potential data loss?
> 
> Would such a machine be able to fix itself by compiling a kernel, or 
> would compilation fail and might data be corrupted in the process?
> 
> I have two poudriere builders powered off (I am not alone in this 
> situation) and I need to recover them, ideally minimizing data loss. The 
> builders are also hosting current and used to build kernels and worlds 
> for 13 and current: as of now all my production machines are stuck on 
> the 13 they run, I cannot update binaries nor packages and I would like 
> to be back online.

José,

I can only speak about block cloning in detail, but I'll try to address 
everything.

The easiest way to avoid block_cloning-related corruption on a kernel 
built after the last OpenZFS merge, but before e0bb199925, is to set the 
compress property to 'off' and the sync property to something other than 
'disabled'. This avoids both the block_cloning-related corruption and 
the zil_replaying() panic.
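
To check which datasets still need those two properties changed, something 
like the following could help. It is only a sketch: it assumes the 
tab-separated output of `zfs get -H -o name,property,value 
compression,sync`, and the helper names are made up for illustration.

```python
import subprocess


def parse_zfs_get(output):
    """Parse `zfs get -H -o name,property,value ...` output.

    Returns {dataset: {property: value}}. With -H the fields are
    separated by a single tab and there is no header line.
    """
    props = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, prop, value = line.split("\t")
        props.setdefault(name, {})[prop] = value
    return props


def datasets_needing_change(props):
    """Return datasets where compression is not 'off' or sync is 'disabled'."""
    risky = []
    for name, p in sorted(props.items()):
        if p.get("compression") != "off" or p.get("sync") == "disabled":
            risky.append(name)
    return risky


def read_live_properties():
    """Query the running system (hypothetical helper, requires zfs(8))."""
    out = subprocess.run(
        ["zfs", "get", "-H", "-t", "filesystem,volume",
         "-o", "name,property,value", "compression,sync"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_zfs_get(out)
```

On an affected machine, datasets_needing_change(read_live_properties()) 
would list the datasets to adjust with zfs set before doing any real work.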

As for the other corruption, unfortunately I don't know the details, but 
my understanding is that it happens under higher load. I'm not sure I'd 
trust a kernel built on a machine with this bug present. What I would do 
is compile the kernel as of 068913e4ba somewhere else, boot the 
problematic machine in single-user mode, and install the newly built kernel.

As far as I can tell, contrary to some initial reports, none of the 
problems introduced by the recent OpenZFS merge corrupt the pool 
metadata, only file data. You can locate the files modified while running 
the bogus kernel using find(1) with a suitable modification-time test, but 
you have to decide what to do with them (throw them away, restore them 
from backup, or inspect them).
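
If a script is more convenient than find(1), the same search can be 
sketched roughly as below. The function name and the time window are 
assumptions for illustration; the window has to be the period during which 
the machine actually ran the affected kernel. Modification time is only a 
heuristic, since anything that resets mtime will hide a file from it.

```python
import os
import stat
from datetime import datetime


def files_modified_between(root, start, end):
    """List regular files under root with start <= mtime <= end.

    start and end are datetime objects; naive values are interpreted
    as local time. Symlinks are not followed.
    """
    start_ts, end_ts = start.timestamp(), end.timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and start_ts <= st.st_mtime <= end_ts:
                hits.append(path)
    return sorted(hits)
```

For example, files_modified_between("/usr/obj", datetime(2023, 4, 13), 
datetime(2023, 4, 17)) would return the candidates to review, where both 
the path and the dates are placeholders.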

-- 
Pawel Jakub Dawidek