From: Peter Eriksson <pen@lysator.liu.se>
Subject: Re: zfs snapshot corruption when using encryption
Date: Wed, 20 Nov 2024 09:15:55 +0100
To: FreeBSD FS <freebsd-fs@freebsd.org>

I’m seeing something similar on one of our systems - the one system where
I’ve just now started trying to use ZFS native encryption.

Setup: FreeBSD 13.4-RELEASE-p1, 512GB RAM

3 Zpools:
  zroot     - mirror of two SSD drives
  ENCRYPTED - ZFS over GELI-encrypted SAS 10TB drives
  SEKUR01D1 - ZFS over SAS 18TB drives with ZFS encryption enabled for
              individual filesystems

- ZFS snapshots are taken every hour of the ENCRYPTED zpool.
- zfs send is being done on some filesystem on the ENCRYPTED zpool.
- A big “cp -a” (about 70TB of files) of data from zfs filesystems in
  ENCRYPTED to SEKUR01D1 filesystems is running.

CKSUM errors pop up in zroot!

Fixed some errors yesterday, ran ‘zpool scrub’ & ‘zpool clear’ and got a
clean bill of health:

# zpool status -v zroot
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:07:15 with 0 errors on Tue Nov 19 21:46:36 2024
config:

	NAME                              STATE     READ WRITE CKSUM
	zroot                             ONLINE       0     0     0
	  mirror-0                        ONLINE       0     0     0
	    ada0p4                        ONLINE       0     0     0
	    diskid/DISK-PHDW817002MK150Ap4  ONLINE     0     0     0

errors: No known data errors

This morning:

# zpool scrub zroot
# zpool status -v zroot
  pool: zroot
 state: ONLINE
  scan: scrub in progress since Wed Nov 20 08:11:31 2024
	19.4G scanned at 6.48G/s, 772K issued at 257K/s, 49.3G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                              STATE     READ WRITE CKSUM
	zroot                             ONLINE       0     0     0
	  mirror-0                        ONLINE       0     0     0
	    ada0p4                        ONLINE       0     0     0
	    diskid/DISK-PHDW817002MK150Ap4  ONLINE     0     0     0

errors: No known data errors

# zpool status -v zroot
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:07:20 with 1 errors on Wed Nov 20 08:18:51 2024
config:

	NAME                              STATE     READ WRITE CKSUM
	zroot                             ONLINE       0     0     0
	  mirror-0                        ONLINE       0     0     0
	    ada0p4                        ONLINE       0     0     2
	    diskid/DISK-PHDW817002MK150Ap4  ONLINE     0     0     2

errors: Permanent errors have been detected in the following files:

        /var/audit/20241119235400.20241120000543

Snapshots & zfs send are only being done on the “ENCRYPTED” zpool, not on
“zroot” or “SEKUR01D1”, i.e. not on the zpool with the
ZFS-native-encrypted filesystems.

Not 100% sure it is related, but something is fishy.
This is a server that has been running fine with GELI-encrypted disks for
many years now…

- Peter

> On 9 Nov 2024, at 15:53, Palle Girgensohn <girgen@FreeBSD.org> wrote:
>
>> 9 nov. 2024 kl. 02:59 skrev void <void@f-m.fm>:
>>
>> % zfs version
>
> Ah, of course.
>
> $ zfs version
> zfs-2.2.4-FreeBSD_g256659204
> zfs-kmod-2.2.4-FreeBSD_g256659204
>
> Palle