Date: Sun, 21 Dec 2025 16:35:52 +0100
From: Alexander Leidinger
To: Warner Losh
Cc: Current
Subject: Re: Changes in cam/nvme causes issues?
References: <198170948d34f4dc169e94934da82161@Leidinger.net>
Message-ID: <89a92e0a926239e2c192dc0ff9c80d6e@Leidinger.net>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 2025-12-14 14:05, Warner Losh wrote:

> Let's do one issue at a time. There's too much missing info. Top
> posting since there's not a lot of context to this request

The disk has now died completely, so the CRC errors can no longer be
reproduced.

> First, let's start with pciconf -l of the nvme drive. I have a strong
> idea, but need some data.

I already provided this privately together with some other data; here it
is for the public, so that people are aware that there is currently an
issue with such drives:

nvme0@pci0:5:0:0: class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa809 subvendor=0x144d subdevice=0xa801
Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V

Bye,
Alexander.

> Also, the disk report needs full logs with and without the settings
> that have uncorrectable in them. I'd expect that a shorter timeout
> would lead to different behavior, but maybe that error syndrome isn't
> one I've seen. It would also be helpful to know which of the times
> changes the behavior...
>
> Warner
>
> On Sun, Dec 14, 2025, 5:06 AM Alexander Leidinger
> <Alexander@leidinger.net> wrote:
>
>> Hi Warner,
>>
>> I try to update a 15-current (as of 2025-11-27-110715) to a recent 16
>> (as of 2025-12-13-132815). It fails to import a pool due to a missing
>> nvme. I also have a broken HD in this system... to be on the safe side
>> I mention it.
>>
>> This is from 15-current:
>> ---snip---
>>  NAME                               STATE     READ WRITE CKSUM
>>  rpool                              DEGRADED     0     0     0
>>    mirror-0                         DEGRADED     0     0     0
>>      diskid/DISK-WD-WCC4N4KLEZT7p3  ONLINE       0     0     0
>>      diskid/DISK-WD-WCC4N1DF9DA2p3  ONLINE       0     0     0
>>      diskid/DISK-WD-WX52D625R0NTp3  ONLINE       0     0     0
>>      diskid/DISK-WD-WCC4N1PYJ3F8p3  OFFLINE      0     0     0
>>  logs
>>    diskid/DISK-493504058890547p1    ONLINE       0     0     0
>>  cache
>>    diskid/DISK-493504058890547p2    ONLINE       0     0     0
>>
>>  NAME                               STATE     READ WRITE CKSUM
>>  space                              DEGRADED     0     0     0
>>    raidz2-0                         DEGRADED     0     0     0
>>      diskid/DISK-WD-WCC4N4KLEZT7p4  ONLINE       0     0     0
>>      diskid/DISK-WD-WCC4N1DF9DA2p4  ONLINE       0     0     0
>>      diskid/DISK-WD-WX52D625R0NTp4  ONLINE       0     0     0
>>      diskid/DISK-WD-WX52D625R2TPp4  ONLINE       0     0     0
>>      diskid/DISK-WD-WCC4N1PYJ3F8p4  OFFLINE      0     0     0
>>  logs
>>    diskid/DISK-S649NL0T819360Vp2    ONLINE       0     0     0
>>  cache
>>    diskid/DISK-S649NL0T819360Vp3    ONLINE       0     0     0
>> ---snip---
>>
>> The offline marked partitions are on the same HD (the broken one). The
>> DISK-S649NL0T819360V device use as log and cache in the second pool
>> causes the issue on 16-current.
>>
>> On 16-current I get "uncorrectable parity/CRC error" messages on boot
>> from the broken disk. I used this to get rid of those errors:
>> ---snip---
>> # grep kern.cam /tmp/be_mount.MhLw/boot/loader.conf
>> kern.cam.tur_timeout="60"
>> kern.cam.inquiry_timeout="60"
>> kern.cam.modesense_timeout="60"
>> ---snip---
>>
>> But the second pool ("space") fails to get imported. When I import it
>> via "zpool import -m space" it shows me that the log and cache devices
>> (different partitions on the same hardware) are not available.
>> This is the device in question as seen from 15-current:
>> ---snip---
>> nda0: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
>> nda0: Serial Number S649NL0T819360V
>> [1] nda0: nvme version 1.4
>> nda0: 953869MB (1953525168 512 byte sectors)
>> [1] GEOM: new disk nda0
>> ...
>> [1] pass6 at nvme0 bus 0 scbus6 target 0 lun 1
>> pass6: <Samsung SSD 980 1TB 2B4QFXO7 S649NL0T819360V>
>> pass6: Serial Number S649NL0T819360V
>> [1] pass6: nvme version 1.4
>> ---snip---
>>
>> In case you need some info from the 15- or 16-current BE, which info
>> do you need?
>>
>> Bye,
>> Alexander.
>>
>> --
>> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
>> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF

--
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
