From nobody Sat Mar 26 16:45:57 2022
X-Original-To: freebsd-questions@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4B2411A31731
	for <freebsd-questions@mlmmj.nyi.freebsd.org>; Sat, 26 Mar 2022 16:46:09 +0000 (UTC)
	(envelope-from bram@diomedia.be)
Received: from ebifccidjbei.ams03.turbo-smtp.net (ebifccidjbei.ams03.turbo-smtp.net [185.228.39.148])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "", Issuer "Internet Widgits Pty Ltd" (not verified))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4KQlG71rSCz4Y9W
	for <freebsd-questions@freebsd.org>; Sat, 26 Mar 2022 16:46:06 +0000 (UTC)
	(envelope-from bram@diomedia.be)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=diomedia.be; s=turbo-smtp; x=1648917967; h=Received:Received:
	From:To:Subject:Date:Message-ID:Reply-To:User-Agent:MIME-Version:
	Content-Type:Feedback-Id; bh=VkFNYtz5Fknc1YiZFMk4wZc1sRq8bvtJ7rq
	YfEgTwNQ=; b=RnhPI60lAYIeZ2kLVCTeZfY7b9lLXgk0BB00YGj9Fsj5WyWhnc1
	xjC7k/r+mqF/M4+9A4c4B7iCXzxJIVr/oadZUHQOSQBeebnVy8N141fvWigiFXaw
	JZaAnJl65P59GoA/PGobwKCvPfTMPGNsrgp37gPycjQTbOSrbMI3Y/lg=
Received: (qmail 2435557 invoked from network); 26 Mar 2022 16:45:59 -0000
Received: from ?UNAVAILABLE? (HELO ?192.168.3.215?) (authenticated@81.82.228.129)  by turbo-smtp.com with SMTP; 26 Mar 2022 16:45:58 -0000
X-TurboSMTP-Tracking: 64-0138fe03-00001677fc1594412000-000-35f4f8
From: "Bram Van Steenlandt" <bram@diomedia.be>
To: "Freebsd Questions" <freebsd-questions@freebsd.org>
Subject: zfs mirror pool online but drives have read errors
Date: Sat, 26 Mar 2022 16:45:57 +0000
Message-ID: <emf36013e4-0469-47cd-a99d-d06600df1565@winserver>
Reply-To: "Bram Van Steenlandt" <bram@diomedia.be>
User-Agent: eM_Client/8.2.1659.0
List-Id: User questions <freebsd-questions.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions
List-Help: <mailto:questions+help@freebsd.org>
List-Post: <mailto:questions@freebsd.org>
List-Subscribe: <mailto:questions+subscribe@freebsd.org>
List-Unsubscribe: <mailto:questions+unsubscribe@freebsd.org>
Sender: owner-freebsd-questions@freebsd.org
X-BeenThere: freebsd-questions@freebsd.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------=_MB21C53DF8-236F-4888-B964-B1E94D40931A"
Feedback-Id: 20512259
X-Rspamd-Queue-Id: 4KQlG71rSCz4Y9W
X-Spamd-Bar: ---
Authentication-Results: mx1.freebsd.org;
	dkim=pass header.d=diomedia.be header.s=turbo-smtp header.b=RnhPI60l;
	dmarc=pass (policy=quarantine) header.from=diomedia.be;
	spf=pass (mx1.freebsd.org: domain of bram@diomedia.be designates 185.228.39.148 as permitted sender) smtp.mailfrom=bram@diomedia.be

This is a MIME-formatted message.  If you see this text it means that your
E-mail software does not support MIME-formatted messages.

--------=_MB21C53DF8-236F-4888-B964-B1E94D40931A
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable

Hi all,

English is not my native language, so sorry about any errors.

I'm seeing something that I don't fully understand; maybe someone here
can offer some insight.

I have a ZFS mirror of two Samsung 980 Pro 2TB NVMe drives. According to
ZFS the pool is online.
The last scrub repaired 54M. I ran another scrub today and repairs were
needed again (only 128K this time).

  pool: zextra
 state: ONLINE
  scan: scrub repaired 54M in 0 days 00:41:42 with 0 errors on Thu Mar 24 09:44:02 2022
config:

         NAME        STATE     READ WRITE CKSUM
         zextra      ONLINE       0     0     0
           mirror-0  ONLINE       0     0     0
             nvd2    ONLINE       0     0     0
             nvd3    ONLINE       0     0     0

errors: No known data errors

In dmesg I have messages like this:
nvme2: UNRECOVERED READ ERROR (02/81) sqid:3 cid:80 cdw0:0
nvme2: READ sqid:8 cid:119 nsid:1 lba:3831589512 len:256
nvme2: UNRECOVERED READ ERROR (02/81) sqid:8 cid:119 cdw0:0
nvme2: READ sqid:2 cid:123 nsid:1 lba:186822304 len:256
nvme2: UNRECOVERED READ ERROR (02/81) sqid:2 cid:123 cdw0:0
nvme2: READ sqid:5 cid:97 nsid:1 lba:186822560 len:256
also for the other drive:
nvme3: READ sqid:7 cid:84 nsid:1 lba:1543829024 len:256
nvme3: UNRECOVERED READ ERROR (02/81) sqid:7 cid:84 cdw0:0
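
To put the READ lines above into perspective, here is a rough Python
sketch that turns them into sizes and offsets. It assumes that "len" is a
count of logical blocks and that the drives use 512-byte LBAs; neither is
confirmed anywhere in this mail, and the names LBA_SIZE, parse_reads and
describe are made up for the example.

```python
import re

# Assumption: the namespace is formatted with 512-byte LBAs (not confirmed).
LBA_SIZE = 512

# READ command lines copied from the dmesg excerpt above.
DMESG = """\
nvme2: READ sqid:8 cid:119 nsid:1 lba:3831589512 len:256
nvme2: READ sqid:2 cid:123 nsid:1 lba:186822304 len:256
nvme2: READ sqid:5 cid:97 nsid:1 lba:186822560 len:256
nvme3: READ sqid:7 cid:84 nsid:1 lba:1543829024 len:256
"""

READ_RE = re.compile(r"^(nvme\d+): READ .*\blba:(\d+) len:(\d+)$")


def parse_reads(text):
    """Return (device, first LBA, block count) for each READ line."""
    reads = []
    for line in text.splitlines():
        match = READ_RE.match(line.strip())
        if match:
            reads.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return reads


def describe(reads, lba_size=LBA_SIZE):
    """Print the size and approximate device offset of each read."""
    for dev, lba, count in reads:
        size_kib = count * lba_size // 1024
        offset_gib = lba * lba_size / 2**30
        print(f"{dev}: {size_kib} KiB read at about {offset_gib:.1f} GiB")


reads = parse_reads(DMESG)
describe(reads)
```

Under those assumptions every failed read is 256 * 512 bytes, which is
128 KiB, the same size as the 128K repaired by today's scrub, and the two
nvme2 reads at lba 186822304 and 186822560 are directly adjacent.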

smartctl does see the errors (but still reports "SMART overall-health
self-assessment test result: PASSED"):
Media and Data Integrity Errors:    190
Error Information Log Entries:      190
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
   0        190     1  0x006e  0xc502  0x000   3649951416     1     -
   1        189     6  0x0067  0xc502  0x000   2909882960     1     -

and for the other drive:
Media and Data Integrity Errors:    284
Error Information Log Entries:      284
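
The Status value 0xc502 in the error log above looks like the same error
that dmesg prints as (02/81). A small Python sketch of the decoding
follows; the bit layout is the one I understand from the NVMe base
specification (bit 0 phase tag, bits 8:1 status code, bits 11:9 status
code type, bit 14 "more", bit 15 "do not retry"), and the function name
is made up for the example.

```python
def decode_nvme_status(raw):
    """Split a 16-bit NVMe completion status word into its fields.

    Assumed layout: bit 0 phase tag, bits 8:1 status code (SC),
    bits 11:9 status code type (SCT), bit 14 more, bit 15 do-not-retry.
    """
    return {
        "phase": raw & 0x1,
        "sc": (raw >> 1) & 0xFF,
        "sct": (raw >> 9) & 0x7,
        "more": (raw >> 14) & 0x1,
        "dnr": (raw >> 15) & 0x1,
    }


status = decode_nvme_status(0xC502)
# Same (type/code) notation as the dmesg lines.
print(f"({status['sct']:02x}/{status['sc']:02x}) dnr={status['dnr']}")
# prints "(02/81) dnr=1"
```

Status code type 02 is the media error group and code 81 is the
unrecovered read error, so the drive's own log and the kernel messages
appear to describe the same failures.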

Is the following reasoning roughly correct?
- ZFS doesn't remove the drives from the pool because there are no write
  errors, and I've been lucky so far in that the read errors were
  repairable.
- Both drives are unreliable. If this were a hardware problem elsewhere
  (both sit on a PCIe card, not on the motherboard) or a software
  problem, smartctl would not find these errors in the drive logs.

I'll replace one drive and see whether the errors go away for that
drive. If this works I'll replace the other one as well. I have the same
setup on another machine, and that one is error free.
Could more expensive SSDs have made a difference here? According to
smartctl I've written 50TB so far, and these drives should be good for
1200TBW.

I back up the drives by making a snapshot and then using "zfs send >
imgfile" to a hard drive. What would have happened here if more and more
read errors had occurred?
I may change this to a separate imgfile for even and odd days, or even
one for every day of the week if I have enough room for that.
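
The rotation I have in mind could be sketched like this in Python. The
function name, the "imgfile" prefix and the suffixes are only
placeholders for illustration, not something I already run.

```python
from datetime import date

# Index matches date.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def image_name(day, scheme="weekday", prefix="imgfile"):
    """Pick the backup file name for a given date.

    scheme "parity"  -> two files, alternating every day
    scheme "weekday" -> seven files, one per day of the week
    """
    if scheme == "parity":
        # toordinal() counts days continuously, so the two files keep
        # alternating across month ends (day-of-month parity would not,
        # for example from the 31st to the 1st).
        suffix = "even" if day.toordinal() % 2 == 0 else "odd"
    elif scheme == "weekday":
        suffix = WEEKDAYS[day.weekday()]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return f"{prefix}-{suffix}"


print(image_name(date(2022, 3, 26)))            # a Saturday
print(image_name(date(2022, 3, 26), "parity"))
```

The output of "zfs send" for the day's snapshot would then be redirected
to the returned name, so a bad image never overwrites the only good copy.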

Thanks for any input,
Bram


--------=_MB21C53DF8-236F-4888-B964-B1E94D40931A--