From nobody Sun Jul 16 21:00:36 2023
Message-Id: <202307162100.36GL0aD5070890@kenobi.freebsd.org>
X-Authentication-Warning: kenobi.freebsd.org: bugzilla set sender to bugzilla-noreply@FreeBSD.org using -f
From: bugzilla-noreply@FreeBSD.org
To: scsi@FreeBSD.org
Subject: Problem reports for scsi@FreeBSD.org that need special attention
Date: Sun, 16 Jul 2023 21:00:36 +0000
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi
List-Help: <mailto:scsi+help@freebsd.org>
List-Post: <mailto:scsi@freebsd.org>
List-Subscribe: <mailto:scsi+subscribe@freebsd.org>
List-Unsubscribe: <mailto:scsi+unsubscribe@freebsd.org>
Sender: owner-freebsd-scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="16895412364.F8686e.67715"
Content-Transfer-Encoding: 7bit
X-ThisMailContainsUnwantedMimeParts: N


--16895412364.F8686e.67715
Date: Sun, 16 Jul 2023 21:00:36 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    221952 | cam iosched: Fix trim statistics                  

1 problem total for which you should take action.

--16895412364.F8686e.67715--

From nobody Wed Jul 19 15:41:37 2023
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi
List-Help: <mailto:scsi+help@freebsd.org>
List-Post: <mailto:scsi@freebsd.org>
List-Subscribe: <mailto:scsi+subscribe@freebsd.org>
List-Unsubscribe: <mailto:scsi+unsubscribe@freebsd.org>
Sender: owner-freebsd-scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
MIME-Version: 1.0
References: <CANCZdfokEoRtNp0en=9pjLQSQ+jtmfwH3OOwz1z09VcwWpE+xg@mail.gmail.com>
 <CAOtMX2g4+SDWg9WKbwZcqh4GpRan593O6qtNf7feoVejVK0YyQ@mail.gmail.com>
 <CANCZdfq5qti5uzWLkZaQEpyd5Q255sQeaR_kC_OQinmE9Qcqaw@mail.gmail.com>
 <CAOtMX2iwnpHL6b2-1D4N4Bi4eKoLnGK4=+gUowXGS_gtyDOkig@mail.gmail.com> <CANCZdfr-y8HYBb6GCFqZ7LAarxUAGb36Y6j+bo+WiDwUT5uR7A@mail.gmail.com>
In-Reply-To: <CANCZdfr-y8HYBb6GCFqZ7LAarxUAGb36Y6j+bo+WiDwUT5uR7A@mail.gmail.com>
From: Warner Losh <imp@bsdimp.com>
Date: Wed, 19 Jul 2023 09:41:37 -0600
Message-ID: <CANCZdfptEG=+xa3m31Ngre26ZQxZ_Fqsfjmh+tVHgP2XpqhZ7g@mail.gmail.com>
Subject: Re: ASC/ASCQ Review
To: Alan Somers <asomers@freebsd.org>
Cc: scsi@freebsd.org
Content-Type: multipart/alternative; boundary="00000000000084edec0600d8debe"

--00000000000084edec0600d8debe
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

btw, it also occurs to me that if I do add a 'secondary' table, then you
could use it to generate a unique errno and experiment
with that w/o affecting the main code until that stuff was mature.

I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs
that I'd like to tag as 'if trying harder, retry, otherwise fail' since
re-retry needs have changed a lot since cam was written in the late 90s and
at least some of the asc/ascq pairs I'm looking at haven't changed since
the initial import, but that's based on a tiny sampling of the data I have
and is preliminary at best. I may just change it to reflect modern usage.

Warner

On Fri, Jul 14, 2023 at 5:34=E2=80=AFPM Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Fri, Jul 14, 2023 at 12:31=E2=80=AFPM Alan Somers <asomers@freebsd.org=
> wrote:
>
>> On Fri, Jul 14, 2023 at 11:05=E2=80=AFAM Warner Losh <imp@bsdimp.com> wr=
ote:
>> >
>> >
>> >
>> > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org> wrote=
:
>> >>
>> >> On Thu, Jul 13, 2023 at 12:14=E2=80=AFPM Warner Losh <imp@bsdimp.com>=
 wrote:
>> >> >
>> >> > Greetings,
>> >> >
>> >> > I've been looking closely at failed drives for $WORK lately. I've
>> noticed that a lot of errors that kinda sound like fatal errors have
>> SS_RDEF set on them.
>> >> >
>> >> > What's the process for evaluating whether those error codes are
>> worth retrying? There are several errors that we seem to be seeing
>> (preliminary read of the data) before the drive gives up the ghost
>> altogether. For those cases, I'd like to post more specific lists. Shoul=
d I
>> do that here?
>> >> >
>> >> > Independent of that, I may want to have a more aggressive 'fail
>> fast' policy than is appropriate for my work load (we have a lot of data
>> that's a copy of a copy of a copy, so if we lose it, we don't care: we'l=
l
>> just delete any files we can't read and get on with life, though I know
>> others will have a more conservative attitude towards data that might be
>> precious and unique). I can set the number of retries lower, I can do so=
me
>> other hacks for disks that tell the disk to fail faster, but I think par=
t
>> of the solution is going to have to be failing for some sense-code/ASC/A=
SCQ
>> tuples that we don't want to fail in upstream or the general case. I was
>> thinking of identifying those and creating a 'global quirk table' that g=
ets
>> applied after the drive-specific quirk table that would let $WORK overri=
de
>> the defaults, while letting others keep the current behavior. IMHO, it
>> would be better to have these separate rather than in the global data fo=
r
>> tracking upstream...
>> >> >
>> >> > Is that clear, or should I give concrete examples?
>> >> >
>> >> > Comments?
>> >> >
>> >> > Warner
>> >>
>> >> Basically, you want to change the retry counts for certain ASC/ASCQ
>> >> codes only, on a site-by-site basis?  That sounds reasonable.  Would
>> >> it be configurable at runtime or only at build time?
>> >
>> >
>> > I'd like to change the default actions. But maybe we just do that for
>> everyone and assume modern drives...
>> >
>> >> Also, I've been thinking lately that it would be real nice if READ
>> >> UNRECOVERABLE could be translated to EINTEGRITY instead of EIO.  That
>> >> would let consumers know that retries are pointless, but that the dat=
a
>> >> is probably healable.
>> >
>> >
>> > Unlikely, unless you've tuned things to not try for long at recovery..=
.
>> >
>> > But regardless... do you have a concrete example of a use case? There'=
s
>> a number of places that map any error to EIO. And I'd like a use case
>> before we expand the errors the lower layers return...
>> >
>> > Warner
>>
>> My first use-case is a user-space FUSE file system.  It only has
>> access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then it
>> could heal a READ UNRECOVERABLE by rewriting the sector, whereas other
>> EIO errors aren't likely to be healed that way.
>>
>
> Yea... but READ UNRECOVERABLE is kinda hit or miss...
>
>
>> My second use-case is ZFS.  zfsd treats checksum errors differently
>> from I/O errors.  A checksum error normally means that a read returned
>> wrong data.  But I think that READ UNRECOVERABLE should also count.
>> After all, that means that the disk's media returned wrong data which
>> was detected by the disk's own EDC/ECC.  I've noticed that zfsd seems
>> to fault disks too eagerly when their only problem is READ
>> UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or even a new error
>> code, would let zfsd be tuned better.
>>
>
> EINTEGRITY would then mean two different things. UFS returns it when
> checksums fail for critical filesystem errors. I'm not saying no, per se,
> just that it conflates two different errors.
>
> I think both of these use cases would be better served by CAM's publishin=
g
> of the errors to devctl today. Here's some example data from a system I'm
> looking at:
>
> system=3DCAM subsystem=3Dperiph type=3Dtimeout device=3Dda36 serial=3D"12=
345"
> cam_status=3D"0x44b" timeout=3D30000 CDB=3D"28 00 4e b7 cb a3 00 04 cc 00=
 "
>  timestamp=3D1634739729.312068
> system=3DCAM subsystem=3Dperiph type=3Dtimeout device=3Dda36 serial=3D"12=
345"
> cam_status=3D"0x44b" timeout=3D30000 CDB=3D"28 00 20 6b d5 56 00 00 c0 00=
 "
>  timestamp=3D1634739729.585541
> system=3DCAM subsystem=3Dperiph type=3Derror device=3Dda36 serial=3D"1234=
5"
> cam_status=3D"0x4cc" scsi_status=3D2 scsi_sense=3D"72 03 11 00" CDB=3D"28=
 00 ad 1a
> 35 96 00 00 56 00 " timestamp=3D1641979267.469064
> system=3DCAM subsystem=3Dperiph type=3Derror device=3Dda36 serial=3D"1234=
5"
> cam_status=3D"0x4cc" scsi_status=3D2 scsi_sense=3D"72 03 11 00" CDB=3D"28=
 00 ad 1a
> 35 96 00 01 5e 00 "  timestamp=3D1642252539.693699
> system=3DCAM subsystem=3Dperiph type=3Derror device=3Dda39 serial=3D"1234=
6"
> cam_status=3D"0x4cc" scsi_status=3D2 scsi_sense=3D"72 04 02 00" CDB=3D"2a=
 00 01 2b
> c8 f6 00 07 81 00 "  timestamp=3D1669603144.090835
>
> Here we get the sense key, the asc and the ascq in the scsi_sense data
> (I'm currently looking at expanding this to the entire sense buffer, sinc=
e
> it includes how hard the drive tried to read the data on media and hardwa=
re
> errors).  It doesn't include nvme data, but does include ata data (I'll
> have to add the nvme data, now that I've noticed it is missing). With the
> sense data and the CDB you know what kind of error you got, plus what blo=
ck
> didn't read/write correctly. With the extended sense data, you can find o=
ut
> even more details that are sense-key dependent...
>
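
To make the field layout concrete, here is a minimal Python sketch that
parses one of the lines above. It assumes the four scsi_sense bytes are
response code, sense key, ASC and ASCQ in that order (which matches the
samples), and it only decodes 10-byte READ(10)/WRITE(10) CDBs. The function
names are made up for illustration and are not part of CAM or devd.

```python
import shlex

# Sample line copied from the devctl output quoted in this thread.
SAMPLE = (
    'system=CAM subsystem=periph type=error device=da36 serial="12345" '
    'cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" '
    'CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064'
)


def parse_event(line):
    """Split a devctl key=value line into a dict, honouring quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def decode_sense(text):
    """Assumed layout: response code, sense key, ASC, ASCQ (one byte each)."""
    code, key, asc, ascq = (int(b, 16) for b in text.split()[:4])
    return {"response_code": code, "sense_key": key, "asc": asc, "ascq": ascq}


def decode_cdb10(text):
    """Decode a 10-byte READ(10)/WRITE(10) CDB; return None for other CDBs."""
    cdb = bytes(int(b, 16) for b in text.split())
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        return None
    return {
        "op": "read" if cdb[0] == 0x28 else "write",
        "lba": int.from_bytes(cdb[2:6], "big"),      # bytes 2..5, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),   # bytes 7..8, big-endian
    }


if __name__ == "__main__":
    event = parse_event(SAMPLE)
    print(event["device"], decode_sense(event["scsi_sense"]),
          decode_cdb10(event["CDB"]))
```

For the sample line this gives sense key 0x3 (MEDIUM ERROR) with ASC/ASCQ
0x11/0x00 (UNRECOVERED READ ERROR) on a READ(10) of 0x56 blocks starting at
LBA 0xad1a3596.
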
> So I'm unsure that trying to shoehorn our imperfect knowledge of what's
> retriable, what's fixable, and what should be written with zeros into
> the kernel, and converting that to a separate errno, would give good
> results. Letting daemons that want to make more nuanced calls about
> disks tap into this stream might be the better way to go. One of the
> things I'm planning for $WORK is
> to enable the retry time limit of one of the mode pages so that we fail
> faster and can just delete the file with the 'bad' block that we'd get
> eventually if we allowed the full, default error processing to run, but
> that 'slow path' processing kills performance for all other users of the
> drive...  I'm unsure how well that will work out (and I know I'm lucky th=
at
> I can always recover any data for my application since it's just a cache)=
.
>
> I'd be interested to hear what others have to say here, though, since my
> focus on this data is through the lens of my rather specialized
> application...
>
> Warner
>
> P.S. That was generated with this rule if you wanted to play with it...
> You'd have to translate absolute disk blocks to a partition and an offset
> into the filesystem, then give the filesystem a chance to tell you what o=
f
> its data/metadata that block is used for...
>
> # Disk errors
> notify 10 {
>         match "system"          "CAM";
>         match "subsystem"       "periph";
>         match "device"          "[an]?da[0-9]+";
>         action "logger -t diskerr -p daemon.info $_ timestamp=3D$timestam=
p";
> };
>
>
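
The first half of the translation mentioned in the P.S., absolute block to
partition plus offset, could look roughly like the sketch below. The
partition table is invented for illustration (on a real system it would come
from something like gpart show), a 512-byte sector size is assumed, and the
filesystem-level lookup of what lives at that offset is left out.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Partition(NamedTuple):
    name: str   # hypothetical label, e.g. "da36p2"
    start: int  # first LBA of the partition
    size: int   # length of the partition in blocks


def locate(
    lba: int,
    partitions: Sequence[Partition],
    block_size: int = 512,
) -> Optional[Tuple[str, int]]:
    """Return (partition name, byte offset inside it), or None if unmapped."""
    for part in partitions:
        if part.start <= lba < part.start + part.size:
            return part.name, (lba - part.start) * block_size
    return None


if __name__ == "__main__":
    # Made-up layout; the LBA is the one from the READ(10) CDB above.
    table = [
        Partition("da36p1", 40, 1024),
        Partition("da36p2", 2048, 3_900_000_000),
    ]
    print(locate(0xAD1A3596, table))
```
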


--00000000000084edec0600d8debe--

From nobody Fri Jul 21 03:18:44 2023
Message-ID: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
Date: Thu, 20 Jul 2023 23:18:44 -0400
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi
List-Help: <mailto:scsi+help@freebsd.org>
List-Post: <mailto:scsi@freebsd.org>
List-Subscribe: <mailto:scsi+subscribe@freebsd.org>
List-Unsubscribe: <mailto:scsi+unsubscribe@freebsd.org>
Sender: owner-freebsd-scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.13.0
Reply-To: dgilbert@interlog.com
Subject: Re: ASC/ASCQ Review
Content-Language: en-CA
To: Warner Losh <imp@bsdimp.com>, Alan Somers <asomers@freebsd.org>
Cc: scsi@freebsd.org
References: <CANCZdfokEoRtNp0en=9pjLQSQ+jtmfwH3OOwz1z09VcwWpE+xg@mail.gmail.com>
 <CAOtMX2g4+SDWg9WKbwZcqh4GpRan593O6qtNf7feoVejVK0YyQ@mail.gmail.com>
 <CANCZdfq5qti5uzWLkZaQEpyd5Q255sQeaR_kC_OQinmE9Qcqaw@mail.gmail.com>
 <CAOtMX2iwnpHL6b2-1D4N4Bi4eKoLnGK4=+gUowXGS_gtyDOkig@mail.gmail.com>
 <CANCZdfr-y8HYBb6GCFqZ7LAarxUAGb36Y6j+bo+WiDwUT5uR7A@mail.gmail.com>
 <CANCZdfptEG=+xa3m31Ngre26ZQxZ_Fqsfjmh+tVHgP2XpqhZ7g@mail.gmail.com>
From: Douglas Gilbert <dgilbert@interlog.com>
In-Reply-To: <CANCZdfptEG=+xa3m31Ngre26ZQxZ_Fqsfjmh+tVHgP2XpqhZ7g@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spamd-Result: default: False [-3.30 / 15.00];
	NEURAL_HAM_LONG(-1.00)[-1.000];
	NEURAL_HAM_MEDIUM(-1.00)[-0.999];
	NEURAL_HAM_SHORT(-1.00)[-0.998];
	R_SPF_ALLOW(-0.20)[+ip4:208.85.217.0/24];
	MIME_GOOD(-0.10)[text/plain];
	TO_MATCH_ENVRCPT_SOME(0.00)[];
	MLMMJ_DEST(0.00)[scsi@freebsd.org];
	R_DKIM_NA(0.00)[];
	FROM_EQ_ENVFROM(0.00)[];
	ASN(0.00)[asn:36493, ipnet:208.85.216.0/21, country:CA];
	MIME_TRACE(0.00)[0:+];
	HAS_REPLYTO(0.00)[dgilbert@interlog.com];
	DMARC_NA(0.00)[interlog.com];
	REPLYTO_ADDR_EQ_FROM(0.00)[];
	RCVD_COUNT_FIVE(0.00)[5];
	RCVD_VIA_SMTP_AUTH(0.00)[];
	RCVD_TLS_LAST(0.00)[];
	ARC_NA(0.00)[];
	RCPT_COUNT_THREE(0.00)[3];
	FROM_HAS_DN(0.00)[];
	TO_DN_SOME(0.00)[];
	MID_RHS_MATCH_FROM(0.00)[]
X-Rspamd-Queue-Id: 4R6ZXD1ZSGz4l1x
X-Spamd-Bar: ---

On 2023-07-19 11:41, Warner Losh wrote:
> btw, it also occurs to me that if I do add a 'secondary' table, then you could 
> use it to generate a unique errno and experiment
> with that w/o affecting the main code until that stuff was mature.
> 
> I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs that I'd 
> like to tag as 'if trying harder, retry, otherwise fail' since re-retry needs 
> have changed a lot since cam was written in the late 90s and at least some of 
> the asc/ascq pairs I'm looking at haven't changed since the initial import, but 
> that's based on a tiny sampling of the data I have and is preliminary at best. I 
> may just change it to reflect modern usage.

Hi,
If you are looking for up-to-date [20230325] asc/ascq tables in C you could
borrow mine at https://github.com/doug-gilbert/sg3_utils in lib/sg_lib_data.c
starting at line 745.
In testing/sg_chk_asc.c is a small test program for checking that the table in
sg_lib_data.c agrees with the file that T10 supplies:
      https://www.t10.org/lists/asc-num.txt

Doug Gilbert
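
A rough sketch of that kind of cross-check, written in Python rather than C
and not a port of sg_chk_asc.c. It assumes each asc-num.txt entry starts with
"XXh/YYh" followed by whitespace, and it only compares which (asc, ascq) pairs
exist, not their descriptions. SAMPLE, LOCAL_TABLE and the function names are
hypothetical and exist only for illustration.

```python
import re

# Assumed entry shape: "11h/00h  <device-type columns>  DESCRIPTION".
# Wildcard entries such as "40h/NNh" do not match and are skipped.
CODE_RE = re.compile(r"^([0-9A-Fa-f]{2})h/([0-9A-Fa-f]{2})h\s")


def parse_t10_codes(text):
    """Return the set of (asc, ascq) pairs found in asc-num.txt style text."""
    codes = set()
    for line in text.splitlines():
        m = CODE_RE.match(line.lstrip())
        if m:
            codes.add((int(m.group(1), 16), int(m.group(2), 16)))
    return codes


def compare_tables(local_table, t10_text):
    """Return (pairs missing from the local table, pairs unknown to T10)."""
    t10 = parse_t10_codes(t10_text)
    local = set(local_table)
    return sorted(t10 - local), sorted(local - t10)


# Illustrative input only; a real run would read the downloaded file instead.
SAMPLE = """\
00h/00h  DTLPWROMAEBKVF  NO ADDITIONAL SENSE INFORMATION
11h/00h  DT  WRO   BK    UNRECOVERED READ ERROR
40h/NNh  DTLPWROMAEBKVF  DIAGNOSTIC FAILURE ON COMPONENT NN (80h-FFh)
"""

# Hypothetical local table keyed by (asc, ascq).
LOCAL_TABLE = {
    (0x00, 0x00): "No additional sense information",
    (0x44, 0x00): "Internal target failure",
}

missing, extra = compare_tables(LOCAL_TABLE, SAMPLE)
print("missing from local table:", missing)
print("not in T10 sample:", extra)
```

Comparing only the code pairs keeps the sketch independent of how the
device-type columns are laid out. Checking descriptions as well would need the
exact column layout of the T10 file.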

> On Fri, Jul 14, 2023 at 5:34 PM Warner Losh <imp@bsdimp.com 
> <mailto:imp@bsdimp.com>> wrote:
> 
> 
> 
>     On Fri, Jul 14, 2023 at 12:31 PM Alan Somers <asomers@freebsd.org
>     <mailto:asomers@freebsd.org>> wrote:
> 
>         On Fri, Jul 14, 2023 at 11:05 AM Warner Losh <imp@bsdimp.com
>         <mailto:imp@bsdimp.com>> wrote:
>          >
>          >
>          >
>          > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org
>         <mailto:asomers@freebsd.org>> wrote:
>          >>
>          >> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh <imp@bsdimp.com
>         <mailto:imp@bsdimp.com>> wrote:
>          >> >
>          >> > Greetings,
>          >> >
>          >> > I've been looking closely at failed drives for $WORK lately. I've
>         noticed that a lot of errors that kinda sound like fatal errors have
>         SS_RDEF set on them.
>          >> >
>          >> > What's the process for evaluating whether those error codes are
>         worth retrying? There are several errors that we seem to be seeing
>         (preliminary read of the data) before the drive gives up the ghost
>         altogether. For those cases, I'd like to post more specific lists.
>         Should I do that here?
>          >> >
>          >> > Independent of that, I may want to have a more aggressive 'fail
>         fast' policy than is appropriate for my work load (we have a lot of data
>         that's a copy of a copy of a copy, so if we lose it, we don't care:
>         we'll just delete any files we can't read and get on with life, though I
>         know others will have a more conservative attitude towards data that
>         might be precious and unique). I can set the number of retries lower, I
>         can do some other hacks for disks that tell the disk to fail faster, but
>         I think part of the solution is going to have to be failing for some
>         sense-code/ASC/ASCQ tuples that we don't want to fail in upstream or the
>         general case. I was thinking of identifying those and creating a 'global
>         quirk table' that gets applied after the drive-specific quirk table that
>         would let $WORK override the defaults, while letting others keep the
>         current behavior. IMHO, it would be better to have these separate rather
>         than in the global data for tracking upstream...
>          >> >
>          >> > Is that clear, or should I give concrete examples?
>          >> >
>          >> > Comments?
>          >> >
>          >> > Warner
>          >>
>          >> Basically, you want to change the retry counts for certain ASC/ASCQ
>          >> codes only, on a site-by-site basis?  That sounds reasonable.  Would
>          >> it be configurable at runtime or only at build time?
>          >
>          >
>          > I'd like to change the default actions. But maybe we just do that for
>         everyone and assume modern drives...
>          >
>          >> Also, I've been thinking lately that it would be real nice if READ
>          >> UNRECOVERABLE could be translated to EINTEGRITY instead of EIO.  That
>          >> would let consumers know that retries are pointless, but that the data
>          >> is probably healable.
>          >
>          >
>          > Unlikely, unless you've tuned things to not try for long at recovery...
>          >
>          > But regardless... do you have a concrete example of a use case?
>         There's a number of places that map any error to EIO. And I'd like a use
>         case before we expand the errors the lower layers return...
>          >
>          > Warner
> 
>         My first use-case is a user-space FUSE file system.  It only has
>         access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then it
>         could heal a READ UNRECOVERABLE by rewriting the sector, whereas other
>         EIO errors aren't likely to be healed that way.
> 
> 
>     Yea... but READ UNRECOVERABLE is kinda hit or miss...
> 
>         My second use-case is ZFS.  zfsd treats checksum errors differently
>         from I/O errors.  A checksum error normally means that a read returned
>         wrong data.  But I think that READ UNRECOVERABLE should also count.
>         After all, that means that the disk's media returned wrong data which
>         was detected by the disk's own EDC/ECC.  I've noticed that zfsd seems
>         to fault disks too eagerly when their only problem is READ
>         UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or even a new error
>         code, would let zfsd be tuned better.
> 
> 
>     EINTEGRITY would then mean two different things. UFS returns it when
>     checksums fail for critical filesystem errors. I'm not saying no, per se,
>     just that it conflates two different errors.
> 
>     I think both of these use cases would be better served by CAM's publishing
>     of the errors to devctl today. Here's some example data from a system I'm
>     looking at:
> 
>     system=CAM subsystem=periph type=timeout device=da36 serial="12345"
>     cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 "
>       timestamp=1634739729.312068
>     system=CAM subsystem=periph type=timeout device=da36 serial="12345"
>     cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 "
>       timestamp=1634739729.585541
>     system=CAM subsystem=periph type=error device=da36 serial="12345"
>     cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" CDB="28 00 ad 1a
>     35 96 00 00 56 00 " timestamp=1641979267.469064
>     system=CAM subsystem=periph type=error device=da36 serial="12345"
>     cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" CDB="28 00 ad 1a
>     35 96 00 01 5e 00 "  timestamp=1642252539.693699
>     system=CAM subsystem=periph type=error device=da39 serial="12346"
>     cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00" CDB="2a 00 01 2b
>     c8 f6 00 07 81 00 "  timestamp=1669603144.090835
> 
>     Here we get the sense key, the asc and the ascq in the scsi_sense data (I'm
>     currently looking at expanding this to the entire sense buffer, since it
>     includes how hard the drive tried to read the data on media and hardware
>     errors).  It doesn't include nvme data, but does include ata data (I'll have
>     to add that data, now that I've noticed it is missing).  With the sense data
>     and the CDB you know what kind of error you got, plus what block didn't
>     read/write correctly. With the extended sense data, you can find out even
>     more details that are sense-key dependent...
> 
>     So I'm unsure that trying to shoehorn our imperfect knowledge of what's
>     retriable, fixable, or should be written with zeros into the kernel and
>     converting that to a separate errno would give good results. Having
>     daemons that want to make more nuanced calls about disks tap into this
>     stream might be the better way to go. One of the things I'm planning for $WORK is
>     to enable the retry time limit of one of the mode pages so that we fail
>     faster and can just delete the file with the 'bad' block that we'd get
>     eventually if we allowed the full, default error processing to run, but that
>     'slow path' processing kills performance for all other users of the
>     drive...  I'm unsure how well that will work out (and I know I'm lucky that
>     I can always recover any data for my application since it's just a cache).
> 
>     I'd be interested to hear what others have to say here though, since my
>     focus on this data is through the lens of my rather specialized application...
> 
>     Warner
> 
>     P.S. That was generated with this rule if you wanted to play with it...
>     You'd have to translate absolute disk blocks to a partition and an offset
>     into the filesystem, then give the filesystem a chance to tell you what of
>     its data/metadata that block is used for...
> 
>     # Disk errors
>     notify 10 {
>              match "system"          "CAM";
>              match "subsystem"       "periph";
>              match "device"          "[an]?da[0-9]+";
>              action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
>     };
> 
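
For the devctl records quoted above, here is a minimal Python sketch of pulling
the sense key, ASC/ASCQ and the failing block range out of one line. It assumes
the scsi_sense field holds four bytes in the order response code, sense key,
ASC, ASCQ (as the message describes), and it only decodes READ(10) and
WRITE(10) CDBs. The function names and the SENSE_KEYS subset are hypothetical
and chosen for illustration.

```python
import shlex

# Illustrative subset of SCSI sense keys; a full table has 16 entries.
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}


def parse_event(line):
    """Split a 'key=value key="quoted value"' record into a dict."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value.strip()
    return fields


def decode_sense(sense_hex):
    """Decode the four logged bytes: response code, sense key, ASC, ASCQ."""
    code, key, asc, ascq = bytes.fromhex(sense_hex)[:4]
    key &= 0x0F
    return {
        "response_code": code,
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc": asc,
        "ascq": ascq,
    }


def decode_rw10(cdb_hex):
    """Return op, starting LBA and block count for a READ(10)/WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        return None  # other opcodes are out of scope for this sketch
    return {
        "op": "READ(10)" if cdb[0] == 0x28 else "WRITE(10)",
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8, big-endian
    }


EVENT = (
    'system=CAM subsystem=periph type=error device=da36 serial="12345" '
    'cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" '
    'CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064'
)

fields = parse_event(EVENT)
print(fields["device"], decode_sense(fields["scsi_sense"]),
      decode_rw10(fields["CDB"]))
```

Timeout records carry no scsi_sense field, so a real consumer would check for
the key before decoding. Mapping the LBA to a partition and file is a separate
step that this sketch does not attempt.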


From nobody Fri Jul 21 03:26:07 2023
X-Original-To: scsi@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R6Zhs0hNqz4nwLH
	for <scsi@mlmmj.nyi.freebsd.org>; Fri, 21 Jul 2023 03:26:21 +0000 (UTC)
	(envelope-from wlosh@bsdimp.com)
Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4R6Zhr5ZWgz3DBV
	for <scsi@freebsd.org>; Fri, 21 Jul 2023 03:26:20 +0000 (UTC)
	(envelope-from wlosh@bsdimp.com)
Authentication-Results: mx1.freebsd.org;
	none
Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-9922d6f003cso237769866b.0
        for <scsi@freebsd.org>; Thu, 20 Jul 2023 20:26:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=bsdimp-com.20221208.gappssmtp.com; s=20221208; t=1689909979; x=1690514779;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=lBPJQQLUPIAujvywawulGetLsBEuZ+/GDH3xHRepNYU=;
        b=28sSZa5jwc1eLWJxDuAgu1vGy1svseVYTKYxA/qnSdTdTzvqUP6sTFgp0ar4lgJPEF
         BZfh1ykxhtBQIIirtkz+D2DHS5/6/s9AZISGRoTVtjXdIenDL5y7HlRk/rSANKU2nfnP
         ozjaI7lEbymfeM0uP5tUX2AC695xCs/xTdOXT0/+bF6S5uZTzLRGz94Vl4qmpuXZzQqq
         DDPBkh/kx2YPafAS+W3C3aAntrPZX5I4UAO8p6cJqjkSAD1UWwRHitqTz3WUObdveqjc
         bO+SKEBeKApDTK8+hqUFr/IibzQD8e7YKDw344bJKIb1SSnYQ8zLRv6iC8jO/yyUi4dD
         D6pA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1689909979; x=1690514779;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=lBPJQQLUPIAujvywawulGetLsBEuZ+/GDH3xHRepNYU=;
        b=WR2UB7V4RZmJyFh+ErCmArImzpuJ9twRjWfQ8zHRf9SNC8X2rll1IuCo+A6figTEMt
         YZHPT5YVqImF+OIsy6XORvi46FiSGWlJfQwN2ybOabLIifZsuI7JYcB3JJb5229dxME3
         NIunmGyp35WRVS/hyQZUw0C8EtvNxs4a/MWxxs0199ca7KRl+/48IWkug5Yv3JTZReg5
         yDXDs2bKI99AZJPW8xk6H8ceXYO6ifWjb4B7D7uz8DnO57A5MVmqkQqDfSyvinQ7hqp7
         gueMNVHw34iHEcmCGibtDDgsVUyHUNqGLTt7QMKocaGr+VrZHMn+AWiW+zV5UCaS34tn
         kuiw==
X-Gm-Message-State: ABy/qLbKHRjgyKBhhi8sHwJHfDu3WIgFj7ckings//Ab9UgPDpWeV+gj
	j55xJbQapdLwUyhhceV5O8a0j6YcHt8ozPOZYh7EsQ==
X-Google-Smtp-Source: APBJJlEr4sZPZKt1pSvRmqKfaAP9GYJSKGJ8QkUVQZLj5hdhgIZselRTINa8olnaY5hG5PtRxVQS8bHbanBl3K9H9PY=
X-Received: by 2002:a17:906:8451:b0:994:1fd2:cf96 with SMTP id
 e17-20020a170906845100b009941fd2cf96mr588912ejy.0.1689909978614; Thu, 20 Jul
 2023 20:26:18 -0700 (PDT)
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi
List-Help: <mailto:scsi+help@freebsd.org>
List-Post: <mailto:scsi@freebsd.org>
List-Subscribe: <mailto:scsi+subscribe@freebsd.org>
List-Unsubscribe: <mailto:scsi+unsubscribe@freebsd.org>
Sender: owner-freebsd-scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
MIME-Version: 1.0
References: <CANCZdfokEoRtNp0en=9pjLQSQ+jtmfwH3OOwz1z09VcwWpE+xg@mail.gmail.com>
 <CAOtMX2g4+SDWg9WKbwZcqh4GpRan593O6qtNf7feoVejVK0YyQ@mail.gmail.com>
 <CANCZdfq5qti5uzWLkZaQEpyd5Q255sQeaR_kC_OQinmE9Qcqaw@mail.gmail.com>
 <CAOtMX2iwnpHL6b2-1D4N4Bi4eKoLnGK4=+gUowXGS_gtyDOkig@mail.gmail.com>
 <CANCZdfr-y8HYBb6GCFqZ7LAarxUAGb36Y6j+bo+WiDwUT5uR7A@mail.gmail.com>
 <CANCZdfptEG=+xa3m31Ngre26ZQxZ_Fqsfjmh+tVHgP2XpqhZ7g@mail.gmail.com> <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
In-Reply-To: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
From: Warner Losh <imp@bsdimp.com>
Date: Thu, 20 Jul 2023 21:26:07 -0600
Message-ID: <CANCZdfoed3meq_z90aC=BP7RE_Gk+Oq6K1sptO4E0s6jT_ge6Q@mail.gmail.com>
Subject: Re: ASC/ASCQ Review
To: dgilbert@interlog.com
Cc: Alan Somers <asomers@freebsd.org>, scsi@freebsd.org
Content-Type: multipart/alternative; boundary="000000000000fa60eb0600f6d3dc"
X-Rspamd-Queue-Id: 4R6Zhr5ZWgz3DBV
X-Spamd-Bar: ----
X-Spamd-Result: default: False [-4.00 / 15.00];
	REPLY(-4.00)[];
	ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]
X-Rspamd-Pre-Result: action=no action;
	module=replies;
	Message is reply to one we originated

--000000000000fa60eb0600f6d3dc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 20, 2023, 9:18 PM Douglas Gilbert <dgilbert@interlog.com> wrote=
:

> On 2023-07-19 11:41, Warner Losh wrote:
> > btw, it also occurs to me that if I do add a 'secondary' table, then yo=
u
> could
> > use it to generate a unique errno and experiment
> > with that w/o affecting the main code until that stuff was mature.
> >
> > I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs
> that I'd
> > like to tag as 'if trying harder, retry, otherwise fail' since re-retry
> needs
> > have changed a lot since cam was written in the late 90s and at least
> some of
> > the asc/ascq pairs I'm looking at haven't changed since the initial
> import, but
> > that's based on a tiny sampling of the data I have and is preliminary a=
t
> best. I
> > may just change it to reflect modern usage.
>
> Hi,
> If you are looking for up-to-date [20230325] asc/ascq tables in C you cou=
ld
> borrow mine at https://github.com/doug-gilbert/sg3_utils in
> lib/sg_lib_data.c
> starting at line 745 .
> In testing/sg_chk_asc.c is a small test program for checking that the
> table in
> sg_lib_data.c agrees with the file that T10 supplies:
>       https://www.t10.org/lists/asc-num.txt


Thanks for the pointer. I'd already updated CAM's tables for that...

What I'm doing now is making sure CAM's reactions to the asc/ascq pairs
are good for modern drives... It's a good idea, though, to create a
program to check that our table matches...

Warner


> Doug Gilbert
>
> > On Fri, Jul 14, 2023 at 5:34=E2=80=AFPM Warner Losh <imp@bsdimp.com
> > <mailto:imp@bsdimp.com>> wrote:
> >
> >
> >
> >     On Fri, Jul 14, 2023 at 12:31=E2=80=AFPM Alan Somers <asomers@freeb=
sd.org
> >     <mailto:asomers@freebsd.org>> wrote:
> >
> >         On Fri, Jul 14, 2023 at 11:05=E2=80=AFAM Warner Losh <imp@bsdim=
p.com
> >         <mailto:imp@bsdimp.com>> wrote:
> >          >
> >          >
> >          >
> >          > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <
> asomers@freebsd.org
> >         <mailto:asomers@freebsd.org>> wrote:
> >          >>
> >          >> On Thu, Jul 13, 2023 at 12:14=E2=80=AFPM Warner Losh <imp@b=
sdimp.com
> >         <mailto:imp@bsdimp.com>> wrote:
> >          >> >
> >          >> > Greetings,
> >          >> >
> >          >> > I've been looking closely at failed drives for $WORK
> lately. I've
> >         noticed that a lot of errors that kinda sound like fatal errors
> have
> >         SS_RDEF set on them.
> >          >> >
> >          >> > What's the process for evaluating whether those error
> codes are
> >         worth retrying? There are several errors that we seem to be
> seeing
> >         (preliminary read of the data) before the drive gives up the
> ghost
> >         altogether. For those cases, I'd like to post more specific
> lists.
> >         Should I do that here?
> >          >> >
> >          >> > Independent of that, I may want to have a more aggressive
> 'fail
> >         fast' policy than is appropriate for my work load (we have a lo=
t
> of data
> >         that's a copy of a copy of a copy, so if we lose it, we don't
> care:
> >         we'll just delete any files we can't read and get on with life,
> though I
> >         know others will have a more conservative attitude towards data
> that
> >         might be precious and unique). I can set the number of retries
> lower, I
> >         can do some other hacks for disks that tell the disk to fail
> faster, but
> >         I think part of the solution is going to have to be failing for
> some
> >         sense-code/ASC/ASCQ tuples that we don't want to fail in
> upstream or the
> >         general case. I was thinking of identifying those and creating =
a
> 'global
> >         quirk table' that gets applied after the drive-specific quirk
> table that
> >         would let $WORK override the defaults, while letting others kee=
p
> the
> >         current behavior. IMHO, it would be better to have these
> separate rather
> >         than in the global data for tracking upstream...
> >          >> >
> >          >> > Is that clear, or should I give concrete examples?
> >          >> >
> >          >> > Comments?
> >          >> >
> >          >> > Warner
> >          >>
> >          >> Basically, you want to change the retry counts for certain
> ASC/ASCQ
> >          >> codes only, on a site-by-site basis?  That sounds
> reasonable.  Would
> >          >> it be configurable at runtime or only at build time?
> >          >
> >          >
> >          > I'd like to change the default actions. But maybe we just do
> that for
> >         everyone and assume modern drives...
> >          >
> >          >> Also, I've been thinking lately that it would be real nice
> if READ
> >          >> UNRECOVERABLE could be translated to EINTEGRITY instead of
> EIO.  That
> >          >> would let consumers know that retries are pointless, but
> that the data
> >          >> is probably healable.
> >          >
> >          >
> >          > Unlikely, unless you've tuned things to not try for long at
> recovery...
> >          >
> >          > But regardless... do you have a concrete example of a use
> case?
> >         There's a number of places that map any error to EIO. And I'd
> like a use
> >         case before we expand the errors the lower layers return...
> >          >
> >          > Warner
> >
> >         My first use-case is a user-space FUSE file system.  It only ha=
s
> >         access to errnos, not ASC/ASCQ codes.  If we do as I suggest,
> then it
> >         could heal a READ UNRECOVERABLE by rewriting the sector, wherea=
s
> other
> >         EIO errors aren't likely to be healed that way.
> >
> >
> >     Yea... but READ UNRECOVERABLE is kinda hit or miss...
> >
> >         My second use-case is ZFS.  zfsd treats checksum errors
> differently
> >         from I/O errors.  A checksum error normally means that a read
> returned
> >         wrong data.  But I think that READ UNRECOVERABLE should also
> count.
> >         After all, that means that the disk's media returned wrong data
> which
> >         was detected by the disk's own EDC/ECC.  I've noticed that zfsd
> seems
> >         to fault disks too eagerly when their only problem is READ
> >         UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or even a new
> error
> >         code, would let zfsd be tuned better.
> >
> >
> >     EINTEGRITY would then mean two different things. UFS returns it whe=
n
> >     checksums fail for critical filesystem errors. I'm not saying no,
> per se,
> >     just that it conflates two different errors.
> >
> >     I think both of these use cases would be better served by CAM's
> publishing
> >     of the errors to devctl today. Here's some example data from a
> system I'm
> >     looking at:
> >
> >     system=3DCAM subsystem=3Dperiph type=3Dtimeout device=3Dda36 serial=
=3D"12345"
> >     cam_status=3D"0x44b" timeout=3D30000 CDB=3D"28 00 4e b7 cb a3 00 04=
 cc 00 "
> >       timestamp=3D1634739729.312068
> >     system=3DCAM subsystem=3Dperiph type=3Dtimeout device=3Dda36 serial=
=3D"12345"
> >     cam_status=3D"0x44b" timeout=3D30000 CDB=3D"28 00 20 6b d5 56 00 00=
 c0 00 "
> >       timestamp=3D1634739729.585541
> >     system=3DCAM subsystem=3Dperiph type=3Derror device=3Dda36 serial=
=3D"12345"
> >     cam_status=3D"0x4cc" scsi_status=3D2 scsi_sense=3D"72 03 11 00" CDB=
=3D"28 00
> ad 1a
> >     35 96 00 00 56 00 " timestamp=3D1641979267.469064
> >     system=3DCAM subsystem=3Dperiph type=3Derror device=3Dda36 serial=
=3D"12345"
> >     cam_status=3D"0x4cc" scsi_status=3D2 scsi_sense=3D"72 03 11 00" CDB=
=3D"28 00
> ad 1a
> >     35 96 00 01 5e 00 "  timestamp=3D1642252539.693699
> >     system=3DCAM subsystem=3Dperiph type=3Derror device=3Dda39 serial=
=3D"12346"
> >     cam_status=3D"0x4cc" scsi_status=3D2 scsi_sense=3D"72 04 02 00" CDB=
=3D"2a 00
> 01 2b
> >     c8 f6 00 07 81 00 "  timestamp=3D1669603144.090835
> >
> >     Here we get the sense key, the asc and the ascq in the scsi_sense
> data (I'm
> >     currently looking at expanding this to the entire sense buffer,
> since it
> >     includes how hard the drive tried to read the data on media and
> hardware
> >     errors).  It doesn't include nvme data, but does include ata data
> (I'll have
> >     to add that data, now that I've noticed it is missing).  With the
> sense data
> >     and the CDB you know what kind of error you got, plus what block
> didn't
> >     read/write correctly. With the extended sense data, you can find ou=
t
> even
> >     more details that are sense-key dependent...
> >
> >     So I'm unsure that trying to shoehorn our imperfect knowledge of
> >     what's retriable, fixable, or should be written with zeros into the
> >     kernel and converting that to a separate errno would give good
> >     results. Having daemons that want to make more nuanced calls about
> >     disks tap into this stream might be the better way to go. One of the
> >     things I'm planning for
> $WORK is
> >     to enable the retry time limit of one of the mode pages so that we
> fail
> >     faster and can just delete the file with the 'bad' block that we'd
> get
> >     eventually if we allowed the full, default error processing to run,
> but that
> >     'slow path' processing kills performance for all other users of the
> >     drive...  I'm unsure how well that will work out (and I know I'm
> lucky that
> >     I can always recover any data for my application since it's just a
> cache).
> >
> >     I'd be interested to hear what others have to say here though,
> since my
> >     focus on this data is through the lens of my rather specialized
> application...
> >
> >     Warner
> >
> >     P.S. That was generated with this rule if you wanted to play with
> it...
> >     You'd have to translate absolute disk blocks to a partition and an
> offset
> >     into the filesystem, then give the filesystem a chance to tell you
> what of
> >     its data/metadata that block is used for...
> >
> >     # Disk errors
> >     notify 10 {
> >              match "system"          "CAM";
> >              match "subsystem"       "periph";
> >              match "device"          "[an]?da[0-9]+";
> >              action "logger -t diskerr -p daemon.info $_
> >     timestamp=3D$timestamp";
> >     };
> >
>
>

--000000000000fa60eb0600f6d3dc
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><br><br><div class=3D"gmail_quote"><div dir=3D"ltr" =
class=3D"gmail_attr">On Thu, Jul 20, 2023, 9:18 PM Douglas Gilbert &lt;<a h=
ref=3D"mailto:dgilbert@interlog.com" target=3D"_blank" rel=3D"noreferrer">d=
gilbert@interlog.com</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quo=
te" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"=
>On 2023-07-19 11:41, Warner Losh wrote:<br>
&gt; btw, it also occurs to me that if I do add a &#39;secondary&#39; table=
, then you could <br>
&gt; use it to generate a unique errno and experiment<br>
&gt; with that w/o affecting the main code until that stuff was mature.<br>
&gt; <br>
&gt; I&#39;m not sure I&#39;ll do that now, since I&#39;ve found maybe 10 a=
sc/ascq pairs that I&#39;d <br>
&gt; like to tag as &#39;if trying harder, retry, otherwise fail&#39; since=
 re-retry needs <br>
&gt; have changed a lot since cam was written in the late 90s and at least =
some of <br>
&gt; the asc/ascq pairs I&#39;m looking at haven&#39;t changed since the in=
itial import, but <br>
&gt; that&#39;s based on a tiny sampling of the data I have and is prelimin=
ary at best. I <br>
&gt; may just change it to reflect modern usage.<br>
<br>
Hi,<br>
If you are looking for up-to-date [20230325] asc/ascq tables in C you could=
<br>
borrow mine at <a href=3D"https://github.com/doug-gilbert/sg3_utils" rel=3D=
"noreferrer noreferrer noreferrer" target=3D"_blank">https://github.com/dou=
g-gilbert/sg3_utils</a> in lib/sg_lib_data.c<br>
starting at line 745 .<br>
In testing/sg_chk_asc.c is a small test program for checking that the table=
 in<br>
sg_lib_data.c agrees with the file that T10 supplies:<br>
=C2=A0 =C2=A0 =C2=A0 <a href=3D"https://www.t10.org/lists/asc-num.txt" rel=
=3D"noreferrer noreferrer noreferrer" target=3D"_blank">https://www.t10.org=
/lists/asc-num.txt</a></blockquote></div></div><div dir=3D"auto"><br></div>=
<div dir=3D"auto">Thanks for the pointer. I&#39;d already updated CAM&#39;s=
 tables for that...</div><div dir=3D"auto"><br></div><div dir=3D"auto">What=
 I&#39;m doing now is making sure CAM&#39;s reactions to the asc/ascq pairs=
 are good for modern drives... It&#39;s a good idea, though, to create a pr=
ogram to check that our table matches...</div><div dir=3D"auto"><br></div><=
div dir=3D"au=
to">Warner</div><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=3D=
"gmail_quote"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;=
border-left:1px #ccc solid;padding-left:1ex"><br>
Doug Gilbert<br>
<br>
&gt; On Fri, Jul 14, 2023 at 5:34=E2=80=AFPM Warner Losh &lt;<a href=3D"mai=
lto:imp@bsdimp.com" rel=3D"noreferrer noreferrer" target=3D"_blank">imp@bsd=
imp.com</a> <br>
&gt; &lt;mailto:<a href=3D"mailto:imp@bsdimp.com" rel=3D"noreferrer norefer=
rer" target=3D"_blank">imp@bsdimp.com</a>&gt;&gt; wrote:<br>
&gt; <br>
&gt; <br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0On Fri, Jul 14, 2023 at 12:31=E2=80=AFPM Alan Somer=
s &lt;<a href=3D"mailto:asomers@freebsd.org" rel=3D"noreferrer noreferrer" =
target=3D"_blank">asomers@freebsd.org</a><br>
&gt;=C2=A0 =C2=A0 =C2=A0&lt;mailto:<a href=3D"mailto:asomers@freebsd.org" r=
el=3D"noreferrer noreferrer" target=3D"_blank">asomers@freebsd.org</a>&gt;&=
gt; wrote:<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0On Fri, Jul 14, 2023 at 11:05=E2=80=
=AFAM Warner Losh &lt;<a href=3D"mailto:imp@bsdimp.com" rel=3D"noreferrer n=
oreferrer" target=3D"_blank">imp@bsdimp.com</a><br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt;mailto:<a href=3D"mailto:imp@bsdi=
mp.com" rel=3D"noreferrer noreferrer" target=3D"_blank">imp@bsdimp.com</a>&=
gt;&gt; wrote:<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt; On Fri, Jul 14, 2023, 11:12 AM =
Alan Somers &lt;<a href=3D"mailto:asomers@freebsd.org" rel=3D"noreferrer no=
referrer" target=3D"_blank">asomers@freebsd.org</a><br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt;mailto:<a href=3D"mailto:asomers@=
freebsd.org" rel=3D"noreferrer noreferrer" target=3D"_blank">asomers@freebs=
d.org</a>&gt;&gt; wrote:<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; On Thu, Jul 13, 2023 at 12:=
14=E2=80=AFPM Warner Losh &lt;<a href=3D"mailto:imp@bsdimp.com" rel=3D"nore=
ferrer noreferrer" target=3D"_blank">imp@bsdimp.com</a><br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&lt;mailto:<a href=3D"mailto:imp@bsdi=
mp.com" rel=3D"noreferrer noreferrer" target=3D"_blank">imp@bsdimp.com</a>&=
gt;&gt; wrote:<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt; Greetings,<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt; I&#39;ve been looking =
closely at failed drives for $WORK lately. I&#39;ve<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0noticed that a lot of errors that kin=
da sound like fatal errors have<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0SS_RDEF set on them.<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt; What&#39;s the process=
 for evaluating whether those error codes are<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0worth retrying? There are several err=
ors that we seem to be seeing<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(preliminary read of the data) before=
 the drive gives up the ghost<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0altogether. For those cases, I&#39;d =
like to post more specific lists.<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Should I do that here?<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt; Independent of that, I=
 may want to have a more aggressive &#39;fail<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0fast&#39; policy than is appropriate =
for my work load (we have a lot of data<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0that&#39;s a copy of a copy of a copy=
, so if we lose it, we don&#39;t care:<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0we&#39;ll just delete any files we ca=
n&#39;t read and get on with life, though I<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0know others will have a more conserva=
tive attitude towards data that<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0might be precious and unique). I can =
set the number of retries lower, I<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0can do some other hacks for disks tha=
t tell the disk to fail faster, but<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0I think part of the solution is going=
 to have to be failing for some<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0sense-code/ASC/ASCQ tuples that we do=
n&#39;t want to fail in upstream or the<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0general case. I was thinking of ident=
ifying those and creating a &#39;global<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0quirk table&#39; that gets applied af=
ter the drive-specific quirk table that<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0would let $WORK override the defaults=
, while letting others keep the<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0current behavior. IMHO, it would be b=
etter to have these separate rather<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0than in the global data for tracking =
upstream...<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt; Is that clear, or shou=
ld I give concrete examples?<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt; Comments?<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; &gt; Warner<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; Basically, you want to chan=
ge the retry counts for certain ASC/ASCQ<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; codes only, on a site-by-si=
te basis?=C2=A0 That sounds reasonable.=C2=A0 Would<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; it be configurable at runti=
me or only at build time?<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt; I&#39;d like to change the defa=
ult actions. But maybe we just do that for<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0everyone and assume modern drives...<=
br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; Also, I&#39;ve been thinkin=
g lately that it would be real nice if READ<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; UNRECOVERABLE could be tran=
slated to EINTEGRITY instead of EIO.=C2=A0 That<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; would let consumers know th=
at retries are pointless, but that the data<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;&gt; is probably healable.<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt; Unlikely, unless you&#39;ve tun=
ed things to not try for long at recovery...<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt; But regardless... do you have a=
 concrete example of a use case?<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0There&#39;s a number of places that m=
ap any error to EIO. And I&#39;d like a use<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0case before we expand the errors the =
lower layers return...<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &gt; Warner<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0My first use-case is a user-space FUS=
E file system.=C2=A0 It only has<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0access to errnos, not ASC/ASCQ codes.=
=C2=A0 If we do as I suggest, then it<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0could heal a READ UNRECOVERABLE by re=
writing the sector, whereas other<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0EIO errors aren&#39;t likely to be he=
aled that way.<br>
&gt; <br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0Yea... but READ UNRECOVERABLE is kinda hit or miss.=
..<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0My second use-case is ZFS.=C2=A0 zfsd=
 treats checksum errors differently<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0from I/O errors.=C2=A0 A checksum err=
or normally means that a read returned<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0wrong data.=C2=A0 But I think that RE=
AD UNRECOVERABLE should also count.<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0After all, that means that the disk&#=
39;s media returned wrong data which<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0was detected by the disk&#39;s own ED=
C/ECC.=C2=A0 I&#39;ve noticed that zfsd seems<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0to fault disks too eagerly when their=
 only problem is READ<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0UNRECOVERABLE errors.=C2=A0 Mapping i=
t to EINTEGRITY, or even a new error<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0code, would let zfsd be tuned better.=
<br>
&gt; <br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0EINTEGRITY would then mean two different things. UF=
S returns it when<br>
&gt;=C2=A0 =C2=A0 =C2=A0checksums fail for critical=C2=A0filesystem errors.=
 I&#39;m not saying no, per se,<br>
&gt;=C2=A0 =C2=A0 =C2=A0just that it conflates two different errors.<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0I think both of these use cases would be better ser=
ved by CAM&#39;s publishing<br>
&gt;=C2=A0 =C2=A0 =C2=A0of the errors to devctl today. Here&#39;s some exam=
ple data from a system I&#39;m<br>
&gt;=C2=A0 =C2=A0 =C2=A0looking at:<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0system=3DCAM subsystem=3Dperiph type=3Dtimeout devi=
ce=3Dda36 serial=3D&quot;12345&quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0cam_status=3D&quot;0x44b&quot; timeout=3D30000 CDB=
=3D&quot;28 00 4e b7 cb a3 00 04 cc 00 &quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0timestamp=3D1634739729.312068<br>
&gt;=C2=A0 =C2=A0 =C2=A0system=3DCAM subsystem=3Dperiph type=3Dtimeout devi=
ce=3Dda36 serial=3D&quot;12345&quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0cam_status=3D&quot;0x44b&quot; timeout=3D30000 CDB=
=3D&quot;28 00 20 6b d5 56 00 00 c0 00 &quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0timestamp=3D1634739729.585541<br>
&gt;=C2=A0 =C2=A0 =C2=A0system=3DCAM subsystem=3Dperiph type=3Derror device=
=3Dda36 serial=3D&quot;12345&quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0cam_status=3D&quot;0x4cc&quot; scsi_status=3D2 scsi=
_sense=3D&quot;72 03 11 00&quot; CDB=3D&quot;28 00 ad 1a<br>
&gt;=C2=A0 =C2=A0 =C2=A035 96 00 00 56 00 &quot; timestamp=3D1641979267.469=
064<br>
&gt;=C2=A0 =C2=A0 =C2=A0system=3DCAM subsystem=3Dperiph type=3Derror device=
=3Dda36 serial=3D&quot;12345&quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0cam_status=3D&quot;0x4cc&quot; scsi_status=3D2 scsi=
_sense=3D&quot;72 03 11 00&quot; CDB=3D&quot;28 00 ad 1a<br>
&gt;=C2=A0 =C2=A0 =C2=A035 96 00 01 5e 00 &quot; =C2=A0timestamp=3D16422525=
39.693699<br>
&gt;=C2=A0 =C2=A0 =C2=A0system=3DCAM subsystem=3Dperiph type=3Derror device=
=3Dda39 serial=3D&quot;12346&quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0cam_status=3D&quot;0x4cc&quot; scsi_status=3D2 scsi=
_sense=3D&quot;72 04 02 00&quot; CDB=3D&quot;2a 00 01 2b<br>
&gt;=C2=A0 =C2=A0 =C2=A0c8 f6 00 07 81 00 &quot; =C2=A0timestamp=3D16696031=
44.090835<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0Here we get the sense key, the asc and the ascq in =
the scsi_sense data (I&#39;m<br>
&gt;=C2=A0 =C2=A0 =C2=A0currently looking at expanding this to the entire s=
ense buffer, since it<br>
&gt;=C2=A0 =C2=A0 =C2=A0includes how hard the drive tried to read the data =
on media and hardware<br>
&gt;=C2=A0 =C2=A0 =C2=A0errors).=C2=A0 It doesn&#39;t include nvme data, bu=
t does include ata data (I&#39;ll have<br>
&gt;=C2=A0 =C2=A0 =C2=A0to add that data, now that I&#39;ve noticed it is m=
issing).=C2=A0 With the sense data<br>
&gt;=C2=A0 =C2=A0 =C2=A0and the CDB you know what kind of error you got, pl=
us what block didn&#39;t<br>
&gt;=C2=A0 =C2=A0 =C2=A0read/write correctly. With the extended sense data,=
 you can find out even<br>
&gt;=C2=A0 =C2=A0 =C2=A0more details that are sense-key dependent...<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0So I&#39;m unsure that trying to shoehorn our imper=
fect knowledge of what&#39;s<br>
&gt;=C2=A0 =C2=A0 =C2=A0retriable, fixable, or should be written with =
zeros into the kernel and<br>
&gt;=C2=A0 =C2=A0 =C2=A0converting that to a separate errno would give good =
results. Having daemons<br>
&gt;=C2=A0 =C2=A0 =C2=A0that want to make more nuanced calls about disks =
tap into this stream<br>
&gt;=C2=A0 =C2=A0 =C2=A0might be the better way to go. One of the things I&=
#39;m planning for $WORK is<br>
&gt;=C2=A0 =C2=A0 =C2=A0to enable the retry time limit of one of the mode p=
ages so that we fail<br>
&gt;=C2=A0 =C2=A0 =C2=A0faster and can just delete the file with the &#39;b=
ad&#39; block that we&#39;d get<br>
&gt;=C2=A0 =C2=A0 =C2=A0eventually if we allowed the full, default error pr=
ocessing to run, but that<br>
&gt;=C2=A0 =C2=A0 =C2=A0&#39;slow path&#39; processing kills performance fo=
r all other users of the<br>
&gt;=C2=A0 =C2=A0 =C2=A0drive...=C2=A0 I&#39;m unsure how well that will wo=
rk out (and I know I&#39;m lucky that<br>
&gt;=C2=A0 =C2=A0 =C2=A0I can always recover any data for my application si=
nce it&#39;s just a cache).<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0I&#39;d be interested to hear what others have to s=
ay here though, since my<br>
&gt;=C2=A0 =C2=A0 =C2=A0focus on this data is through the lens of my rathe=
r specialized application...<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0Warner<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0P.S. That was generated with this rule if you wante=
d to play with it...<br>
&gt;=C2=A0 =C2=A0 =C2=A0You&#39;d have to translate absolute disk blocks to=
 a partition and an offset<br>
&gt;=C2=A0 =C2=A0 =C2=A0into the filesystem, then give the filesystem a cha=
nce to tell you what of<br>
&gt;=C2=A0 =C2=A0 =C2=A0its data/metadata that block is used for...<br>
&gt; <br>
&gt;=C2=A0 =C2=A0 =C2=A0# Disk errors<br>
&gt;=C2=A0 =C2=A0 =C2=A0notify 10 {<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 match &quot;system&quo=
t; =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&quot;CAM&quot;;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 match &quot;subsystem&=
quot; =C2=A0 =C2=A0 =C2=A0 &quot;periph&quot;;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 match &quot;device&quo=
t; =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&quot;[an]?da[0-9]+&quot;;<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 action &quot;logger =
-t diskerr -p daemon.info $_ timestamp=3D$timestamp&quot;;<br>
&gt;=C2=A0 =C2=A0 =C2=A0};<br>
&gt; <br>
<br>
</blockquote></div></div></div>

--000000000000fa60eb0600f6d3dc--
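
A minimal sketch, not taken from the thread, of how a userland consumer
of the devctl records quoted above might decode them. It assumes the
record layout shown in the sample output (key=value pairs, values
optionally double-quoted), descriptor-format sense data (response code
0x72 or 0x73, where bytes 1 to 3 carry the sense key, ASC and ASCQ) and
10-byte READ/WRITE CDBs. All function names are hypothetical, and only
the two sense keys that appear in the samples are named.

```python
import shlex

# Names for the two sense keys seen in the quoted records (not exhaustive).
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}

# One record copied from the sample output in the thread.
SAMPLE = (
    'system=CAM subsystem=periph type=error device=da36 serial="12345" '
    'cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" '
    'CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064'
)


def parse_record(line):
    """Split a 'key=value key="quoted value"' line into a dict of strings."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def decode_sense(hex_bytes):
    """Return (sense_key, asc, ascq) from descriptor-format sense data."""
    data = bytes.fromhex(hex_bytes)
    if len(data) < 4 or (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    return data[1] & 0x0F, data[2], data[3]


def decode_rw10(hex_bytes):
    """Return (opcode, lba, block_count) for a READ(10) or WRITE(10) CDB."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return cdb[0], lba, count


if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    key, asc, ascq = decode_sense(rec["scsi_sense"])
    opcode, lba, count = decode_rw10(rec["CDB"])
    print(rec["device"], SENSE_KEYS.get(key, hex(key)),
          f"asc={asc:#04x} ascq={ascq:#04x}",
          f"op={opcode:#04x} lba={lba} blocks={count}")
```

The LBA is the first block of the failed transfer, not necessarily the
bad block itself. Mapping it to a partition offset and then to a file
still needs the filesystem's help, as the P.S. in the quoted message
notes.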