From nobody Sun Jul 16 21:00:36 2023
From: bugzilla-noreply@FreeBSD.org
To: scsi@FreeBSD.org
Subject: Problem reports for scsi@FreeBSD.org that need special attention
Date: Sun, 16 Jul 2023 21:00:36 +0000
Message-Id: <202307162100.36GL0aD5070890@kenobi.freebsd.org>
List-Id: SCSI subsystem
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    221952 | cam iosched: Fix trim statistics

1 problems total for which you should take action.

From nobody Wed Jul 19 15:41:37 2023
From: Warner Losh
Date: Wed, 19 Jul 2023 09:41:37 -0600
Subject: Re: ASC/ASCQ Review
To: Alan Somers
Cc: scsi@freebsd.org

btw, it also occurs to me that if I do add a 'secondary' table, then you
could use it to generate a unique errno and experiment with that w/o
affecting the main code until that stuff was mature.

I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs
that I'd like to tag as 'if trying harder, retry, otherwise fail' since
re-retry needs have changed a lot since cam was written in the late 90s and
at least some of the asc/ascq pairs I'm looking at haven't changed since
the initial import, but that's based on a tiny sampling of the data I have
and is preliminary at best. I may just change it to reflect modern usage.

Warner

On Fri, Jul 14, 2023 at 5:34 PM Warner Losh <imp@bsdimp.com> wrote:

> On Fri, Jul 14, 2023 at 12:31 PM Alan Somers <asomers@freebsd.org> wrote:
>
>> On Fri, Jul 14, 2023 at 11:05 AM Warner Losh <imp@bsdimp.com> wrote:
>> >
>> > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org> wrote:
>> >>
>> >> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh <imp@bsdimp.com> wrote:
>> >> >
>> >> > Greetings,
>> >> >
>> >> > I've been looking closely at failed drives for $WORK lately. I've
>> >> > noticed that a lot of errors that kinda sound like fatal errors
>> >> > have SS_RDEF set on them.
>> >> >
>> >> > What's the process for evaluating whether those error codes are
>> >> > worth retrying? There are several errors that we seem to be seeing
>> >> > (preliminary read of the data) before the drive gives up the ghost
>> >> > altogether. For those cases, I'd like to post more specific lists.
>> >> > Should I do that here?
>> >> >
>> >> > Independent of that, I may want to have a more aggressive 'fail
>> >> > fast' policy than is appropriate for my work load (we have a lot
>> >> > of data that's a copy of a copy of a copy, so if we lose it, we
>> >> > don't care: we'll just delete any files we can't read and get on
>> >> > with life, though I know others will have a more conservative
>> >> > attitude towards data that might be precious and unique). I can
>> >> > set the number of retries lower, I can do some other hacks for
>> >> > disks that tell the disk to fail faster, but I think part of the
>> >> > solution is going to have to be failing for some
>> >> > sense-code/ASC/ASCQ tuples that we don't want to fail in upstream
>> >> > or the general case. I was thinking of identifying those and
>> >> > creating a 'global quirk table' that gets applied after the
>> >> > drive-specific quirk table that would let $WORK override the
>> >> > defaults, while letting others keep the current behavior. IMHO, it
>> >> > would be better to have these separate rather than in the global
>> >> > data for tracking upstream...
>> >> >
>> >> > Is that clear, or should I give concrete examples?
>> >> >
>> >> > Comments?
>> >> >
>> >> > Warner
>> >>
>> >> Basically, you want to change the retry counts for certain ASC/ASCQ
>> >> codes only, on a site-by-site basis?  That sounds reasonable.  Would
>> >> it be configurable at runtime or only at build time?
>> >
>> > I'd like to change the default actions. But maybe we just do that for
>> > everyone and assume modern drives...
>> >
>> >> Also, I've been thinking lately that it would be real nice if READ
>> >> UNRECOVERABLE could be translated to EINTEGRITY instead of EIO.  That
>> >> would let consumers know that retries are pointless, but that the
>> >> data is probably healable.
>> >
>> > Unlikely, unless you've tuned things to not try for long at
>> > recovery...
>> >
>> > But regardless... do you have a concrete example of a use case?
>> > There's a number of places that map any error to EIO. And I'd like a
>> > use case before we expand the errors the lower layers return...
>> >
>> > Warner
>>
>> My first use-case is a user-space FUSE file system.  It only has
>> access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then it
>> could heal a READ UNRECOVERABLE by rewriting the sector, whereas other
>> EIO errors aren't likely to be healed that way.
>
> Yea... but READ UNRECOVERABLE is kinda hit or miss...
>
>> My second use-case is ZFS.  zfsd treats checksum errors differently
>> from I/O errors.  A checksum error normally means that a read returned
>> wrong data.  But I think that READ UNRECOVERABLE should also count.
>> After all, that means that the disk's media returned wrong data which
>> was detected by the disk's own EDC/ECC.  I've noticed that zfsd seems
>> to fault disks too eagerly when their only problem is READ
>> UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or even a new error
>> code, would let zfsd be tuned better.
>
> EINTEGRITY would then mean two different things. UFS returns it when
> checksums fail for critical filesystem errors. I'm not saying no, per se,
> just that it conflates two different errors.
>
> I think both of these use cases would be better served by CAM's
> publishing of the errors to devctl today. Here's some example data from a
> system I'm looking at:
>
> system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 "
> timestamp=1634739729.312068
> system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 "
> timestamp=1634739729.585541
> system=CAM subsystem=periph type=error device=da36 serial="12345"
> cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
> system=CAM subsystem=periph type=error device=da36 serial="12345"
> cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
> system=CAM subsystem=periph type=error device=da39 serial="12346"
> cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00"
> CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
>
> Here we get the sense key, the asc and the ascq in the scsi_sense data
> (I'm currently looking at expanding this to the entire sense buffer,
> since it includes how hard the drive tried to read the data on media and
> hardware errors).  It doesn't include nvme data, but does include ata
> data (I'll have to add that data, now that I've noticed it is missing).
> With the sense data and the CDB you know what kind of error you got, plus
> what block didn't read/write correctly. With the extended sense data, you
> can find out even more details that are sense-key dependent...
>
> So I'm unsure that trying to shoehorn our imperfect knowledge of what's
> retriable, fixable, should be written with zeros into the kernel and
> converting that to a separate errno would give good results, and tapping
> into this stream from daemons that want to make more nuanced calls about
> disks might be the better way to go. One of the things I'm planning for
> $WORK is to enable the retry time limit of one of the mode pages so that
> we fail faster and can just delete the file with the 'bad' block that
> we'd get eventually if we allowed the full, default error processing to
> run, but that 'slow path' processing kills performance for all other
> users of the drive...  I'm unsure how well that will work out (and I know
> I'm lucky that I can always recover any data for my application since
> it's just a cache).
>
> I'd be interested to hear what others have to say here though, since my
> focus on this data is through the lens of my rather specialized
> application...
>
> Warner
>
> P.S. That was generated with this rule if you wanted to play with it...
> You'd have to translate absolute disk blocks to a partition and an offset
> into the filesystem, then give the filesystem a chance to tell you what
> of its data/metadata that block is used for...
>
> # Disk errors
> notify 10 {
>         match "system"          "CAM";
>         match "subsystem"       "periph";
>         match "device"          "[an]?da[0-9]+";
>         action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
> };

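For anyone wanting to consume the logged events from a userland daemon, as suggested above, the following is a minimal sketch in Python rather than anything from CAM or devd. It splits one logged event into key/value pairs and pulls the sense key, ASC and ASCQ out of scsi_sense, plus the opcode, LBA and block count out of a 10-byte READ/WRITE CDB. The function names are hypothetical, and the scsi_sense byte order (response code, sense key, ASC, ASCQ) is an assumption based on the description in the message and on the descriptor-format sense layout.

```python
import shlex
from typing import Dict, Tuple

# One event copied from the log excerpt quoted above.
SAMPLE = (
    'system=CAM subsystem=periph type=error device=da36 serial="12345" '
    'cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" '
    'CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064'
)


def parse_event(line: str) -> Dict[str, str]:
    """Split a key=value event line, keeping double-quoted values whole."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value.strip()
    return fields


def decode_sense(sense: str) -> Tuple[int, int, int]:
    """Return (sense key, ASC, ASCQ) from the four logged sense bytes.

    Assumed layout: byte 0 response code, byte 1 sense key (low nibble),
    byte 2 ASC, byte 3 ASCQ.
    """
    data = bytes.fromhex(sense)
    if len(data) < 4:
        raise ValueError("need at least four sense bytes")
    return data[1] & 0x0F, data[2], data[3]


def decode_rw10(cdb: str) -> Tuple[int, int, int]:
    """Return (opcode, LBA, block count) for a READ(10)/WRITE(10) CDB."""
    data = bytes.fromhex(cdb)
    if len(data) != 10 or data[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(data[2:6], "big")      # bytes 2-5, big-endian
    blocks = int.from_bytes(data[7:9], "big")   # bytes 7-8, big-endian
    return data[0], lba, blocks


if __name__ == "__main__":
    event = parse_event(SAMPLE)
    key, asc, ascq = decode_sense(event["scsi_sense"])
    opcode, lba, blocks = decode_rw10(event["CDB"])
    print(f"{event['device']}: key={key:#x} asc={asc:#04x} ascq={ascq:#04x} "
          f"opcode={opcode:#04x} lba={lba} blocks={blocks}")
```

For the sample event this yields sense key 0x3 with ASC/ASCQ 0x11/0x00 on a READ(10) of 0x56 blocks starting at LBA 0xad1a3596. Mapping that absolute LBA to a partition and file is left out, as the P.S. notes it needs help from the filesystem.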
--00000000000084edec0600d8debe-- From nobody Fri Jul 21 03:18:44 2023 X-Original-To: scsi@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R6ZXF0gwdz4nrT7 for ; Fri, 21 Jul 2023 03:18:53 +0000 (UTC) (envelope-from dgilbert@interlog.com) Received: from mp-relay-01.fibernetics.ca (mp-relay-01.fibernetics.ca [208.85.217.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4R6ZXD1ZSGz4l1x; Fri, 21 Jul 2023 03:18:52 +0000 (UTC) (envelope-from dgilbert@interlog.com) Authentication-Results: mx1.freebsd.org; dkim=none; spf=pass (mx1.freebsd.org: domain of dgilbert@interlog.com designates 208.85.217.136 as permitted sender) smtp.mailfrom=dgilbert@interlog.com; dmarc=none Received: from mailpool-fe-01.fibernetics.ca (mailpool-fe-01.fibernetics.ca [208.85.217.144]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mp-relay-01.fibernetics.ca (Postfix) with ESMTPS id 8DB1CE1C04; Fri, 21 Jul 2023 03:18:45 +0000 (UTC) Received: from localhost (mailpool-mx-01.fibernetics.ca [208.85.217.140]) by mailpool-fe-01.fibernetics.ca (Postfix) with ESMTP id 7DD8B3CAB7; Fri, 21 Jul 2023 03:18:45 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at X-Spam-Flag: NO X-Spam-Score: -0.199 X-Spam-Level: X-Spam-Status: No, score=-0.199 tagged_above=-999 required=5 tests=[ALL_TRUSTED=-1, BAYES_50=0.8, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no Received: from mailpool-fe-01.fibernetics.ca ([208.85.217.144]) by localhost (mail-mx-01.fibernetics.ca [208.85.217.140]) (amavisd-new, port 10024) with ESMTP id pksqIB5eRMRX; Fri, 21 Jul 2023 03:18:44 +0000 (UTC) Received: from [192.168.48.17] 
(host-192.252-165-26.dyn.295.ca [192.252.165.26]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) (Authenticated sender: dgilbert@interlog.com) by mail.ca.inter.net (Postfix) with ESMTPSA id 84D8A3CAB5; Fri, 21 Jul 2023 03:18:44 +0000 (UTC) Message-ID: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com> Date: Thu, 20 Jul 2023 23:18:44 -0400 List-Id: SCSI subsystem List-Archive: https://lists.freebsd.org/archives/freebsd-scsi List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-scsi@freebsd.org X-BeenThere: freebsd-scsi@freebsd.org MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Reply-To: dgilbert@interlog.com Subject: Re: ASC/ASCQ Review Content-Language: en-CA To: Warner Losh , Alan Somers Cc: scsi@freebsd.org References: From: Douglas Gilbert In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spamd-Result: default: False [-3.30 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-0.999]; NEURAL_HAM_SHORT(-1.00)[-0.998]; R_SPF_ALLOW(-0.20)[+ip4:208.85.217.0/24]; MIME_GOOD(-0.10)[text/plain]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MLMMJ_DEST(0.00)[scsi@freebsd.org]; R_DKIM_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; ASN(0.00)[asn:36493, ipnet:208.85.216.0/21, country:CA]; MIME_TRACE(0.00)[0:+]; HAS_REPLYTO(0.00)[dgilbert@interlog.com]; DMARC_NA(0.00)[interlog.com]; REPLYTO_ADDR_EQ_FROM(0.00)[]; RCVD_COUNT_FIVE(0.00)[5]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCVD_TLS_LAST(0.00)[]; ARC_NA(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; MID_RHS_MATCH_FROM(0.00)[] X-Rspamd-Queue-Id: 4R6ZXD1ZSGz4l1x X-Spamd-Bar: --- On 2023-07-19 11:41, Warner Losh wrote: > btw, it also occurs to me that if I do add a 'secondary' table, then you could > use it to generate a unique errno and experiment > with that w/o affecting 
the main code until that stuff was mature. > > I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs that I'd > like to tag as 'if trying harder, retry, otherwise fail' since re-retry needs > have changed a lot since cam was written in the late 90s and at least some of > the asc/ascq pairs I'm looking at haven't changed since the initial import, but > that's based on a tiny sampling of the data I have and is preliminary at best. I > may just change it to reflect modern usage. Hi, If you are looking for up-to-date [20230325] asc/ascq tables in C you could borrow mine at https://github.com/doug-gilbert/sg3_utils in lib/sg_lib_data.c starting at line 745 . In testing/sg_chk_asc.c is a small test program for checking that the table in sg_lib_data.c agrees with the file that T10 supplies: https://www.t10.org/lists/asc-num.txt Doug Gilbert > On Fri, Jul 14, 2023 at 5:34 PM Warner Losh > wrote: > > > > On Fri, Jul 14, 2023 at 12:31 PM Alan Somers > wrote: > > On Fri, Jul 14, 2023 at 11:05 AM Warner Losh > wrote: > > > > > > > > On Fri, Jul 14, 2023, 11:12 AM Alan Somers > wrote: > >> > >> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh > wrote: > >> > > >> > Greetings, > >> > > >> > i've been looking closely at failed drives for $WORK lately. I've > noticed that a lot of errors that kinda sound like fatal errors have > SS_RDEF set on them. > >> > > >> > What's the process for evaluating whether those error codes are > worth retrying. There are several errors that we seem to be seeing > (preliminary read of the data) before the drive gives up the ghost > altogether. For those cases, I'd like to post more specific lists. > Should I do that here? 
> >> > > >> > Independent of that, I may want to have a more aggressive 'fail > fast' policy than is appropriate for my work load (we have a lot of data > that's a copy of a copy of a copy, so if we lose it, we don't care: > we'll just delete any files we can't read and get on with life, though I > know others will have a more conservative attitude towards data that > might be precious and unique). I can set the number of retries lower, I > can do some other hacks for disks that tell the disk to fail faster, but > I think part of the solution is going to have to be failing for some > sense-code/ASC/ASCQ tuples that we don't want to fail in upstream or the > general case. I was thinking of identifying those and creating a 'global > quirk table' that gets applied after the drive-specific quirk table that > would let $WORK override the defaults, while letting others keep the > current behavior. IMHO, it would be better to have these separate rather > than in the global data for tracking upstream... > >> > > >> > Is that clear, or should I give concrete examples? > >> > > >> > Comments? > >> > > >> > Warner > >> > >> Basically, you want to change the retry counts for certain ASC/ASCQ > >> codes only, on a site-by-site basis?  That sounds reasonable.  Would > >> it be configurable at runtime or only at build time? > > > > > > I'd like to change the default actions. But maybe we just do that for > everyone and assume modern drives... > > > >> Also, I've been thinking lately that it would be real nice if READ > >> UNRECOVERABLE could be translated to EINTEGRITY instead of EIO.  That > >> would let consumers know that retries are pointless, but that the data > >> is probably healable. > > > > > > Unlikely, unless you've tuned things to not try for long at recovery... > > > > But regardless... do you have a concrete example of a use case? > There's a number of places that map any error to EIO. And I'd like a use > case before we expand the errors the lower layers return... 
> >
> > Warner
>
> My first use-case is a user-space FUSE file system.  It only has
> access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then it
> could heal a READ UNRECOVERABLE by rewriting the sector, whereas other
> EIO errors aren't likely to be healed that way.
>
>
> Yea... but READ UNRECOVERABLE is kinda hit or miss...
>
> My second use-case is ZFS.  zfsd treats checksum errors differently
> from I/O errors.  A checksum error normally means that a read returned
> wrong data.  But I think that READ UNRECOVERABLE should also count.
> After all, that means that the disk's media returned wrong data which
> was detected by the disk's own EDC/ECC.  I've noticed that zfsd seems
> to fault disks too eagerly when their only problem is READ
> UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or even a new error
> code, would let zfsd be tuned better.
>
>
> EINTEGRITY would then mean two different things. UFS returns it when
> checksums fail for critical filesystem errors. I'm not saying no, per se,
> just that it conflates two different errors.
>
> I think both of these use cases would be better served by CAM's publishing
> of the errors to devctl today. Here's some example data from a system I'm
> looking at:
>
> system=CAM subsystem=periph type=timeout device=da36 serial="12345" cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 " timestamp=1634739729.312068
> system=CAM subsystem=periph type=timeout device=da36 serial="12345" cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 " timestamp=1634739729.585541
> system=CAM subsystem=periph type=error device=da36 serial="12345" cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
> system=CAM subsystem=periph type=error device=da36 serial="12345" cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
> system=CAM subsystem=periph type=error device=da39 serial="12346" cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00" CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
>
> Here we get the sense key, the asc and the ascq in the scsi_sense data (I'm
> currently looking at expanding this to the entire sense buffer, since it
> includes how hard the drive tried to read the data on media and hardware
> errors).  It doesn't include nvme data, but does include ata data (I'll
> have to add the nvme data, now that I've noticed it is missing).  With the
> sense data and the CDB you know what kind of error you got, plus what
> block didn't read/write correctly. With the extended sense data, you can
> find out even more details that are sense-key dependent...
>
> So I'm unsure that trying to shoehorn our imperfect knowledge of what's
> retriable, fixable, or should be written with zeros into the kernel and
> converting that to a separate errno would give good results; having
> daemons that want to make more nuanced calls about disks tap into this
> stream might be the better way to go.
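
As a rough illustration of what a daemon tapping this stream could extract,
here is a minimal Python sketch that splits one of the event lines above
into fields, then decodes the four scsi_sense bytes and a READ(10) or
WRITE(10) CDB. The function names are made up for this example, and the
scsi_sense field order (response code, sense key, ASC, ASCQ) is an
assumption inferred from the sample data, not taken from the CAM source.

```python
import re

# Third event from the sample data above.
EXAMPLE = ('system=CAM subsystem=periph type=error device=da36 serial="12345" '
           'cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" '
           'CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064')

# key=value, where the value is either double-quoted or a bare word.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_event(line):
    """Split one devctl key=value line into a dict, dropping the quotes."""
    return {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
            for m in _PAIR.finditer(line)}


def decode_sense(text):
    """Decode 'rc key asc ascq' hex bytes (assumed field order)."""
    code, key, asc, ascq = (int(b, 16) for b in text.split())
    return {"response_code": code, "sense_key": key & 0x0F,
            "asc": asc, "ascq": ascq}


def decode_rw10(text):
    """Decode a 10-byte READ(10)/WRITE(10) CDB into (op, lba, blocks)."""
    cdb = bytes(int(b, 16) for b in text.split())
    ops = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if len(cdb) != 10 or cdb[0] not in ops:
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return ops[cdb[0]], lba, blocks


event = parse_event(EXAMPLE)
sense = decode_sense(event["scsi_sense"])
op, lba, blocks = decode_rw10(event["CDB"])
```

For that event the sketch yields sense key 3 (MEDIUM ERROR) with ASC/ASCQ
11h/00h (UNRECOVERED READ ERROR), and a READ(10) of 0x56 blocks starting at
LBA 0xad1a3596.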
>
> One of the things I'm planning for $WORK is to enable the retry time
> limit of one of the mode pages so that we fail faster and can just
> delete the file with the 'bad' block that we'd get eventually if we
> allowed the full, default error processing to run, but that 'slow path'
> processing kills performance for all other users of the drive...  I'm
> unsure how well that will work out (and I know I'm lucky that I can
> always recover any data for my application since it's just a cache).
>
> I'd be interested to hear what others have to say here though, since my
> focus on this data is through the lens of my rather specialized
> application...
>
> Warner
>
> P.S. That was generated with this rule if you wanted to play with it...
> You'd have to translate absolute disk blocks to a partition and an offset
> into the filesystem, then give the filesystem a chance to tell you what of
> its data/metadata that block is used for...
>
> # Disk errors
> notify 10 {
>         match "system"          "CAM";
>         match "subsystem"       "periph";
>         match "device"          "[an]?da[0-9]+";
>         action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
> };
>

From nobody Fri Jul 21 03:26:07 2023 X-Original-To: scsi@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4R6Zhs0hNqz4nwLH for ; Fri, 21 Jul 2023 03:26:21 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ej1-x62f.google.com (mail-ej1-x62f.google.com [IPv6:2a00:1450:4864:20::62f]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4R6Zhr5ZWgz3DBV for ; Fri, 21 Jul 2023 03:26:20 +0000 (UTC) (envelope-from wlosh@bsdimp.com)
Authentication-Results: mx1.freebsd.org; none Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-9922d6f003cso237769866b.0 for ; Thu, 20 Jul 2023 20:26:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20221208.gappssmtp.com; s=20221208; t=1689909979; x=1690514779; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=lBPJQQLUPIAujvywawulGetLsBEuZ+/GDH3xHRepNYU=; b=28sSZa5jwc1eLWJxDuAgu1vGy1svseVYTKYxA/qnSdTdTzvqUP6sTFgp0ar4lgJPEF BZfh1ykxhtBQIIirtkz+D2DHS5/6/s9AZISGRoTVtjXdIenDL5y7HlRk/rSANKU2nfnP ozjaI7lEbymfeM0uP5tUX2AC695xCs/xTdOXT0/+bF6S5uZTzLRGz94Vl4qmpuXZzQqq DDPBkh/kx2YPafAS+W3C3aAntrPZX5I4UAO8p6cJqjkSAD1UWwRHitqTz3WUObdveqjc bO+SKEBeKApDTK8+hqUFr/IibzQD8e7YKDw344bJKIb1SSnYQ8zLRv6iC8jO/yyUi4dD D6pA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689909979; x=1690514779; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=lBPJQQLUPIAujvywawulGetLsBEuZ+/GDH3xHRepNYU=; b=WR2UB7V4RZmJyFh+ErCmArImzpuJ9twRjWfQ8zHRf9SNC8X2rll1IuCo+A6figTEMt YZHPT5YVqImF+OIsy6XORvi46FiSGWlJfQwN2ybOabLIifZsuI7JYcB3JJb5229dxME3 NIunmGyp35WRVS/hyQZUw0C8EtvNxs4a/MWxxs0199ca7KRl+/48IWkug5Yv3JTZReg5 yDXDs2bKI99AZJPW8xk6H8ceXYO6ifWjb4B7D7uz8DnO57A5MVmqkQqDfSyvinQ7hqp7 gueMNVHw34iHEcmCGibtDDgsVUyHUNqGLTt7QMKocaGr+VrZHMn+AWiW+zV5UCaS34tn kuiw== X-Gm-Message-State: ABy/qLbKHRjgyKBhhi8sHwJHfDu3WIgFj7ckings//Ab9UgPDpWeV+gj j55xJbQapdLwUyhhceV5O8a0j6YcHt8ozPOZYh7EsQ== X-Google-Smtp-Source: APBJJlEr4sZPZKt1pSvRmqKfaAP9GYJSKGJ8QkUVQZLj5hdhgIZselRTINa8olnaY5hG5PtRxVQS8bHbanBl3K9H9PY= X-Received: by 2002:a17:906:8451:b0:994:1fd2:cf96 with SMTP id e17-20020a170906845100b009941fd2cf96mr588912ejy.0.1689909978614; Thu, 20 Jul 2023 20:26:18 -0700 (PDT) List-Id: SCSI subsystem List-Archive: https://lists.freebsd.org/archives/freebsd-scsi List-Help: List-Post: 
List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-scsi@freebsd.org X-BeenThere: freebsd-scsi@freebsd.org MIME-Version: 1.0 References: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com> In-Reply-To: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com> From: Warner Losh Date: Thu, 20 Jul 2023 21:26:07 -0600 Message-ID: Subject: Re: ASC/ASCQ Review To: dgilbert@interlog.com Cc: Alan Somers , scsi@freebsd.org Content-Type: multipart/alternative; boundary="000000000000fa60eb0600f6d3dc" X-Rspamd-Queue-Id: 4R6Zhr5ZWgz3DBV X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated
--000000000000fa60eb0600f6d3dc Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Thu, Jul 20, 2023, 9:18 PM Douglas Gilbert <dgilbert@interlog.com> wrote:

> On 2023-07-19 11:41, Warner Losh wrote:
> > btw, it also occurs to me that if I do add a 'secondary' table, then you
> > could use it to generate a unique errno and experiment with that w/o
> > affecting the main code until that stuff was mature.
> >
> > I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs
> > that I'd like to tag as 'if trying harder, retry, otherwise fail' since
> > re-retry needs have changed a lot since cam was written in the late 90s
> > and at least some of the asc/ascq pairs I'm looking at haven't changed
> > since the initial import, but that's based on a tiny sampling of the
> > data I have and is preliminary at best. I may just change it to reflect
> > modern usage.
>
> Hi,
> If you are looking for up-to-date [20230325] asc/ascq tables in C you could
> borrow mine at https://github.com/doug-gilbert/sg3_utils in
> lib/sg_lib_data.c starting at line 745.
> In testing/sg_chk_asc.c is a small test program for checking that the
> table in sg_lib_data.c agrees with the file that T10 supplies:
>       https://www.t10.org/lists/asc-num.txt

Thanks for the pointer. I'd already updated CAM's tables for that...

What I'm doing now is to make sure CAM's reactions to the asc/ascq are good
for the modern drives... it's a good idea though to create a program for
our table to match...

Warner

> Doug Gilbert
>
> > On Fri, Jul 14, 2023 at 5:34 PM Warner Losh <imp@bsdimp.com> wrote:
> >
> >     On Fri, Jul 14, 2023 at 12:31 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> >         On Fri, Jul 14, 2023 at 11:05 AM Warner Losh <imp@bsdimp.com> wrote:
> >          >
> >          > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org> wrote:
> >          >>
> >          >> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh <imp@bsdimp.com> wrote:
> >          >> >
> >          >> > Greetings,
> >          >> >
> >          >> > I've been looking closely at failed drives for $WORK lately. I've
> >          >> > noticed that a lot of errors that kinda sound like fatal errors
> >          >> > have SS_RDEF set on them.
> >          >> >
> >          >> > What's the process for evaluating whether those error codes are
> >          >> > worth retrying? There are several errors that we seem to be seeing
> >          >> > (preliminary read of the data) before the drive gives up the ghost
> >          >> > altogether. For those cases, I'd like to post more specific lists.
> >          >> > Should I do that here?
> >          >> >
> >          >> > Independent of that, I may want to have a more aggressive 'fail
> >          >> > fast' policy than is appropriate for my work load (we have a lot
> >          >> > of data that's a copy of a copy of a copy, so if we lose it, we
> >          >> > don't care: we'll just delete any files we can't read and get on
> >          >> > with life, though I know others will have a more conservative
> >          >> > attitude towards data that might be precious and unique). I can
> >          >> > set the number of retries lower, I can do some other hacks for
> >          >> > disks that tell the disk to fail faster, but I think part of the
> >          >> > solution is going to have to be failing for some
> >          >> > sense-code/ASC/ASCQ tuples that we don't want to fail in upstream
> >          >> > or the general case. I was thinking of identifying those and
> >          >> > creating a 'global quirk table' that gets applied after the
> >          >> > drive-specific quirk table that would let $WORK override the
> >          >> > defaults, while letting others keep the current behavior. IMHO,
> >          >> > it would be better to have these separate rather than in the
> >          >> > global data for tracking upstream...
> >          >> >
> >          >> > Is that clear, or should I give concrete examples?
> >          >> >
> >          >> > Comments?
> >          >> >
> >          >> > Warner
> >          >>
> >          >> Basically, you want to change the retry counts for certain ASC/ASCQ
> >          >> codes only, on a site-by-site basis?  That sounds reasonable.  Would
> >          >> it be configurable at runtime or only at build time?
> >          >
> >          >
> >          > I'd like to change the default actions. But maybe we just do that
> >          > for everyone and assume modern drives...
> >          >
> >          >> Also, I've been thinking lately that it would be real nice if READ
> >          >> UNRECOVERABLE could be translated to EINTEGRITY instead of EIO.  That
> >          >> would let consumers know that retries are pointless, but that the
> >          >> data is probably healable.
> >          >
> >          >
> >          > Unlikely, unless you've tuned things to not try for long at
> >          > recovery...
> >          >
> >          > But regardless... do you have a concrete example of a use case?
> >          > There's a number of places that map any error to EIO. And I'd like
> >          > a use case before we expand the errors the lower layers return...
> >          >
> >          > Warner
> >
> >         My first use-case is a user-space FUSE file system.  It only has
> >         access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then it
> >         could heal a READ UNRECOVERABLE by rewriting the sector, whereas other
> >         EIO errors aren't likely to be healed that way.
> >
> >
> >     Yea... but READ UNRECOVERABLE is kinda hit or miss...
> >
> >         My second use-case is ZFS.  zfsd treats checksum errors differently
> >         from I/O errors.  A checksum error normally means that a read returned
> >         wrong data.  But I think that READ UNRECOVERABLE should also count.
> >         After all, that means that the disk's media returned wrong data which
> >         was detected by the disk's own EDC/ECC.  I've noticed that zfsd seems
> >         to fault disks too eagerly when their only problem is READ
> >         UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or even a new error
> >         code, would let zfsd be tuned better.
> >
> >
> >     EINTEGRITY would then mean two different things. UFS returns it when
> >     checksums fail for critical filesystem errors. I'm not saying no, per se,
> >     just that it conflates two different errors.
> >
> >     I think both of these use cases would be better served by CAM's publishing
> >     of the errors to devctl today. Here's some example data from a system I'm
> >     looking at:
> >
> >     system=CAM subsystem=periph type=timeout device=da36 serial="12345" cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 " timestamp=1634739729.312068
> >     system=CAM subsystem=periph type=timeout device=da36 serial="12345" cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 " timestamp=1634739729.585541
> >     system=CAM subsystem=periph type=error device=da36 serial="12345" cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
> >     system=CAM subsystem=periph type=error device=da36 serial="12345" cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
> >     system=CAM subsystem=periph type=error device=da39 serial="12346" cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00" CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
> >
> >     Here we get the sense key, the asc and the ascq in the scsi_sense data (I'm
> >     currently looking at expanding this to the entire sense buffer, since it
> >     includes how hard the drive tried to read the data on media and hardware
> >     errors).  It doesn't include nvme data, but does include ata data (I'll
> >     have to add the nvme data, now that I've noticed it is missing).  With the
> >     sense data and the CDB you know what kind of error you got, plus what
> >     block didn't read/write correctly. With the extended sense data, you can
> >     find out even more details that are sense-key dependent...
> >
> >     So I'm unsure that trying to shoehorn our imperfect knowledge of what's
> >     retriable, fixable, or should be written with zeros into the kernel and
> >     converting that to a separate errno would give good results; having
> >     daemons that want to make more nuanced calls about disks tap into this
> >     stream might be the better way to go. One of the things I'm planning for
> >     $WORK is to enable the retry time limit of one of the mode pages so that
> >     we fail faster and can just delete the file with the 'bad' block that
> >     we'd get eventually if we allowed the full, default error processing to
> >     run, but that 'slow path' processing kills performance for all other
> >     users of the drive...  I'm unsure how well that will work out (and I know
> >     I'm lucky that I can always recover any data for my application since
> >     it's just a cache).
> >
> >     I'd be interested to hear what others have to say here though, since my
> >     focus on this data is through the lens of my rather specialized
> >     application...
> >
> >     Warner
> >
> >     P.S. That was generated with this rule if you wanted to play with it...
> >     You'd have to translate absolute disk blocks to a partition and an offset
> >     into the filesystem, then give the filesystem a chance to tell you what of
> >     its data/metadata that block is used for...
> >
> >     # Disk errors
> >     notify 10 {
> >             match "system"          "CAM";
> >             match "subsystem"       "periph";
> >             match "device"          "[an]?da[0-9]+";
> >             action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
> >     };
> >

--000000000000fa60eb0600f6d3dc Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


--000000000000fa60eb0600f6d3dc--
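
The idea raised in the message above, a program that checks a local
ASC/ASCQ table against the T10 list, could be sketched roughly as follows.
This is not sg_chk_asc.c and not CAM code. It assumes that each entry in
asc-num.txt starts with a pair written like "11h/00h" at the beginning of a
line; the sample text, the LOCAL set and the function names are
illustrative stand-ins. Descriptions and device-type columns are ignored,
so only the presence of each pair is compared.

```python
import re

# Assumed entry layout: two hex digits, 'h/', two hex digits, 'h', whitespace.
# Range entries such as 'NNh' do not match and are skipped by this sketch.
_ENTRY = re.compile(r'^([0-9A-Fa-f]{2})h/([0-9A-Fa-f]{2})h\s')


def parse_asc_list(text):
    """Return the set of (asc, ascq) pairs found at the start of each line."""
    pairs = set()
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            pairs.add((int(m.group(1), 16), int(m.group(2), 16)))
    return pairs


def compare(local_pairs, t10_pairs):
    """Return (pairs missing locally, local pairs that T10 does not list)."""
    return sorted(t10_pairs - local_pairs), sorted(local_pairs - t10_pairs)


# Illustrative input in the assumed layout; not copied from the real file,
# and the device-type column is simplified to a single letter.
SAMPLE = """\
ASC/ASCQ listing (header lines are ignored)
02h/00h  D              NO SEEK COMPLETE
11h/00h  D              UNRECOVERED READ ERROR
11h/01h  D              READ RETRIES EXHAUSTED
"""

# Hypothetical local table: lacks 11h/01h and carries a stray 11h/7Fh.
LOCAL = {(0x02, 0x00), (0x11, 0x00), (0x11, 0x7F)}
missing, unknown = compare(LOCAL, parse_asc_list(SAMPLE))
```

Comparing only the pairs keeps the sketch independent of the exact column
layout; checking the description strings as well would need the real file's
fixed-width format.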