Date: Thu, 20 Jul 2023 21:26:07 -0600
From: Warner Losh <imp@bsdimp.com>
To: dgilbert@interlog.com
Cc: Alan Somers <asomers@freebsd.org>, scsi@freebsd.org
Subject: Re: ASC/ASCQ Review
Message-ID: <CANCZdfoed3meq_z90aC=BP7RE_Gk%2BOq6K1sptO4E0s6jT_ge6Q@mail.gmail.com>
In-Reply-To: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
References: <CANCZdfokEoRtNp0en=9pjLQSQ%2BjtmfwH3OOwz1z09VcwWpE%2Bxg@mail.gmail.com>
    <CAOtMX2g4%2BSDWg9WKbwZcqh4GpRan593O6qtNf7feoVejVK0YyQ@mail.gmail.com>
    <CANCZdfq5qti5uzWLkZaQEpyd5Q255sQeaR_kC_OQinmE9Qcqaw@mail.gmail.com>
    <CAOtMX2iwnpHL6b2-1D4N4Bi4eKoLnGK4=%2BgUowXGS_gtyDOkig@mail.gmail.com>
    <CANCZdfr-y8HYBb6GCFqZ7LAarxUAGb36Y6j%2Bbo%2BWiDwUT5uR7A@mail.gmail.com>
    <CANCZdfptEG=%2Bxa3m31Ngre26ZQxZ_Fqsfjmh%2BtVHgP2XpqhZ7g@mail.gmail.com>
    <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>

On Thu, Jul 20, 2023, 9:18 PM Douglas Gilbert <dgilbert@interlog.com> wrote:

> On 2023-07-19 11:41, Warner Losh wrote:
> > btw, it also occurs to me that if I do add a 'secondary' table, then you could
> > use it to generate a unique errno and experiment
> > with that w/o affecting the main code until that stuff was mature.
> >
> > I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs that I'd
> > like to tag as 'if trying harder, retry, otherwise fail' since re-retry needs
> > have changed a lot since cam was written in the late 90s and at least some of
> > the asc/ascq pairs I'm looking at haven't changed since the initial import, but
> > that's based on a tiny sampling of the data I have and is preliminary at best. I
> > may just change it to reflect modern usage.
>
> Hi,
> If you are looking for up-to-date [20230325] asc/ascq tables in C you could
> borrow mine at https://github.com/doug-gilbert/sg3_utils in lib/sg_lib_data.c
> starting at line 745.
> In testing/sg_chk_asc.c is a small test program for checking that the table in
> sg_lib_data.c agrees with the file that T10 supplies:
>       https://www.t10.org/lists/asc-num.txt

Thanks for the pointer. I'd already updated CAM's tables for that...

What I'm doing now is making sure CAM's reactions to the asc/ascq are good for
modern drives... it's a good idea, though, to create a program to check that
our table matches...
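
A check of that kind could be sketched roughly as below. This is neither sg_chk_asc.c nor CAM code. It assumes the T10 list uses lines shaped like `11h/00h  <device-type column>  UNRECOVERED READ ERROR` (ASC and ASCQ in hex with an `h` suffix, then a device-type column, then the description), and the names `parse_t10` and `compare` are hypothetical, as is whatever local table gets passed in.

```python
import re

# Assumed line shape: "11h/00h  <device-type letters>  DESCRIPTION".
# Lines that do not start with two concrete hex bytes are skipped.
ENTRY = re.compile(r"^([0-9A-Fa-f]{2})h/([0-9A-Fa-f]{2})h\s+(.*)$")


def parse_t10(text):
    """Return {(asc, ascq): description} for lines matching the assumed shape."""
    table = {}
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # headers, blank lines, range entries, and so on
        asc, ascq = int(m.group(1), 16), int(m.group(2), 16)
        # Assumption: the description is whatever follows the last run of
        # two or more spaces; the device-type column comes before it.
        description = re.split(r"\s{2,}", m.group(3))[-1].strip()
        table[(asc, ascq)] = description
    return table


def compare(local, t10):
    """Report pairs missing locally, pairs unknown to T10, and text mismatches."""
    missing = sorted(k for k in t10 if k not in local)
    extra = sorted(k for k in local if k not in t10)
    differ = sorted(k for k in local if k in t10 and local[k] != t10[k])
    return missing, extra, differ
```

Feeding it the downloaded list and a dictionary built from the table under test would give three lists to review by hand; it makes no judgment about which side is right.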

Warner

> Doug Gilbert
>
> > On Fri, Jul 14, 2023 at 5:34 PM Warner Losh <imp@bsdimp.com> wrote:
> >
> >     On Fri, Jul 14, 2023 at 12:31 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> >         On Fri, Jul 14, 2023 at 11:05 AM Warner Losh <imp@bsdimp.com> wrote:
> >          >
> >          > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org> wrote:
> >          >>
> >          >> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh <imp@bsdimp.com> wrote:
> >          >> >
> >          >> > Greetings,
> >          >> >
> >          >> > I've been looking closely at failed drives for $WORK lately. I've
> >          >> > noticed that a lot of errors that kinda sound like fatal errors
> >          >> > have SS_RDEF set on them.
> >          >> >
> >          >> > What's the process for evaluating whether those error codes are
> >          >> > worth retrying? There are several errors that we seem to be seeing
> >          >> > (preliminary read of the data) before the drive gives up the ghost
> >          >> > altogether. For those cases, I'd like to post more specific lists.
> >          >> > Should I do that here?
> >          >> >
> >          >> > Independent of that, I may want to have a more aggressive 'fail
> >          >> > fast' policy than is generally appropriate, for my workload (we
> >          >> > have a lot of data that's a copy of a copy of a copy, so if we
> >          >> > lose it, we don't care: we'll just delete any files we can't read
> >          >> > and get on with life, though I know others will have a more
> >          >> > conservative attitude towards data that might be precious and
> >          >> > unique). I can set the number of retries lower, I can do some
> >          >> > other hacks for disks that tell the disk to fail faster, but I
> >          >> > think part of the solution is going to have to be failing for some
> >          >> > sense-code/ASC/ASCQ tuples that we don't want to fail in upstream
> >          >> > or the general case.
> >          >> > I was thinking of identifying those and creating a 'global
> >          >> > quirk table' that gets applied after the drive-specific quirk
> >          >> > table that would let $WORK override the defaults, while letting
> >          >> > others keep the current behavior. IMHO, it would be better to have
> >          >> > these separate rather than in the global data for tracking
> >          >> > upstream...
> >          >> >
> >          >> > Is that clear, or should I give concrete examples?
> >          >> >
> >          >> > Comments?
> >          >> >
> >          >> > Warner
> >          >>
> >          >> Basically, you want to change the retry counts for certain ASC/ASCQ
> >          >> codes only, on a site-by-site basis?  That sounds reasonable.  Would
> >          >> it be configurable at runtime or only at build time?
> >          >
> >          > I'd like to change the default actions. But maybe we just do that for
> >          > everyone and assume modern drives...
> >          >
> >          >> Also, I've been thinking lately that it would be real nice if READ
> >          >> UNRECOVERABLE could be translated to EINTEGRITY instead of EIO.  That
> >          >> would let consumers know that retries are pointless, but that the data
> >          >> is probably healable.
> >          >
> >          > Unlikely, unless you've tuned things to not try for long at recovery...
> >          >
> >          > But regardless... do you have a concrete example of a use case?
> >          > There's a number of places that map any error to EIO. And I'd like a
> >          > use case before we expand the errors the lower layers return...
> >          >
> >          > Warner
> >
> >         My first use-case is a user-space FUSE file system.  It only has
> >         access to errnos, not ASC/ASCQ codes.  If we do as I suggest, then it
> >         could heal a READ UNRECOVERABLE by rewriting the sector, whereas other
> >         EIO errors aren't likely to be healed that way.
> >
> >     Yea... but READ UNRECOVERABLE is kinda hit or miss...
> >
> >         My second use-case is ZFS.  zfsd treats checksum errors differently
> >         from I/O errors.  A checksum error normally means that a read returned
> >         wrong data.  But I think that READ UNRECOVERABLE should also count.
> >         After all, that means that the disk's media returned wrong data which
> >         was detected by the disk's own EDC/ECC.  I've noticed that zfsd seems
> >         to fault disks too eagerly when their only problem is READ
> >         UNRECOVERABLE errors.  Mapping it to EINTEGRITY, or even a new error
> >         code, would let zfsd be tuned better.
> >
> >     EINTEGRITY would then mean two different things. UFS returns it when
> >     checksums fail for critical filesystem errors. I'm not saying no, per se,
> >     just that it conflates two different errors.
> >
> >     I think both of these use cases would be better served by CAM's publishing
> >     of the errors to devctl today. Here's some example data from a system I'm
> >     looking at:
> >
> >     system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> >     cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 "
> >     timestamp=1634739729.312068
> >     system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> >     cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 "
> >     timestamp=1634739729.585541
> >     system=CAM subsystem=periph type=error device=da36 serial="12345"
> >     cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> >     CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
> >     system=CAM subsystem=periph type=error device=da36 serial="12345"
> >     cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> >     CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
> >     system=CAM subsystem=periph type=error device=da39 serial="12346"
> >     cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00"
> >     CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
> >
> >     Here we get the sense key, the asc and the ascq in the scsi_sense data (I'm
> >     currently looking at expanding this to the entire sense buffer, since it
> >     includes how hard the
> >     drive tried to read the data on media and hardware errors).  It doesn't
> >     include nvme data, but does include ata data (I'll have to add the nvme
> >     data, now that I've noticed it is missing).  With the sense data and the
> >     CDB you know what kind of error you got, plus what block didn't
> >     read/write correctly. With the extended sense data, you can find out even
> >     more details that are sense-key dependent...
> >
> >     So I'm unsure that trying to shoehorn our imperfect knowledge of what's
> >     retriable, fixable, or should be written with zeros into the kernel and
> >     converting that to a separate errno would give good results. Having
> >     daemons that want to make more nuanced calls about disks tap into this
> >     stream might be the better way to go. One of the things I'm planning for
> >     $WORK is to enable the retry time limit of one of the mode pages so that
> >     we fail faster and can just delete the file with the 'bad' block that we'd
> >     get eventually if we allowed the full, default error processing to run,
> >     but that 'slow path' processing kills performance for all other users of
> >     the drive...  I'm unsure how well that will work out (and I know I'm lucky
> >     that I can always recover any data for my application since it's just a
> >     cache).
> >
> >     I'd be interested to hear what others have to say here, though, since my
> >     focus on this data is through the lens of my rather specialized
> >     application...
> >
> >     Warner
> >
> >     P.S. That was generated with this rule if you wanted to play with it...
> >     You'd have to translate absolute disk blocks to a partition and an offset
> >     into the filesystem, then give the filesystem a chance to tell you what of
> >     its data/metadata that block is used for...
> >
> >     # Disk errors
> >     notify 10 {
> >             match "system"          "CAM";
> >             match "subsystem"       "periph";
> >             match "device"          "[an]?da[0-9]+";
> >             action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
> >     };
> >
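
For anyone consuming the logged events, here is a minimal sketch of pulling the useful fields out of one of the lines quoted above. It is not CAM or devd code, and the function names are hypothetical. It assumes that the four `scsi_sense` bytes are, in order, the response code, sense key, ASC and ASCQ (as the examples suggest), and it only understands the 10-byte READ/WRITE CDB layout (opcode, flags, 4-byte big-endian LBA, group, 2-byte big-endian length, control).

```python
import shlex

# Small, deliberately incomplete lookup tables for illustration only.
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
OPCODES_10 = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def hex_bytes(text):
    """Turn '72 03 11 00 ' into a list of ints; tolerates trailing spaces."""
    return [int(part, 16) for part in text.split()]


def parse_event(line):
    """Split 'key=value key="quoted value"' text into a dict of strings."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def decode_sense(text):
    """Assumed layout: response code, sense key, ASC, ASCQ."""
    _code, key, asc, ascq = hex_bytes(text)[:4]
    key &= 0x0F
    return {"sense_key": key,
            "sense_key_name": SENSE_KEYS.get(key, "other"),
            "asc": asc, "ascq": ascq}


def decode_cdb10(text):
    """Decode LBA and block count from a 10-byte READ/WRITE CDB, else None."""
    cdb = hex_bytes(text)
    if len(cdb) != 10 or cdb[0] not in OPCODES_10:
        return None
    return {"op": OPCODES_10[cdb[0]],
            "lba": int.from_bytes(bytes(cdb[2:6]), "big"),
            "blocks": int.from_bytes(bytes(cdb[7:9]), "big")}


def summarize(line):
    """Combine the pieces; timeout events simply have no sense fields."""
    ev = parse_event(line)
    out = {"device": ev.get("device"), "type": ev.get("type")}
    if "scsi_sense" in ev:
        out.update(decode_sense(ev["scsi_sense"]))
    if "CDB" in ev:
        out.update(decode_cdb10(ev["CDB"]) or {})
    return out
```

The LBA that comes out is an absolute disk block, so mapping it to a partition and a filesystem offset is still a separate step, as the P.S. notes.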