Date: Thu, 20 Jul 2023 21:26:07 -0600
From: Warner Losh <imp@bsdimp.com>
To: dgilbert@interlog.com
Cc: Alan Somers <asomers@freebsd.org>, scsi@freebsd.org
Subject: Re: ASC/ASCQ Review
Message-ID: <CANCZdfoed3meq_z90aC=BP7RE_Gk+Oq6K1sptO4E0s6jT_ge6Q@mail.gmail.com>
In-Reply-To: <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
References: <CANCZdfokEoRtNp0en=9pjLQSQ+jtmfwH3OOwz1z09VcwWpE+xg@mail.gmail.com>
 <CAOtMX2g4+SDWg9WKbwZcqh4GpRan593O6qtNf7feoVejVK0YyQ@mail.gmail.com>
 <CANCZdfq5qti5uzWLkZaQEpyd5Q255sQeaR_kC_OQinmE9Qcqaw@mail.gmail.com>
 <CAOtMX2iwnpHL6b2-1D4N4Bi4eKoLnGK4=+gUowXGS_gtyDOkig@mail.gmail.com>
 <CANCZdfr-y8HYBb6GCFqZ7LAarxUAGb36Y6j+bo+WiDwUT5uR7A@mail.gmail.com>
 <CANCZdfptEG=+xa3m31Ngre26ZQxZ_Fqsfjmh+tVHgP2XpqhZ7g@mail.gmail.com>
 <7df852e4-5df5-de51-70a6-08bcbcb2f757@interlog.com>
[-- Attachment #1 --]

On Thu, Jul 20, 2023, 9:18 PM Douglas Gilbert <dgilbert@interlog.com> wrote:
> On 2023-07-19 11:41, Warner Losh wrote:
> > btw, it also occurs to me that if I do add a 'secondary' table, then you
> > could use it to generate a unique errno and experiment with that w/o
> > affecting the main code until that stuff was mature.
> >
> > I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs
> > that I'd like to tag as 'if trying harder, retry, otherwise fail', since
> > retry needs have changed a lot since CAM was written in the late 90s and
> > at least some of the asc/ascq pairs I'm looking at haven't changed since
> > the initial import. But that's based on a tiny sampling of the data I
> > have and is preliminary at best. I may just change it to reflect modern
> > usage.
>
> Hi,
> If you are looking for up-to-date [20230325] asc/ascq tables in C, you
> could borrow mine at https://github.com/doug-gilbert/sg3_utils in
> lib/sg_lib_data.c, starting at line 745.
> In testing/sg_chk_asc.c is a small test program for checking that the
> table in sg_lib_data.c agrees with the file that T10 supplies:
> https://www.t10.org/lists/asc-num.txt

Thanks for the pointer. I'd already updated CAM's tables for that...

What I'm doing now is making sure CAM's reactions to the asc/ascq pairs are
good for modern drives... it's a good idea, though, to create a program that
checks our table matches...

Warner

> Doug Gilbert
>
> > On Fri, Jul 14, 2023 at 5:34 PM Warner Losh <imp@bsdimp.com> wrote:
> > >
> > > On Fri, Jul 14, 2023 at 12:31 PM Alan Somers <asomers@freebsd.org> wrote:
> > > >
> > > > On Fri, Jul 14, 2023 at 11:05 AM Warner Losh <imp@bsdimp.com> wrote:
> > > > >
> > > > > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org> wrote:
> > > > > >
> > > > > > On Thu, Jul 13, 2023 at 12:14 PM Warner Losh <imp@bsdimp.com> wrote:
> > > > > > >
> > > > > > > Greetings,
> > > > > > >
> > > > > > > I've been looking closely at failed drives for $WORK lately.
> > > > > > > I've noticed that a lot of errors that kinda sound like fatal
> > > > > > > errors have SS_RDEF set on them.
> > > > > > >
> > > > > > > What's the process for evaluating whether those error codes are
> > > > > > > worth retrying? There are several errors that we seem to be
> > > > > > > seeing (preliminary read of the data) before the drive gives up
> > > > > > > the ghost altogether. For those cases, I'd like to post more
> > > > > > > specific lists. Should I do that here?
> > > > > > >
> > > > > > > Independent of that, I may want a more aggressive 'fail fast'
> > > > > > > policy for my work load than is appropriate in general (we have
> > > > > > > a lot of data that's a copy of a copy of a copy, so if we lose
> > > > > > > it, we don't care: we'll just delete any files we can't read
> > > > > > > and get on with life, though I know others will have a more
> > > > > > > conservative attitude towards data that might be precious and
> > > > > > > unique). I can set the number of retries lower, and I can do
> > > > > > > some other hacks that tell the disk to fail faster, but I think
> > > > > > > part of the solution is going to have to be failing for some
> > > > > > > sense-code/ASC/ASCQ tuples that we don't want to fail in
> > > > > > > upstream or in the general case. I was thinking of identifying
> > > > > > > those and creating a 'global quirk table' that gets applied
> > > > > > > after the drive-specific quirk table, which would let $WORK
> > > > > > > override the defaults while letting others keep the current
> > > > > > > behavior. IMHO, it would be better to have these separate
> > > > > > > rather than in the global data for tracking upstream...
> > > > > > >
> > > > > > > Is that clear, or should I give concrete examples?
> > > > > > >
> > > > > > > Comments?
> > > > > > >
> > > > > > > Warner
> > > > > >
> > > > > > Basically, you want to change the retry counts for certain
> > > > > > ASC/ASCQ codes only, on a site-by-site basis? That sounds
> > > > > > reasonable. Would it be configurable at runtime or only at build
> > > > > > time?
> > > > >
> > > > > I'd like to change the default actions. But maybe we just do that
> > > > > for everyone and assume modern drives...
> > > > >
> > > > > > Also, I've been thinking lately that it would be real nice if
> > > > > > READ UNRECOVERABLE could be translated to EINTEGRITY instead of
> > > > > > EIO. That would let consumers know that retries are pointless,
> > > > > > but that the data is probably healable.
> > > > >
> > > > > Unlikely, unless you've tuned things to not try for long at
> > > > > recovery...
> > > > >
> > > > > But regardless... do you have a concrete example of a use case?
> > > > > There are a number of places that map any error to EIO, and I'd
> > > > > like a use case before we expand the errors the lower layers
> > > > > return...
> > > > >
> > > > > Warner
> > > >
> > > > My first use-case is a user-space FUSE file system. It only has
> > > > access to errnos, not ASC/ASCQ codes. If we do as I suggest, then it
> > > > could heal a READ UNRECOVERABLE by rewriting the sector, whereas
> > > > other EIO errors aren't likely to be healed that way.
> > >
> > > Yea... but READ UNRECOVERABLE is kinda hit or miss...
> > >
> > > > My second use-case is ZFS. zfsd treats checksum errors differently
> > > > from I/O errors. A checksum error normally means that a read
> > > > returned wrong data, but I think that READ UNRECOVERABLE should also
> > > > count. After all, that means that the disk's media returned wrong
> > > > data which was detected by the disk's own EDC/ECC. I've noticed that
> > > > zfsd seems to fault disks too eagerly when their only problem is
> > > > READ UNRECOVERABLE errors. Mapping it to EINTEGRITY, or even a new
> > > > error code, would let zfsd be tuned better.
> > >
> > > EINTEGRITY would then mean two different things: UFS returns it when
> > > checksums fail for critical filesystem metadata. I'm not saying no,
> > > per se, just that it conflates two different errors.
> > >
> > > I think both of these use cases would be better served by CAM's
> > > publishing of the errors to devctl today. Here's some example data
> > > from a system I'm looking at:
> > >
> > > system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> > > cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 "
> > > timestamp=1634739729.312068
> > > system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> > > cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 "
> > > timestamp=1634739729.585541
> > > system=CAM subsystem=periph type=error device=da36 serial="12345"
> > > cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> > > CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
> > > system=CAM subsystem=periph type=error device=da36 serial="12345"
> > > cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> > > CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
> > > system=CAM subsystem=periph type=error device=da39 serial="12346"
> > > cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00"
> > > CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
> > >
> > > Here we get the sense key, the asc, and the ascq in the scsi_sense
> > > data (I'm currently looking at expanding this to the entire sense
> > > buffer, since it includes how hard the drive tried to read the data on
> > > media and hardware errors). It doesn't include nvme data, but does
> > > include ata data (I'll have to add that data, now that I've noticed it
> > > is missing). With the sense data and the CDB, you know what kind of
> > > error you got, plus what block didn't read/write correctly. With the
> > > extended sense data, you can find out even more details that are
> > > sense-key dependent...
> > >
> > > So I'm unsure that shoehorning our imperfect knowledge of what's
> > > retriable, fixable, or should be written with zeros into the kernel,
> > > and converting that to a separate errno, would give good results;
> > > tapping into this stream from daemons that want to make more nuanced
> > > calls about disks might be the better way to go. One of the things I'm
> > > planning for $WORK is to enable the retry time limit in one of the
> > > mode pages so that we fail faster and can just delete the file with
> > > the 'bad' block that we'd eventually get if we allowed the full,
> > > default error processing to run; that 'slow path' processing kills
> > > performance for all other users of the drive... I'm unsure how well
> > > that will work out (and I know I'm lucky that I can always recover any
> > > data for my application, since it's just a cache).
> > >
> > > I'd be interested to hear what others have to say here, though, since
> > > my focus on this data is through the lens of my rather specialized
> > > application...
> > >
> > > Warner
> > >
> > > P.S. That was generated with this rule, if you wanted to play with
> > > it... You'd have to translate absolute disk blocks to a partition and
> > > an offset into the filesystem, then give the filesystem a chance to
> > > tell you what of its data/metadata that block is used for...
> > >
> > > # Disk errors
> > > notify 10 {
> > >         match "system" "CAM";
> > >         match "subsystem" "periph";
> > >         match "device" "[an]?da[0-9]+";
> > >         action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
> > > };
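The scsi_sense values in those devctl events pack the fields under discussion (response code, sense key, ASC, ASCQ) into the leading bytes. As a minimal sketch of what a daemon tapping this stream could do with them (the byte offsets follow SPC's fixed- and descriptor-format sense layouts; the function itself is illustrative, not part of CAM):

```python
# Decode a devctl scsi_sense hex string into (sense key name, ASC, ASCQ).
# Per SPC: descriptor format (response code 72h/73h) carries the sense key
# in byte 1 (low nibble), ASC in byte 2, ASCQ in byte 3; fixed format
# (70h/71h) carries them in bytes 2, 12, and 13.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0xb: "ABORTED COMMAND", 0xe: "MISCOMPARE",
}

def decode_sense(hex_str: str) -> tuple[str, int, int]:
    b = bytes.fromhex(hex_str)          # fromhex ignores the spaces
    resp = b[0] & 0x7f
    if resp in (0x72, 0x73):            # descriptor format
        key, asc, ascq = b[1] & 0x0f, b[2], b[3]
    elif resp in (0x70, 0x71):          # fixed format; needs >= 14 bytes
        key, asc, ascq = b[2] & 0x0f, b[12], b[13]
    else:
        raise ValueError(f"unrecognized response code {resp:#04x}")
    return SENSE_KEYS.get(key, f"RESERVED ({key:#x})"), asc, ascq

# The da36 read errors above decode to MEDIUM ERROR with asc/ascq 11h/00h,
# which T10 lists as UNRECOVERED READ ERROR.
print(decode_sense("72 03 11 00"))  # → ('MEDIUM ERROR', 17, 0)
```

If the events later carry the entire sense buffer, the same starting point extends naturally to the sense-key-dependent descriptors mentioned above.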
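The consistency check Doug describes (sg3_utils' testing/sg_chk_asc.c) can be sketched for any local asc/ascq table along these lines. This is a rough illustration, not the sg3_utils code: it assumes asc-num.txt lines of the form "NNh/NNh  <device-type flags>  <description>" as in recent copies of the T10 file, and range entries such as "40h/NNh" deliberately fail to match and are skipped.

```python
# Sketch: extract every asc/ascq pair from T10's asc-num.txt and report
# pairs that a local table is missing.
import re

PAIR = re.compile(r"^\s*([0-9A-Fa-f]{2})h/([0-9A-Fa-f]{2})h\b")

def parse_asc_num(text: str) -> set[tuple[int, int]]:
    """Collect (asc, ascq) pairs from asc-num.txt-style lines."""
    pairs = set()
    for line in text.splitlines():
        m = PAIR.match(line)
        if m:
            pairs.add((int(m.group(1), 16), int(m.group(2), 16)))
    return pairs

def missing_pairs(t10_pairs, local_pairs):
    """asc/ascq pairs T10 defines that the local table lacks."""
    return sorted(t10_pairs - set(local_pairs))

# Two lines in the asc-num.txt style (hypothetical excerpt) checked
# against a local table that only knows the first pair:
sample = (" 11h/00h  DZTPROMAEBKVF  UNRECOVERED READ ERROR\n"
          " 11h/01h  DZT ROMAEBKV   READ RETRIES EXHAUSTED\n")
local = {(0x11, 0x00)}
print(missing_pairs(parse_asc_num(sample), local))  # → [(17, 1)]
```

The reverse difference (pairs in the local table that T10 has dropped or renumbered) falls out of the same sets, which is essentially what keeping a table in sync with asc-num.txt requires.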
