Date:      Sun, 6 Feb 2022 12:11:01 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Sean Bruno <sbruno@freebsd.org>
Cc:        freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: USB Disk Stalls on -current
Message-ID:  <CANCZdfpWwP5oSh8ktgj9hBnhpn%2BRR1HaEAY7sLQSTtQOw-AHGA@mail.gmail.com>
In-Reply-To: <60ebd011-c2b8-3524-1476-123f11128ffe@freebsd.org>
References:  <7e8459e4-d708-7750-402c-cda2adf6199f@freebsd.org> <CANCZdfqG-%2B9dfFz-%2BeezZaqbPQN5-mQpw%2B214CkiKC%2B_kmW2ig@mail.gmail.com> <60ebd011-c2b8-3524-1476-123f11128ffe@freebsd.org>


[-- Attachment #1 --]
On Sun, Feb 6, 2022 at 12:02 PM Sean Bruno <sbruno@freebsd.org> wrote:

>
>
> >
> >
> > So there are some tools you can use. For usb, there's usbdump that can
> > get you the USB transactions. I've not used it enough to give more
> > details here. This will let you know what's going on, and when, on the
> > USB endpoint.
> >
> > You can also enable the CAM_IOSCHED stuff. This will allow you to get
> > latency measurements for 'requests in the sim' which basically will
> > tell you what your latency spread is for the drives. This will tell you
> > if things are getting caught up in the USB layer, or after CAM's da
> > driver completes the I/O request (granted, that's almost certainly not
> > happening, but it will help you figure out what's going on and put
> > numbers to the oddities you are seeing).
> >
> > Also, make sure you have good cables. I've had lots of hiccups over the
> > years from dodgy USB cables. Also make sure you have good, high quality
> > enclosures. Many from the USB2 time-period are sketchy at best and I
> > went through several at one point trying to find a good one. I'd be
> > tempted to get USB 3 enclosures. I've had better luck with USB3 gear
> > than USB2 gear here, but you need a USB-3 controller to get USB-3
> > speeds which might not be compatible with the NUC's built-in stuff
> > (though my NUC has one USB3 port, there's lots of different models).
> >
> > Usually, though, I see weirdness associated with dmesg messages from
> > usb, cam, etc when the hardware is on the sketchy end.
> >
> > Warner
>
> I'm assuming that I have a fairly dodgy USB device, as the pauses seem
> to correspond to this from CAM being emitted:
>
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28 00 36 69 02 6e 00 00 80 00
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB request completed with an error
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command, 2 more tries remain
>
>
> Things resume after this is emitted, but there is a substantial
> (multiple minutes) pause here.  I would assume that timeouts would fire
> much quicker.
>

The default timeout is 60s.

You can reduce that substantially by setting kern.cam.da.default_timeout
to a smaller value (in seconds). Disk operations complete within 5s these
days, except spin-ups. Heck, nearly all complete within 500ms. You
might try setting this value to 3, 5, or 10 to see if that helps the
hiccups without introducing extra retries when the load is heavy.
Smaller values give a faster recovery, but too small a number may result
in timeouts and errors under load. I think you need to set this as a
tunable.
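
To put rough numbers on the trade-off, here is a minimal sketch of the
worst-case arithmetic. It assumes the first attempt and every retry each
run to the full timeout before CAM gives up. The function name
worst_case_stall is made up for illustration, and the retry count of 4 is
an assumption; check `sysctl kern.cam.da.retry_count` on your system.

```python
def worst_case_stall(timeout_s: int, retries: int) -> int:
    """Upper bound, in seconds, on one stalled request.

    Assumes the initial attempt plus each retry waits out the
    full timeout (hypothetical worst case, not measured data).
    """
    return timeout_s * (retries + 1)


# 60 is the default timeout mentioned above; the other values are the
# suggested smaller settings. ASSUMED_RETRIES is an assumption.
ASSUMED_RETRIES = 4

for timeout_s in (60, 10, 5, 3):
    stall = worst_case_stall(timeout_s, ASSUMED_RETRIES)
    print(f"timeout={timeout_s:>2}s -> up to {stall}s stalled")
```

Under those assumptions the 60s default can add up to several minutes
per bad request, which would be in line with the pauses reported, while
a 5s timeout bounds it at well under a minute. If it does need to be a
tunable, a line such as kern.cam.da.default_timeout=10 in
/boot/loader.conf would be the usual place for it.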

Warner


Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfpWwP5oSh8ktgj9hBnhpn%2BRR1HaEAY7sLQSTtQOw-AHGA>