From: Mehmet Erol Sanliturk
Date: Sun, 6 Feb 2022 23:05:14 +0300
Subject: Re: USB Disk Stalls on -current
To: Warner Losh
Cc: Sean Bruno, freebsd-current
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Sun, Feb 6, 2022 at 10:11 PM Warner Losh wrote:

> On Sun, Feb 6, 2022 at 12:02 PM Sean Bruno wrote:
>
>> > So there are some tools you can use. For USB, there's usbdump, which
>> > can get you the USB transactions. I've not used it enough to give
>> > more details here. This will let you know what's going on, and when,
>> > on the USB endpoint.
>> >
>> > You can also enable the CAM_IOSCHED stuff. This will allow you to get
>> > latency measurements for 'requests in the sim', which basically will
>> > tell you what your latency spread is for the drives. This will tell
>> > you if things are getting caught up in the USB layer, or after CAM's
>> > da driver completes the I/O request (granted, that's almost certainly
>> > not happening, but it will help you figure out what's going on and
>> > put numbers to the oddities you are seeing).
>> >
>> > Also, make sure you have good cables. I've had lots of hiccups over
>> > the years from dodgy USB cables. Also make sure you have good,
>> > high-quality enclosures. Many from the USB2 time period are sketchy
>> > at best, and I went through several at one point trying to find a
>> > good one. I'd be tempted to get USB 3 enclosures. I've had better
>> > luck with USB3 gear than USB2 gear here, but you need a USB-3
>> > controller to get USB-3 speeds, which might not be compatible with
>> > the NUC's built-in stuff (though my NUC has one USB3 port, there's
>> > lots of different models).
>> >
>> > Usually, though, I see weirdness associated with dmesg messages from
>> > usb, cam, etc. when the hardware is on the sketchy end.
>> >
>> > Warner
>>
>> I'm assuming that I have a fairly dodgy USB device, as the pauses seem
>> to correspond to this from CAM being emitted:
>>
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28 00 36 69 02 6e 00 00 80 00
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB request completed with an error
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command, 2 more tries remain
>>
>> Things resume after this is emitted, but there is a substantial
>> (multiple minutes) pause here. I would assume that timeouts would fire
>> much quicker.
>
> The default timeout is 60s.
>
> You can reduce that substantially by setting kern.cam.da.default_timeout
> to a smaller value. Disk operations complete within 5s these days,
> except spin-ups. Heck, nearly all complete within 500ms. You might try
> setting this value to maybe 3 or 5 or 10 to see if that helps the
> hiccups without introducing extra retries when the load is heavy. The
> smaller values give a faster recovery, but too small a number may result
> in timeouts and errors under load. I think you need to set this as a
> tunable.
>
> Warner

Are your external disks of the "GREEN", i.e. "energy saver", kind?

If the external disks are of the energy-saver kind, they go to sleep
when they have not been used for a while, and waking them up takes
time. This causes significant distress, because every use after an idle
period has to wait for such a wake-up.

Another important problem in that case is the slowness of external USB
disks compared with internal (non-energy-saver) SATA disks.

When response time is important, it is necessary to avoid such "GREEN"
disks.

Mehmet Erol Sanliturk
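
For anyone reading the CAM messages Sean quoted, the CDB bytes can be decoded to see which request failed. The following is only a sketch, assuming the standard 10-byte SCSI READ(10) layout (opcode 0x28, big-endian logical block address in bytes 2-5, big-endian transfer length in bytes 7-8); the helper name decode_read10 is made up for illustration and is not part of any FreeBSD tool.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "opcode": cdb[0],
        # Bytes 2-5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7-8: number of blocks to transfer, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB copied from the kernel message in the thread.
info = decode_read10("28 00 36 69 02 6e 00 00 80 00")
print(f"LBA {info['lba']:#x}, {info['blocks']} blocks")
```

For the logged CDB this gives a transfer length of 128 blocks starting at LBA 0x3669026e, i.e. an ordinary mid-disk read rather than anything exotic.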


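
The trade-off Warner describes for kern.cam.da.default_timeout can be put into rough numbers. This is a back-of-the-envelope sketch, not a description of how CAM actually schedules retries: it assumes each attempt may block for up to the full timeout before the next retry, and it assumes a retry count of 4 (believed to be the default of kern.cam.da.retry_count; check the value on your own system). The function name worst_case_stall is hypothetical.

```python
def worst_case_stall(timeout_s: int, retries: int) -> int:
    """Upper bound in seconds: the first attempt plus each retry may wait a full timeout."""
    return timeout_s * (retries + 1)


# Assumption: retry count of 4; the thread itself only states the 60s timeout.
RETRIES = 4

# 60 is the default timeout Warner mentions; 10, 5 and 3 are the values he suggests trying.
for timeout in (60, 10, 5, 3):
    stall = worst_case_stall(timeout, RETRIES)
    print(f"timeout={timeout:>2}s -> up to {stall}s stalled")
```

Under these assumptions the 60s default allows a stall of up to 300s, which is in line with the "multiple minutes" pause Sean reports, while a 5s timeout would bound it at 25s. If the value does have to be set as a tunable, as Warner suspects, the usual place for that is a line such as kern.cam.da.default_timeout="10" in /boot/loader.conf.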