From owner-freebsd-usb@freebsd.org Wed May 13 16:29:11 2020 Return-Path: Delivered-To: freebsd-usb@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 418992FA5CA for ; Wed, 13 May 2020 16:29:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mailman.nyi.freebsd.org (mailman.nyi.freebsd.org [IPv6:2610:1c1:1:606c::50:13]) by mx1.freebsd.org (Postfix) with ESMTP id 49Mg9M130zz4FgP for ; Wed, 13 May 2020 16:29:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by mailman.nyi.freebsd.org (Postfix) id 2208C2FA5C9; Wed, 13 May 2020 16:29:11 +0000 (UTC) Delivered-To: usb@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 21CF32FA5C8 for ; Wed, 13 May 2020 16:29:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49Mg9M07npz4FgN for ; Wed, 13 May 2020 16:29:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2610:1c1:1:606c::50:1d]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 003BB24300 for ; Wed, 13 May 2020 16:29:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.5]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id 04DGTAdt077451 for ; Wed, 13 May 2020 16:29:10 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 04DGTAOd077450 for usb@FreeBSD.org; Wed, 13 May 2020 16:29:10 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: usb@FreeBSD.org Subject: [Bug 244356] Writing to a USB 3.0 stick is very slow Date: Wed, 13 May 2020 16:29:10 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: usb X-Bugzilla-Version: 12.1-RELEASE X-Bugzilla-Keywords: needs-qa, performance X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: olivier.freebsd@free.fr X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: usb@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-usb@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: FreeBSD support for USB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 May 2020 16:29:11 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D244356 --- Comment #36 from Olivier Certner --- (In reply to Sebastien Boisvert from comment #35) That `select` takes a while before returning is a symptom you saw, but `sel= ect` itself is not the problem's cause. Also, this symptom depends on the way da= ta is copied. For example, I do not have it in the same form with `dd`. Absolu= tely all the top-lines in truss' output for `dd` are `write` lines, there are no `select` lines at all. However, they show similar variability in latencies,= and long ones, that you have with `select`. Using `rsync`, I indeed see `select` calls as top lines as you do, so I can apparently "reproduce" the behavior = you see. As for getting stack traces (yes, this is not exactly what you asked for), I tried to use hwpmc(4) (see pmcstat(8) and pmccontrol(8)), but unfortunately= no performance counter class seems to be available for my machine. I saw that there are files for Ryzen processors in the source tree though, so I may be able to figure out to teach libpmc to use them (but maybe some code is real= ly missing). You could try on your machine (try typing `pmcstat -L` and `pmccontrol -L` first, after having loaded `hwpmc.ko`), but given it has a Ryzen 5, it is likely you'll get the same (non-)result as I. Another option is to use `procstat -k -k -p `. For example, here is wh= at I typically get when running this against a `rsync` process: PID TID COMM TDNAME KSTACK 78175 104627 dd - mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 getpbuf+0xb8 cluster_wbuild+0x745 cluster_write+0x12d ffs_write+0x3dd VOP_WRITE_APV+0xec vn_write+0x260 vn_io_fault_doio+0x43 vn_io_fault1+0x16a vn_io_fault+0x18f dofilewrite+0xb0 sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x101 78175 104627 dd - mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 waitrunningbufspace+0x87 bufwrite+0x243 cluster_wbuild+0x54b cluster_write+0x12d ffs_write+0x3dd VOP_WRITE_APV+0xec vn_write+0x260 vn_io_fault_doio+0x43 vn_io_fault1+0x16a vn_io_fault+0x18f dofilewrite+0xb0 sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x1= 01 So I don't think we are much progressing here: Process is blocked on getting buffers, because (educated guess) all buffers are full when the stick is stalled. But I fear that things are going to be much more complicated than just gett= ing call stacks at several points, because several subsystems are involved, and= CAM is essentially said to be a hard to understand state machine (I haven't tri= ed to figure it out yet, though). Indeed, I've tried running `procstat` on `us= b` and `cam` threads, when `iostat` shows that traffic is happening, but didn't get anything meaningful (i.e., same output as when the stick is idle). For = the record, respective outputs: PID TID COMM TDNAME KSTACK 32 100170 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100171 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100172 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100173 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100174 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100176 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100177 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100178 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100179 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100180 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100214 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100215 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100216 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100217 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100218 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe PID TID COMM TDNAME KSTACK 31 100162 cam doneq0 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100163 cam doneq1 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100164 cam doneq2 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100165 cam doneq3 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100166 cam doneq4 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100341 cam scanner mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_scanner_thread+0x99 fork_exit+0x82 fork_trampoline+0xe r Dtrace itself could be used to produce call trees, and may actually be the = best chance to obtain useful ones for `usb` and `cam` threads, by triggering data collection inside certain functions. I'm trying to figure out how to do thi= s, but I won't have much time until next week, so don't expect quick news from= me. >From the elements we have, I think that HPS (comment #4) is probably on the right track. First thing now would be to determine what exactly CAM is wait= ing form (and in which state it is). And for this, I bet that the help of a CAM guru would save us a lot of time. --=20 You are receiving this mail because: You are the assignee for the bug.=