Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 May 2020 16:29:10 +0000
From:      bugzilla-noreply@freebsd.org
To:        usb@FreeBSD.org
Subject:   [Bug 244356] Writing to a USB 3.0 stick is very slow
Message-ID:  <bug-244356-19105-d5191NL0Ib@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-244356-19105@https.bugs.freebsd.org/bugzilla/>
References:  <bug-244356-19105@https.bugs.freebsd.org/bugzilla/>

next in thread | previous in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D244356

--- Comment #36 from Olivier Certner <olivier.freebsd@free.fr> ---
(In reply to Sebastien Boisvert from comment #35)

That `select` takes a while before returning is a symptom you saw, but `sel=
ect`
itself is not the problem's cause. Also, this symptom depends on the way da=
ta
is copied. For example, I do not have it in the same form with `dd`. Absolu=
tely
all the top-lines in truss' output for `dd` are `write` lines, there are no
`select` lines at all. However, they show similar variability in latencies,=
 and
long ones, that you have with `select`. Using `rsync`, I indeed see `select`
calls as top lines as you do, so I can apparently "reproduce" the behavior =
you
see.

As for getting stack traces (yes, this is not exactly what you asked for), I
tried to use hwpmc(4) (see pmcstat(8) and pmccontrol(8)), but unfortunately=
 no
performance counter class seems to be available for my machine. I saw that
there are files for Ryzen processors in the source tree though, so I may be
able to figure out to teach libpmc to use them (but maybe some code is real=
ly
missing). You could try on your machine (try typing `pmcstat -L` and
`pmccontrol -L` first, after having loaded `hwpmc.ko`), but given it has a
Ryzen 5, it is likely you'll get the same (non-)result as I.

Another option is to use `procstat -k -k -p <pid>`. For example, here is wh=
at I
typically get when running this against a `rsync` process:
  PID    TID COMM                TDNAME              KSTACK
78175 104627 dd                  -                   mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 getpbuf+0xb8 cluster_wbuild+0x745
cluster_write+0x12d ffs_write+0x3dd VOP_WRITE_APV+0xec vn_write+0x260
vn_io_fault_doio+0x43 vn_io_fault1+0x16a vn_io_fault+0x18f dofilewrite+0xb0
sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x101
78175 104627 dd                  -                   mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 waitrunningbufspace+0x87 bufwrite+0x243
cluster_wbuild+0x54b cluster_write+0x12d ffs_write+0x3dd VOP_WRITE_APV+0xec
vn_write+0x260 vn_io_fault_doio+0x43 vn_io_fault1+0x16a vn_io_fault+0x18f
dofilewrite+0xb0 sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x1=
01
So I don't think we are much progressing here: Process is blocked on getting
buffers, because (educated guess) all buffers are full when the stick is
stalled.

But I fear that things are going to be much more complicated than just gett=
ing
call stacks at several points, because several subsystems are involved, and=
 CAM
is essentially said to be a hard to understand state machine (I haven't tri=
ed
to figure it out yet, though). Indeed, I've tried running `procstat` on `us=
b`
and `cam` threads, when `iostat` shows that traffic is happening, but didn't
get anything meaningful (i.e., same output as when the stick is idle). For =
the
record, respective outputs:
  PID    TID COMM                TDNAME              KSTACK
   32 100170 usb                 usbus0              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100171 usb                 usbus0              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100172 usb                 usbus0              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100173 usb                 usbus0              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100174 usb                 usbus0              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100176 usb                 usbus1              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100177 usb                 usbus1              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100178 usb                 usbus1              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100179 usb                 usbus1              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100180 usb                 usbus1              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100214 usb                 usbus2              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100215 usb                 usbus2              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100216 usb                 usbus2              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100217 usb                 usbus2              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
   32 100218 usb                 usbus2              mi_switch+0xdd
sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82
fork_trampoline+0xe
  PID    TID COMM                TDNAME              KSTACK
   31 100162 cam                 doneq0              mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82
fork_trampoline+0xe
   31 100163 cam                 doneq1              mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82
fork_trampoline+0xe
   31 100164 cam                 doneq2              mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82
fork_trampoline+0xe
   31 100165 cam                 doneq3              mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82
fork_trampoline+0xe
   31 100166 cam                 doneq4              mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82
fork_trampoline+0xe
   31 100341 cam                 scanner             mi_switch+0xdd
sleepq_wait+0x2c _sleep+0x246 xpt_scanner_thread+0x99 fork_exit+0x82
fork_trampoline+0xe
r

Dtrace itself could be used to produce call trees, and may actually be the =
best
chance to obtain useful ones for `usb` and `cam` threads, by triggering data
collection inside certain functions. I'm trying to figure out how to do thi=
s,
but I won't have much time until next week, so don't expect quick news from=
 me.

>From the elements we have, I think that HPS (comment #4) is probably on the
right track. First thing now would be to determine what exactly CAM is wait=
ing
form (and in which state it is). And for this, I bet that the help of a CAM
guru would save us a lot of time.

--=20
You are receiving this mail because:
You are the assignee for the bug.=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-244356-19105-d5191NL0Ib>