Date: Wed, 13 May 2020 16:29:10 +0000 From: bugzilla-noreply@freebsd.org To: usb@FreeBSD.org Subject: [Bug 244356] Writing to a USB 3.0 stick is very slow Message-ID: <bug-244356-19105-d5191NL0Ib@https.bugs.freebsd.org/bugzilla/> In-Reply-To: <bug-244356-19105@https.bugs.freebsd.org/bugzilla/> References: <bug-244356-19105@https.bugs.freebsd.org/bugzilla/>
next in thread | previous in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D244356 --- Comment #36 from Olivier Certner <olivier.freebsd@free.fr> --- (In reply to Sebastien Boisvert from comment #35) That `select` takes a while before returning is a symptom you saw, but `sel= ect` itself is not the problem's cause. Also, this symptom depends on the way da= ta is copied. For example, I do not have it in the same form with `dd`. Absolu= tely all the top-lines in truss' output for `dd` are `write` lines, there are no `select` lines at all. However, they show similar variability in latencies,= and long ones, that you have with `select`. Using `rsync`, I indeed see `select` calls as top lines as you do, so I can apparently "reproduce" the behavior = you see. As for getting stack traces (yes, this is not exactly what you asked for), I tried to use hwpmc(4) (see pmcstat(8) and pmccontrol(8)), but unfortunately= no performance counter class seems to be available for my machine. I saw that there are files for Ryzen processors in the source tree though, so I may be able to figure out to teach libpmc to use them (but maybe some code is real= ly missing). You could try on your machine (try typing `pmcstat -L` and `pmccontrol -L` first, after having loaded `hwpmc.ko`), but given it has a Ryzen 5, it is likely you'll get the same (non-)result as I. Another option is to use `procstat -k -k -p <pid>`. For example, here is wh= at I typically get when running this against a `rsync` process: PID TID COMM TDNAME KSTACK 78175 104627 dd - mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 getpbuf+0xb8 cluster_wbuild+0x745 cluster_write+0x12d ffs_write+0x3dd VOP_WRITE_APV+0xec vn_write+0x260 vn_io_fault_doio+0x43 vn_io_fault1+0x16a vn_io_fault+0x18f dofilewrite+0xb0 sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x101 78175 104627 dd - mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 waitrunningbufspace+0x87 bufwrite+0x243 cluster_wbuild+0x54b cluster_write+0x12d ffs_write+0x3dd VOP_WRITE_APV+0xec vn_write+0x260 vn_io_fault_doio+0x43 vn_io_fault1+0x16a vn_io_fault+0x18f dofilewrite+0xb0 sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x1= 01 So I don't think we are much progressing here: Process is blocked on getting buffers, because (educated guess) all buffers are full when the stick is stalled. But I fear that things are going to be much more complicated than just gett= ing call stacks at several points, because several subsystems are involved, and= CAM is essentially said to be a hard to understand state machine (I haven't tri= ed to figure it out yet, though). Indeed, I've tried running `procstat` on `us= b` and `cam` threads, when `iostat` shows that traffic is happening, but didn't get anything meaningful (i.e., same output as when the stick is idle). For = the record, respective outputs: PID TID COMM TDNAME KSTACK 32 100170 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100171 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100172 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100173 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100174 usb usbus0 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100176 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100177 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100178 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100179 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100180 usb usbus1 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100214 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100215 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100216 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100217 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe 32 100218 usb usbus2 mi_switch+0xdd sleepq_wait+0x2c _cv_wait+0x151 usb_process+0x137 fork_exit+0x82 fork_trampoline+0xe PID TID COMM TDNAME KSTACK 31 100162 cam doneq0 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100163 cam doneq1 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100164 cam doneq2 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100165 cam doneq3 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100166 cam doneq4 mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_done_td+0x9b fork_exit+0x82 fork_trampoline+0xe 31 100341 cam scanner mi_switch+0xdd sleepq_wait+0x2c _sleep+0x246 xpt_scanner_thread+0x99 fork_exit+0x82 fork_trampoline+0xe r Dtrace itself could be used to produce call trees, and may actually be the = best chance to obtain useful ones for `usb` and `cam` threads, by triggering data collection inside certain functions. I'm trying to figure out how to do thi= s, but I won't have much time until next week, so don't expect quick news from= me. >From the elements we have, I think that HPS (comment #4) is probably on the right track. First thing now would be to determine what exactly CAM is wait= ing form (and in which state it is). And for this, I bet that the help of a CAM guru would save us a lot of time. --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-244356-19105-d5191NL0Ib>