Date: Wed, 22 Jun 2022 21:53:03 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-arm@FreeBSD.org
Subject: [Bug 264836] arm/arm/busdma_machdep-v6.c: bounce page accounting leak (noticed with high traffic ftdi usb serial devices)
Message-ID: <bug-264836-7@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264836

    Bug ID:    264836
    Summary:   arm/arm/busdma_machdep-v6.c: bounce page accounting leak
               (noticed with high traffic ftdi usb serial devices)
    Product:   Base System
    Version:   13.1-STABLE
    Hardware:  Any
    OS:        Any
    Status:    New
    Severity:  Affects Some People
    Priority:  ---
    Component: arm
    Assignee:  freebsd-arm@FreeBSD.org
    Reporter:  jcfyecrayz@liamekaens.com

In bus_dmamap_unload(), the counters for free_bpages and reserved_bpages
appear to be vulnerable to unprotected read-modify-write operations that
result in accounting that looks like a page leak.

This was noticed on a 2GB quad core i.MX6 system that has more than one
device attached via FTDI based USB serial connection. This system happens
to be using FTDI US4232H quad port chips, but the problem is more general.

There is a latency timer setting in FTDI chips that is used to set the
interval at which short packets of data are flushed from the USB endpoint
by the FTDI chip (which has some internal buffer memory). The default
latency is 16 ms. We had set the latency to 4 ms to get data more quickly.

We started noticing problems with slower USB responses and eventually the
network stack would be affected as well. In the system in question, it
fairly reliably "locked up" (couldn't ssh any more, trouble spawning
processes when logged in on the serial port).

In the locked up state, the usb/usbus0.xplr thread of the usb system
process was hung and the system could not process usb messages (this i.MX6
system has an ehci USB controller).

The typical stack dump for usbus0.xplr when things were hung is:

    13 100029 usb usbus0.xplr sched_switch+0x9d4 mi_switch+0x184
        sleepq_wait+0x2c _cv_wait+0x1bc usbd_do_request_flags+0x4bc
        usbd_req_get_port_status+0x44 uhub_explore+0xc4 uhub_explore+0x8f8
        uhub_explore+0x8f8 usb_bus_explore+0x150 usb_process+0x124
        fork_exit+0xc0 swi_exit+0

Once we noticed that hw.busdma.zone0.free_bpages was steadily decrementing -
eventually down to zero - we started investigating (dtrace was helpful
here) why there appeared to be a leak of bounce pages.

That's when we found what appears to be the vulnerability in
bus_dmamap_unload(). This code has been this way for more than a decade,
but it takes a lot of transactions for this to occur and was particularly
hard to find. For a long time, we would work around the problem by
detecting the symptoms and just reboot this system to recover - hardly
ideal. It would take weeks to months depending on USB traffic load.
Adjusting the FTDI latency timer to 0 ms (force packet delivery on every
high speed microframe) finally made this happen more quickly.

Even if no one else has enough traffic to experience the same problem, I
will submit a patch for review. Early results seem promising (in
particular the free bounce page accounting now does not show what looks
like a leak).

This was originally noticed quite a while ago on 11.x, but it has been
confirmed on 13.x as well.

As an indication of system load, with the 0 ms latency timer we see more
than 100 bounce pages per second (based on hw.busdma.zone0.total_bounced),
and the load due to interrupts is about 15%. The high rate of bounce page
and interrupt activity gives lots of good opportunity for preemption at
just the right time to trigger the accounting leak.

Work sponsored by: Microchip Technology, Inc.

-- 
You are receiving this mail because:
You are the assignee for the bug.
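
For illustration only, the class of problem described in this report (a counter updated by read-modify-write without protection) can be modelled in user space. The sketch below is not the kernel code and not the patch mentioned above. The struct, the function names (reserve_page, release_page, run_workers) and the pthread mutex are all assumptions made for the example; only the counter names free_bpages and reserved_bpages come from the report. It shows the counters staying balanced when every update is made under a lock.

```c
/*
 * User-space analogy only. All names except the two counters are
 * hypothetical and do not reproduce arm/arm/busdma_machdep-v6.c.
 */
#include <assert.h>
#include <pthread.h>

#define TOTAL_BPAGES 64
#define MAX_THREADS  16

struct zone {
	pthread_mutex_t lock;
	int free_bpages;
	int reserved_bpages;
};

struct worker_arg {
	struct zone *z;
	int iterations;
};

/* Move one page from free to reserved; returns 1 on success, 0 if none. */
static int
reserve_page(struct zone *z)
{
	int ok = 0;

	pthread_mutex_lock(&z->lock);
	if (z->free_bpages > 0) {
		z->free_bpages--;
		z->reserved_bpages++;
		ok = 1;
	}
	pthread_mutex_unlock(&z->lock);
	return (ok);
}

/*
 * Give one page back. Without the lock, two threads could read the same
 * old counter value and one of the two updates would be lost, which
 * would look like a leaked page.
 */
static void
release_page(struct zone *z)
{
	pthread_mutex_lock(&z->lock);
	z->reserved_bpages--;
	z->free_bpages++;
	pthread_mutex_unlock(&z->lock);
}

static void *
worker(void *p)
{
	struct worker_arg *a = p;

	for (int i = 0; i < a->iterations; i++)
		if (reserve_page(a->z))
			release_page(a->z);
	return (NULL);
}

/* Run nthreads workers and return the final free page count. */
int
run_workers(int nthreads, int iterations)
{
	struct zone z;
	struct worker_arg arg = { &z, iterations };
	pthread_t tid[MAX_THREADS];

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	pthread_mutex_init(&z.lock, NULL);
	z.free_bpages = TOTAL_BPAGES;
	z.reserved_bpages = 0;

	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tid[i], NULL, worker, &arg);
		assert(rc == 0);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_destroy(&z.lock);
	return (z.free_bpages);
}
```

Calling run_workers(4, 50000) from a small main() should return TOTAL_BPAGES, since every reserve is paired with a release under the lock. Removing the lock calls from release_page() may let the count drift on a multi-core machine, which is the user-space equivalent of the apparent leak.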