Date:      Wed, 22 Jun 2022 21:53:03 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-arm@FreeBSD.org
Subject:   [Bug 264836] arm/arm/busdma_machdep-v6.c: bounce page accounting leak (noticed with high traffic ftdi usb serial devices)
Message-ID:  <bug-264836-7@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264836

            Bug ID: 264836
           Summary: arm/arm/busdma_machdep-v6.c: bounce page accounting
                    leak (noticed with high traffic ftdi usb serial
                    devices)
           Product: Base System
           Version: 13.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: arm
          Assignee: freebsd-arm@FreeBSD.org
          Reporter: jcfyecrayz@liamekaens.com

In bus_dmamap_unload(), the counters for free_bpages and reserved_bpages appear
to be vulnerable to unprotected read-modify-write operations that result in
accounting that looks like a page leak.

This was noticed on a 2GB quad-core i.MX6 system that has more than one device
attached via an FTDI-based USB serial connection.  This system happens to be
using FTDI US4232H quad-port chips, but the problem is more general.

There is a latency timer setting in FTDI chips that is used to set the interval
at which short packets of data are flushed from the USB endpoint by the FTDI
chip (which has some internal buffer memory).  The default latency is 16 ms.
We had set the latency to 4 ms to get data more quickly.

We started noticing problems with slower USB responses, and eventually the
network stack would be affected as well.  In the system in question, it fairly
reliably "locked up" (couldn't ssh any more, trouble spawning processes when
logged in on the serial port).

In the locked-up state, the usb/usbus0.xplr thread of the usb system process
was hung and the system could not process USB messages (this i.MX6 system has
an EHCI USB controller).

The typical stack dump for usbus0.xplr when things were hung is:

   13 100029 usb                 usbus0.xplr         sched_switch+0x9d4
mi_switch+0x184 sleepq_wait+0x2c _cv_wait+0x1bc usbd_do_request_flags+0x4bc
usbd_req_get_port_status+0x44 uhub_explore+0xc4 uhub_explore+0x8f8
uhub_explore+0x8f8 usb_bus_explore+0x150 usb_process+0x124 fork_exit+0xc0
swi_exit+0

Once we noticed that hw.busdma.zone0.free_pages was steadily decrementing,
eventually down to zero, we started investigating (dtrace was helpful here)
why there appeared to be a leak of bounce pages.

That's when we found what appears to be the vulnerability in
bus_dmamap_unload().  This code has been this way for more than a decade, but
it takes a lot of transactions for the problem to occur, and it was
particularly hard to find.  For a long time, we worked around the problem by
detecting the symptoms and simply rebooting the system to recover, which is
hardly ideal.  The failure would take weeks to months to appear, depending on
USB traffic load.  Setting the FTDI latency timer to 0 ms (forcing packet
delivery on every high-speed microframe) finally made it happen more quickly.

Even if someone else has enough traffic to experience the same problem,

I will submit a patch for review.  Early results seem promising (in particular,
the free bounce page accounting no longer shows what looks like a leak).

This was originally noticed quite a while ago on 11.x, but it has been
confirmed on 13.x as well.

As an indication of system load, with the 0 ms latency timer we see more than
100 bounce pages per second (based on hw.busdma.zone0.total_bounced), and the
load due to interrupts is about 15%.  The high rate of bounce page and
interrupt activity provides many opportunities for preemption at just the
right moment to trigger the accounting leak.

Work sponsored by: Microchip Technology, Inc.
