From owner-freebsd-usb@FreeBSD.ORG Tue Jun 14 00:58:46 2011 Return-Path: Delivered-To: freebsd-usb@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEAAF106564A for ; Tue, 14 Jun 2011 00:58:46 +0000 (UTC) (envelope-from spork@bway.net) Received: from xena.bway.net (xena.bway.net [216.220.96.26]) by mx1.freebsd.org (Postfix) with ESMTP id 856788FC18 for ; Tue, 14 Jun 2011 00:58:46 +0000 (UTC) Received: (qmail 51250 invoked by uid 0); 14 Jun 2011 00:58:44 -0000 Received: from smtp.bway.net (216.220.96.25) by xena.bway.net with (DHE-RSA-AES256-SHA encrypted) SMTP; 14 Jun 2011 00:58:44 -0000 Received: (qmail 51244 invoked by uid 90); 14 Jun 2011 00:58:44 -0000 Received: from unknown (HELO freemac) (spork@bway.net@96.57.144.66) by smtp.bway.net with (DHE-RSA-AES256-SHA encrypted) SMTP; 14 Jun 2011 00:58:44 -0000 Date: Mon, 13 Jun 2011 20:58:44 -0400 (EDT) From: Charles Sprickman X-X-Sender: spork@freemac To: Hans Petter Selasky In-Reply-To: <201106131108.42959.hselasky@c2i.net> Message-ID: References: <201106122245.44386.hselasky@c2i.net> <201106131108.42959.hselasky@c2i.net> User-Agent: Alpine 2.00 (OSX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-usb@freebsd.org Subject: Re: ucom/uftdi high interrupt load X-BeenThere: freebsd-usb@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD support for USB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Jun 2011 00:58:46 -0000 On Mon, 13 Jun 2011, Hans Petter Selasky wrote: > On Sunday 12 June 2011 23:50:24 Charles Sprickman wrote: >> On Sun, 12 Jun 2011, Hans Petter Selasky wrote: >>> On Saturday 11 June 2011 23:43:11 Charles Sprickman wrote: >>>> Hello, >>>> >>>> We ran into an odd problem last week with our serial consoles after >>>> moving the USB to serial adapters from an old 4.11 box to a box running >>>> 8.1. We have two boxes that incorporate (I assume) hubs and a bunch of >>>> FTDI serial interfaces. One has 16 ports, the other 8. Each is >>>> plugged directly into a USB port on the rear of the mainboard. We run >>>> conserver[1] to handle access to the serial ports. From what I've >>>> observed, this application opens the ports when the daemon starts - it >>>> logs any output (handy for panics, or anything else that might spit >>>> interesting info to the console) and waits for clients to connect to >>>> it. >>>> >>>> Everything had been working fine for a few weeks. The box was rebooted >>>> recently to enable PostgreSQL to start normally (bumped SHM stuff in >>>> loader.conf). After six days, we found that the consoles were >>>> unresponsive. Restarting conserver brought us this each time we >>>> connected to a console for full read/write access: >>>> >>>> [Thu Jun 9 10:04:59 2011] conserver (50113): ERROR: [h22] >>>> open(/dev/ttyU4): Interrupted system call: forcing down >>>> [Thu Jun 9 10:04:59 2011] conserver (50112): ERROR: [h21] >>>> open(/dev/ttyU11): Interrupted system call: forcing down >>>> >>>> All devices still appeared in /dev. Stopping conserver and confirming >>>> it and all child processes were gone and then using picocom and cu >>>> yielded no response on the serial ports. >>>> >>>> We also found (after the fact) that around the time the consoles became >>>> unresponsive, cpu usage went to nearly 90% and was mostly in the kernel >>>> process "intr": >>>> >>>> root 12 70.5 0.0 0 136 ?? WL Fri12AM 120:01.47 [intr] >>>> >>>> A graph showing cpu usage (red is "system"): >>>> http://i.imgur.com/0yO5l.png >>>> >>>> I should note that we know the cpu spike and devices becoming >>>> unresponsive can be correlated because one of the serial ports runs a >>>> temperature monitor which is tied into our monitoring. When the data >>>> goes stale, we get notified. >>>> >>>> Issuing a "usbconfig -u 0 reset" caused all devices except for the root >>>> hub to disappear and not come back. CPU usage also dipped a bit after >>>> that. Rebooting was the only way to resolve the issue - perhaps >>>> plugging and unplugging would have worked, but that's a bit too complex >>>> for our remote hands. >>>> >>>> I can supply full dmesg and more, but for now, here's a summary of the >>>> usb info from dmesg: >>>> >>>> FreeBSD 8.1-RELEASE #7: Wed Dec 22 00:49:50 EST 2010 >>>> >>>> ohci0: mem 0xfe9fc000-0xfe9fcfff irq 10 >>>> at device 15.2 on pci0 >>>> ohci0: [ITHREAD] >>>> ... >>>> usbus0: on ohci0 >>>> usbus0: 12Mbps Full Speed USB v1.0 >>>> ugen0.1: <(0x1166)> at usbus0 >>>> ... >>>> uhub0: <(0x1166) OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on >>>> usbus0 >>>> uhub0: 4 ports with 4 removable, self powered >>>> ugen0.2: at usbus0 >>>> uhub1: on >>>> usbus0 uhub1: 7 ports with 7 removable, self powered >>>> ugen0.3: at usbus0 >>>> uftdi0: on usbus0 >>>> uftdi1: on usbus0 >>>> ugen0.4: at usbus0 >>>> uftdi2: on usbus0 >>>> uftdi3: on usbus0 >>>> ugen0.5: at usbus0 >>>> uftdi4: on usbus0 >>>> uftdi5: on usbus0 >>>> ugen0.6: at usbus0 >>>> uftdi6: on usbus0 >>>> uftdi7: on usbus0 >>>> ugen0.7: at usbus0 >>>> uftdi8: on usbus0 >>>> uftdi9: on usbus0 >>>> ugen0.8: at usbus0 >>>> uftdi10: on usbus0 >>>> uftdi11: on usbus0 >>>> ugen0.9: at usbus0 >>>> uhub2: on >>>> usbus0 uhub2: 4 ports with 4 removable, self powered >>>> ugen0.10: at usbus0 >>>> uftdi12: on usbus0 >>>> uftdi13: on usbus0 >>>> ugen0.11: at usbus0 >>>> ... (mangling below is as it appears in dmesg) >>>> da1 at sym0 bus 0 scbus0 target 1 lun 0uftdi14: >>> ADAPTER> on usbus0 >>>> da1: Fixed Direct Access SCSI-3 device >>>> uftdi15: on usbus0 >>>> ... >>>> Root mount waiting for: usbus0 >>>> ugen0.12: at usbus0 >>>> uhub3: >>>> on usbus0 >>>> Root mount waiting for: usbus0 >>>> uhub3: 7 ports with 7 removable, self powered >>>> ugen0.13: at usbus0 >>>> uftdi16: on usbus0 >>>> Root mount waiting for: usbus0 >>>> ugen0.14: at usbus0 >>>> uftdi17: on usbus0 >>>> Root mount waiting for: usbus0 >>>> ugen0.15: at usbus0 >>>> uftdi18: on usbus0 >>>> >>>> ugen0.16: at usbus0Root mount waiting for: >>>> usbus0 >>>> >>>> uftdi19: on usbus0 >>>> ugen0.17: at usbus0 >>>> uftdi20: on usbus0 >>>> Root mount waiting for: usbus0 >>>> ugen0.18: at usbus0 >>>> uftdi21: on usbus0 >>>> Root mount waiting for: usbus0 >>>> ugen0.19: at usbus0 >>>> uhub4: >>>> on usbus0 >>>> uhub4: 4 ports with 4 removable, self powered >>>> Root mount waiting for: usbus0 >>>> ugen0.20: at usbus0 >>>> uftdi22: on usbus0 >>>> Root mount waiting for: usbus0 >>>> ugen0.21: at usbus0 >>>> uftdi23: on usbus0 >>>> Trying to mount root from zfs:zroot >>>> >>>> Thanks, >>>> >>>> Charles >>> >>> Hi, >>> >>> Try to get output from vmstat -i. >>> >>> Also try to set the: >>> >>> hw.usb.ehci.iaadbug=1 >>> >>> and >>> >>> hw.usb.ehci.lostintrbug=1 >>> >>> in /boot/loader.conf >> >> Quick question - this host only has USB 1.1 - so I don't think I have >> ehci, do I? To be honest, I'm not totally clear on what's what as far as >> uhci, ehci, and ohci. This is what's seen in sysctl output related to >> USB: >> >> [spork@h12 ~]$ sysctl -a|grep usb|more >> hw.pci.usb_early_takeover: 1 >> hw.usb.no_boot_wait: 0 >> hw.usb.debug: 0 >> hw.usb.usb_lang_mask: 255 >> hw.usb.usb_lang_id: 9 >> hw.usb.template: 0 >> hw.usb.power_timeout: 30 >> hw.usb.ucom.cons_baud: 9600 >> hw.usb.ucom.cons_unit: -1 >> dev.usbus.0.%desc: OHCI (generic) USB controller >> dev.usbus.0.%driver: usbus >> dev.usbus.0.%parent: ohci0 >> dev.uhub.0.%parent: usbus0 > > Ok, then those quirks won't help. > > For OHCI, I guess you should check vmstat -i. Oddly enough, the box paniced today, but it appeared to be related to fxp. However in the coredump summary, I have "vmstat -i" output, and ohci seems fairly high in comparison to everything else: vmstat -i interrupt total rate irq4: uart0 106 0 irq10: ohci0 142322001 968176 irq14: ata0 1178 8 irq20: fxp0 3008691 20467 irq21: fxp1 1733357 11791 irq28: sym1 30 0 irq29: sym0 2624749 17855 cpu0: timer 728063100 4952810 cpu1: timer 728044684 4952684 Total 1605797896 10923795 Also, just a brief summary of the panic, since it mentions the interrupt process again: #7 0x8059139b in fxp_new_rfabuf (sc=0x8564c000, rxp=0x8564c1c0) at /usr/src/sys/dev/fxp/if_fxp.c:2611 #8 0x8059285b in fxp_intr (xsc=0x8564c000) at /usr/src/sys/dev/fxp/if_fxp.c:1931 #9 0x8067b1db in intr_event_execute_handlers (p=0x8553d7f8, ie=0x8557d080) at /usr/src/sys/kern/kern_intr.c:1220 #10 0x8067c8eb in ithread_loop (arg=0x856525d0) at /usr/src/sys/kern/kern_intr.c:1233 #11 0x80678f11 in fork_exit (callout=0x8067c880 , arg=0x856525d0, frame=0xd80e7d38) at /usr/src/sys/kern/kern_fork.c:844 #12 0x80931de0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270 And also unrelated to usb, but fairly bizarre "netstat -m" output: 18446744073709550887/1355/626/25600 mbuf clusters in use (current/cache/total/max) 18014398509480560K/3497K/2073K bytes allocated to network (current/cache/total) Sorry for all the extra noise, but I'm not adept enough at determining whether this panic was usb related or fxp related. Thanks, Charles > --HPS >