From owner-freebsd-net@FreeBSD.ORG Thu Feb 3 22:04:16 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C9ADB1065672; Thu, 3 Feb 2011 22:04:16 +0000 (UTC) (envelope-from dieterbsd@engineer.com) Received: from imr-ma02.mx.aol.com (imr-ma02.mx.aol.com [64.12.206.40]) by mx1.freebsd.org (Postfix) with ESMTP id 8E1E08FC1C; Thu, 3 Feb 2011 22:04:16 +0000 (UTC) Received: from imo-ma03.mx.aol.com (imo-ma03.mx.aol.com [64.12.78.138]) by imr-ma02.mx.aol.com (8.14.1/8.14.1) with ESMTP id p13Lre2A000511; Thu, 3 Feb 2011 16:53:40 -0500 Received: from dieterbsd@engineer.com by imo-ma03.mx.aol.com (mail_out_v42.9.) id n.eb6.64b9962 (44667); Thu, 3 Feb 2011 16:53:37 -0500 (EST) Received: from smtprly-me01.mx.aol.com (smtprly-me01.mx.aol.com [64.12.95.102]) by cia-mc01.mx.aol.com (v129.8) with ESMTP id MAILCIAMC016-b2944d4b23dd35d; Thu, 03 Feb 2011 16:53:36 -0500 Received: from web-mmc-d01 (web-mmc-d01.sim.aol.com [205.188.103.67]) by smtprly-me01.mx.aol.com (v129.8) with ESMTP id MAILSMTPRLYME014-b2944d4b23dd35d; Thu, 03 Feb 2011 16:53:33 -0500 To: bug-followup@FreeBSD.org Content-Transfer-Encoding: quoted-printable Date: Thu, 03 Feb 2011 16:53:32 -0500 X-AOL-IP: 67.206.163.54 X-MB-Message-Source: WebUI Received: from 67.206.163.54 by web-mmc-d01.sysops.aol.com (205.188.103.67) with HTTP (WebMailUI); Thu, 03 Feb 2011 16:53:32 -0500 MIME-Version: 1.0 From: dieterbsd@engineer.com X-MB-Message-Type: User Content-Type: text/plain; charset="us-ascii"; format=flowed X-Mailer: Mail.com Webmail 33189-STANDARD Message-Id: <8CD9203EF6F6EE0-1760-B81@web-mmc-d01.sysops.aol.com> X-Spam-Flag: NO X-AOL-SENDER: dieterbsd@engineer.com Cc: freebsd-net@freebsd.org, freebsd@sopwith.solgatos.com, freebsd-firewire@freebsd.org, freebsd-drivers@freebsd.org, freebsd-usb@freebsd.org Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Feb 2011 22:04:16 -0000 New data: USB causes the same problem that firewire does. FreeBSD 8.0 amd64 nfe0: port 0xb400-0xb407=20 mem 0xfebf9000-0xfebf9fff irq 23 at device 10.0 on pci0 bge0: mem 0xfe4f0000-0xfe4fffff irq 19 at device 0.0 on pci5 fwohci1: mem 0xfdeff000-0xfdefffff irq 19 at device=20 8.0 on pci2 ehci0: mem 0xfebfe000-0xfebfe0ff=20 irq 22 at device 2.1 on pci0 Plugging in a USB_to_RS-232 bridge generates: ugen0.2: at usbus0 uplcom0: on usbus0 And networking is locked out for too long, and data is lost. (The node=20 sending data via Ethernet has too small a transmit buffer, but is a closed source=20 closed hardware black box and cannot be fixed. Once data is lost it is lost forever,=20 it cannot be recreated, so this is very bad.) The FreeBSD box receiving data has a=20 very large receive buffer, but if another device driver locks out networking it=20 doesn't help. Changing the printf(9)s to log(9)s fixes the problem, but this is not a=20 good workaround, there are LOTS of printfs in the kernel, I keep hitting new ones. =20 Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause the problem, but=20 printfs from device drivers do. (usb, firewire, ...) My theory is there is a locking conflict. Here is lock profiling data for USB_RS-232 bridge being plugged in=20 printfs locking out networking: debug.lock.prof.enable: 0 debug.lock.prof.reset: 0 debug.lock.prof.stats: max wait_max total wait_total count avg wait_avg=20 cnt_hold cnt_lock name 251240 0 499982 0 303 1650 0 0= =20 0 /usr/src/sys/dev/usb/usb_request.c:322 (sx:0123456789ABCDEF -=20 USB device SX lock) 104255 0 110429 0 2 55214 0 0= =20 0 /usr/src/sys/dev/usb/usb_device.c:2459 (sx:123456789ABCDEF - USB= =20 config SX lock) 104217 0 29670949 0 47087 630 0 0= =20 0 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) =20 <---------<<<< 101962 0 103089 0 27 3818 0 0= =20 0 /usr/src/sys/kern/kern_condvar.c:145 (sleep mutex:Giant) 51071 0 81088 0 3 27029 0 0= =20 0 /usr/src/sys/kern/kern_cons.c:422 (spin mutex:cnputs_mtx) 23049 0 31351 0 2 15675 0 0= =20 0 /usr/src/sys/kern/vfs_syscalls.c:3508 (lockmgr:ufs) 12408 0 978544 0 971 1007 0 0= =20 0 /usr/src/sys/kern/vfs_vnops.c:604 (lockmgr:ufs) 11466 0 744372 0 11567 64 0 0= =20 0 /usr/src/sys/kern/vfs_bio.c:2559 (lockmgr:bufwait) 11141 0 501155 0 3850 130 0 0= =20 0 /usr/src/sys/kern/vfs_bio.c:1835 (lockmgr:bufwait) 6115 0 61720 0 26 2373 0 0= =20 0 /usr/src/sys/dev/usb/usb_request.c:322 (sx:123456789ABCDEF - USB= =20 device SX lock) 5551 0 16749 0 19 881 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:syncer) 3043 0 3749 0 6 624 0 0= =20 0 /usr/src/sys/ufs/ffs/ffs_vfsops.c:1321 (sleep mutex:struct mount= =20 mtx) 2333 0 9901 0 36 275 0 0= =20 0 /usr/src/sys/kern/vfs_mount.c:2231 (sleep mutex:struct mount mtx) 2055 0 9462 0 100 94 0 0= =20 0 /usr/src/sys/kern/subr_hints.c:117 (sleep mutex:kernel=20 environment) 1649 0 79272 0 3846 20 0 0= =20 0 /usr/src/sys/kern/vfs_bio.c:2963 (sleep mutex:vm object) 1636 0 36196 0 15515 2 0 0= =20 0 /usr/src/sys/vm/vm_page.c:1052 (sleep mutex:vm page queue free=20 mutex) 1205 0 55601 0 15411 3 0 0= =20 0 /usr/src/sys/kern/vfs_bio.c:2545 (sleep mutex:bufobj interlock) 1076 0 1076 0 1 1076 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:ufs) 1069 0 31117 0 30 1037 0 0= =20 0 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:nfe0) 848 0 3752 0 1359 2 0 0= =20 0 /usr/src/sys/ufs/ffs/ffs_vfsops.c:1297 (sleep mutex:vnode=20 interlock) 754 0 10016 0 3846 2 0 0= =20 0 /usr/src/sys/ufs/ffs/ffs_softdep.c:4274 (sleep mutex:Softdep=20 Lock) 739 0 2463 0 4 615 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:devfs) 540 0 3370 0 1365 2 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:3179 (sleep mutex:vnode interlock) 525 0 80852 0 160 505 0 0= =20 0 /usr/src/sys/dev/uart/uart_cpu.h:92 (spin mutex:uart_hwmtx) 512 0 2065 0 23 89 0 0= =20 0 /usr/src/sys/kern/vfs_default.c:599 (lockmgr:bufwait) 487 0 97058 0 481 201 0 0= =20 0 /usr/src/sys/vm/vm_pager.c:311 (lockmgr:bufwait) 467 0 2069 0 12 172 0 0= =20 0 /usr/src/sys/dev/usb/usb_busdma.c:547 (sleep mutex:uplcom) 455 0 2978 0 18 165 0 0= =20 0 /usr/src/sys/dev/usb/usb_transfer.c:286 (sleep mutex:uplcom) 423 0 727 0 2 363 0 0= =20 0 /usr/src/sys/vm/uma_core.c:1565 (sleep mutex:UMA lock) 367 0 230166 0 47087 4 0 0= =20 0 /usr/src/sys/netinet/tcp_usrreq.c:708 (rw:tcpinp) 358 0 1387 0 11 126 0 0= =20 0 /usr/src/sys/kern/kern_synch.c:241 (sleep mutex:Giant) 339 0 72085 0 46119 1 0 0= =20 0 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:so_rcv) 332 0 225434 0 93206 2 0 0= =20 0 /usr/src/sys/kern/uipc_socket.c:1441 (sleep mutex:so_rcv) 332 0 19529 0 11551 1 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:1927 (sleep mutex:bufobj interlock) 331 0 162523 0 94887 1 0 0= =20 0 /usr/src/sys/netinet/tcp_output.c:280 (sleep mutex:so_snd) 330 0 73512 0 47087 1 0 0= =20 0 /usr/src/sys/kern/uipc_socket.c:1843 (sleep mutex:so_rcv) 329 0 88069 0 47771 1 0 0= =20 0 /usr/src/sys/kern/uipc_socket.c:1687 (sleep mutex:so_rcv) 329 0 348 0 12 29 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:1955 (sleep mutex:Syncer mtx) 320 0 66185 0 3353 19 0 0= =20 0 /usr/src/sys/kern/vfs_cluster.c:868 (lockmgr:bufwait) 303 0 7410 0 4360 1 0 0= =20 0 /usr/src/sys/kern/vfs_bio.c:3944 (sleep mutex:bufobj interlock) debug.lock.prof.rejected: 0 debug.lock.prof.skipcount: 0 debug.lock.prof.skipspin: 0 Here is lock profiling data for firewire bus reset printfs locking out=20 Ethernet: max wait_max total wait_total count avg wait_avg=20 cnt_hold cnt_lock name 373345 0 397495 0 643 618 0 0= =20 0 /usr/src/sys/kern/vfs_lookup.c:497 (lockmgr:ufs) 373325 0 390815 0 3140 124 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:2218 (sleep mutex:vnode interlock) 359514 0 32199808 0 45582 706 0 0= =20 0 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) =20 <---------<<<< 63827 320 746196 740 70067 10 0 0= =20 4 /usr/src/sys/dev/firewire/fwohci.c:2107 (sleep mutex:firewire) 49927 0 165929 0 8 20741 0 0= =20 0 /usr/src/sys/kern/vfs_syscalls.c:3508 (lockmgr:ufs) 33358 0 1707461 0 1106 1543 0 0= =20 0 /usr/src/sys/kern/vfs_vnops.c:604 (lockmgr:ufs) 30294 0 401224 0 2641 151 0 0= =20 0 /usr/src/sys/vm/vm_map.c:3526 (sx:user map) 28432 0 934681 0 74 12630 0 0= =20 0 /usr/src/sys/kern/kern_cons.c:422 (spin mutex:cnputs_mtx) 28296 0 66892 0 2641 25 0 0= =20 0 /usr/src/sys/vm/vm_fault.c:297 (sleep mutex:vm object) 25020 0 26667 0 45 592 0 0= =20 0 /usr/src/sys/kern/kern_exit.c:825 (sx:proctree) 19027 0 1225496 0 11937 102 0 0= =20 0 /usr/src/sys/kern/vfs_bio.c:2559 (lockmgr:bufwait) 18887 0 906297 0 4140 218 0 0= =20 0 /usr/src/sys/kern/vfs_bio.c:1835 (lockmgr:bufwait) 12101 0 23459 0 1675 14 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:2342 (sleep mutex:vnode interlock) 7448 0 170841 0 2229 76 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:2083 (lockmgr:ufs) 7342 0 21792 0 2169 10 0 0= =20 0 /usr/src/sys/kern/vfs_cache.c:390 (rw:Name Cache) 4648 0 10831 0 11 984 0 0= =20 0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:syncer) 3536 0 35352 0 9218 3 0 0= =20 0 /usr/src/sys/amd64/amd64/pmap.c:3989 (sleep mutex:pmap) 3039 0 34283 0 2641 12 0 0= =20 0 /usr/src/sys/vm/vm_fault.c:937 (sleep mutex:vm object) 2825 0 53949 0 3495 15 0 0= =20 0 /usr/src/sys/vm/vm_fault.c:1010 (sleep mutex:vm object) 2286 0 7169 0 547 13 0 0= =20 0 /usr/src/sys/kern/kern_sig.c:983 (sleep mutex:process lock) 2083 4010 9037 5380 2143 4 2 0= =20 2 /usr/src/sys/kern/vfs_subr.c:2118 (sleep mutex:vnode interlock) 2083 57 2089 57 4 522 14 0= =20 1 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:pipe mutex) 2022 0 181449 0 515 352 0 0= =20 0 /usr/src/sys/vm/vm_pager.c:311 (lockmgr:bufwait) 1997 0 3397 0 2 1698 0 0= =20 0 /usr/src/sys/kern/kern_sysctl.c:1513 (sx:sysctl mem) 1994 0 4890 0 117 41 0 0= =20 0 /usr/src/sys/kern/kern_sysctl.c:1521 (sx:sysctl lock) 1971 0 4156 0 13 319 0 0= =20 0 /usr/src/sys/kern/kern_sysctl.c:1417 (sleep mutex:Giant) 1965 0 31838 0 3471 9 0 0= =20 0 /usr/src/sys/vm/vm_object.c:482 (sleep mutex:vm object) 1926 0 17695 0 83 213 0 0= =20 0 /usr/src/sys/vm/vm_object.c:541 (sleep mutex:vm object) There is at least one line in common that looks networking related,=20 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) int sblock(struct sockbuf *sb, int flags) { KASSERT((flags & SBL_VALID) =3D=3D flags, ("sblock: flags invalid (0x%x)", flags)); if (flags & SBL_WAIT) { if ((sb->sb_flags & SB_NOINTR) || (flags & SBL_NOINTR)) { sx_xlock(&sb->sb_sx); return (0); } return (sx_xlock_sig(&sb->sb_sx)); } else { if (sx_try_xlock(&sb->sb_sx) =3D=3D 0) return (EWOULDBLOCK); return (0); } } It is not clear to me what this code is doing. (comments might help) It is not clear to me why networking and a USB_to_RS-232 bridge would have a lock conflict. It is not clear to me why networking and firewire would have a lock conflict. How do we fix this?