From: Tom Judge
Date: Thu, 28 Jun 2007 15:28:08 +0100
To: Tom Judge
Cc: freebsd-net, David Christensen
Subject: Re: Problems with BCE network adapter (Dell PE2950)
Message-ID: <4683C578.6070009@tomjudge.com>
In-Reply-To: <46823A78.7020501@tomjudge.com>

Dave,

Sorry for the top post, but I have just managed to repeat this exact crash twice on a new PE 1950 system. I have core files available. It seems that after a couple of reboots the problem goes away.
The system actually crashed 4 times, but 2 of the cores were corrupt.

It also seems that the system will be stable if the following message is not produced shortly after /etc/rc.d/netif start:

bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFF9 > 0x01FE)!

I have attached the chip information below.

Any help with this would be appreciated, as we now have 21 PE[12]950 systems which randomly crash due to the original error:

bce0: discard frame w/o leading ethernet header (len 4294967292 pkt len 4294967292)

Tom

PE 2950 Chips:

bce0@pci9:0:0: class=0x020000 card=0x01b21028 chip=0x164c14e4 rev=0x11 hdr=0x00
    vendor   = 'Broadcom Corporation'
    class    = network
    subclass = ethernet
--
bce1@pci5:0:0: class=0x020000 card=0x01b21028 chip=0x164c14e4 rev=0x11 hdr=0x00
    vendor   = 'Broadcom Corporation'
    class    = network
    subclass = ethernet

PE1950 Chips:

bce0@pci9:0:0: class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12 hdr=0x00
    vendor   = 'Broadcom Corporation'
    class    = network
    subclass = ethernet
--
bce1@pci5:0:0: class=0x020000 card=0x01b31028 chip=0x164c14e4 rev=0x12 hdr=0x00
    vendor   = 'Broadcom Corporation'
    class    = network
    subclass = ethernet

Tom Judge wrote:
> David Christensen wrote:
>> Tom,
>>
>> There's already some debug code to watch for unusual size packets.
>> If you can recompile the driver from HEAD with the attached diffs
>> we can printout the first 128 bytes of any unusual sized packets.
>>
>> This does enable other debugging code so performance will drop
>> but that should be OK since this doesn't present as a performance
>> problem.
>>
>> Dave
>>
>
> I am currently running the driver from RELENG_6 (With the MSI code
> backed out and your patch applied by hand) on a 6.2-p5 amd64 system
> (Dell PE2950) and have managed to get the following crash.
>
> The crash was caused by "cat * >/dev/null" in an NFS mounted directory.
>
> I'm not sure if this is the same crash but some other boxes (identical)
> to this one have crashed first time they are rebooted with the new
> driver. Unfortunately I have not managed to get a dump from one of these
> crashes yet.
>
> Also I am seeing a lot of these messages on boxes running this driver:
>
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xD0F5!
>
> It seems to be caused by NFS traffic.
>
> I still have the core file if you need any more information.
>
> Tom
>
> kgdb /usr/obj/usr/src/sys/PE2950/kernel.debug vmcore.0
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xD0F5!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0x2F0A!
>
> bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFFB > 0x01FE)!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0xF043!
> bce0: /usr/src/sys/dev/bce/if_bce.c(3489): Too many free rx_bd (0xFFF9 > 0x01FE)!
> bce0: bce_rx_intr(): Invalid TCP/UDP checksum = 0x8C5F!
> bce0: /usr/src/sys/dev/bce/if_bce.c(3973): Unexpected mbuf found in rx_bd[0x005A]!
> bce0: ---------------------------- Driver State ----------------------------
> bce0: 0xFFFFFFFF:8B92A000 - (sc) driver softc structure virtual address
> bce0: 0xFFFFFF00:F4000000 - (sc->bce_vhandle) PCI BAR virtual address
> bce0: 0xFFFFFF00:009E3680 - (sc->status_block) status block virtual address
> bce0: 0xFFFFFF00:009D6400 - (sc->stats_block) statistics block virtual address
> bce0: 0xFFFFFFFF:8B92A1B0 - (sc->tx_bd_chain) tx_bd chain virtual adddress
> bce0: 0xFFFFFFFF:8B92A1E8 - (sc->rx_bd_chain) rx_bd chain virtual address
> bce0: 0xFFFFFFFF:8B92B260 - (sc->tx_mbuf_ptr) tx mbuf chain virtual address
> bce0: 0xFFFFFFFF:8B92D260 - (sc->rx_mbuf_ptr) rx mbuf chain virtual address
> bce0: 0x0000357F - (sc->interrupts_generated) h/w intrs
> bce0: 0x00002981 - (sc->rx_interrupts) rx interrupts handled
> bce0: 0x0000212A - (sc->tx_interrupts) tx interrupts handled
> bce0: 0x0000706B - (sc->last_status_idx) status block index
> bce0: 0x0000675E - (sc->tx_prod) tx producer index
> bce0: 0x00006707 - (sc->tx_cons) tx consumer index
> bce0: 0x001B39EA - (sc->tx_prod_bseq) tx producer bseq index
> bce0: 0x0000F25C - (sc->rx_prod) rx producer index
> bce0: 0x0000F059 - (sc->rx_cons) rx consumer index
> bce0: 0x0B850C00 - (sc->rx_prod_bseq) rx producer bseq index
> bce0: 0x000000AB - (sc->rx_mbuf_alloc) rx mbufs allocated
> bce0: 0x0000FFF8 - (sc->free_rx_bd) free rx_bd's
> bce0: 0x00000000/000001FE - (sc->rx_low_watermark) rx low watermark
> bce0: 0x0000001D - (sc->txmbuf_alloc) tx mbufs allocated
> bce0: 0x000000AB - (sc->rx_mbuf_alloc) rx mbufs allocated
> bce0: 0x00000057 - (sc->used_tx_bd) used tx_bd's
> bce0: 0x000001FE/000001FE - (sc->tx_hi_watermark) tx hi watermark
> bce0: 0x00000000 - (sc->mbuf_alloc_failed) failed mbuf alloc
> bce0: ------------------------------------------------------------------------
> bce0: ---------------------------- Status Block ----------------------------
> bce0: attn_bits = 0x00000001, attn_bits_ack = 0x00000001, index = 0x70BF
> bce0: rx_cons0 = 0x0000F061, tx_cons0 = 0x0000675E
> bce0: status_idx = 0x70BF
> bce0: ------------------------------------------------------------------------
>
>
> Fatal trap 3: breakpoint instruction fault while in kernel mode
> cpuid = 4; apic id = 04
> instruction pointer = 0x8:0xffffffff801ee956
> stack pointer       = 0x10:0xffffffffb6d60b40
> frame pointer       = 0x10:0x5a
> code segment        = base 0x0, limit 0xfffff, type 0x1b
>                     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = interrupt enabled, IOPL = 0
> current process     = 27 (irq16: bce0 bce1)
> trap number         = 3
> panic: breakpoint instruction fault
> cpuid = 4
> Uptime: 3m10s
> Dumping 8191 MB (3 chunks)
>   chunk 0: 1MB (156 pages) ... ok
>   chunk 1: 3327MB (851624 pages) 3311 3295 3279 3263 3247 3231 3215 3199 3183 31
>
> #0  doadump () at pcpu.h:172
> 172     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:172
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff8029e0e7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
> #3  0xffffffff8029e781 in panic (fmt=0xffffff021ef0a4c0 "?\206?\036\002?????\036\002???") at /usr/src/sys/kern/kern_shutdown.c:565
> #4  0xffffffff803f9e3f in trap_fatal (frame=0xffffff021ef0a4c0, eva=18446742983307069104) at /usr/src/sys/amd64/amd64/trap.c:660
> #5  0xffffffff803fa2e2 in trap (frame=
>       {tf_rdi = 0, tf_rsi = -2139025408, tf_rdx = 1, tf_rcx = 1915683,
>        tf_r8 = 1048064, tf_r9 = 10, tf_rax = 79, tf_rbx = -1953325056, tf_rbp = 90,
>        tf_r10 = -1227486624, tf_r11 = 4294967208, tf_r12 = -1953325056,
>        tf_r13 = 90, tf_r14 = 61537, tf_r15 = 61530, tf_trapno = 3, tf_addr = 0,
>        tf_flags = -1099501259136, tf_err = 0, tf_rip = -2145457834, tf_cs = 8,
>        tf_rflags = 642, tf_rsp = -1227486384, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:469
> #6  0xffffffff803e55fb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
> #7  0xffffffff801ee956 in bce_breakpoint (sc=0xffffffff8b92a000) at cpufunc.h:63
> #8  0xffffffff801ef0f6 in bce_intr (xsc=0x0) at /usr/src/sys/dev/bce/if_bce.c:3970
> #9  0xffffffff80284919 in ithread_loop (arg=0xffffff00009e4000) at /usr/src/sys/kern/kern_intr.c:682
> #10 0xffffffff802830b7 in fork_exit (callout=0xffffffff802847d0, arg=0xffffff00009e4000, frame=0xffffffffb6d60c50) at /usr/src/sys/kern/kern_fork.c:821
> #11 0xffffffff803e595e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394
> #12 0x0000000000000000 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x0000000000000001 in ?? ()
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
>
> #44 0x00000000007f3000 in ?? ()
> #45 0xffffff021ef286b0 in ?? ()
> #46 0x0000000000000104 in ?? ()
> #47 0x0000000000000000 in ?? ()
> #48 0xffffff021ef286b0 in ?? ()
> #49 0xffffff021ef68000 in ?? ()
> #50 0xffffffffb6d60848 in ?? ()
> #51 0xffffff021ef0a4c0 in ?? ()
> #52 0xffffffff802b4856 in sched_switch (td=0xffffff00009e4000, newtd=0x0, flags=0) at /usr/src/sys/kern/sched_4bsd.c:973
>
> #124 0x0000000000000000 in ?? ()
> Cannot access memory at address 0xffffffffb6d61000
> (kgdb) frame 8
> #8  0xffffffff801ef0f6 in bce_intr (xsc=0x0) at /usr/src/sys/dev/bce/if_bce.c:3970
> 3970            DBRUNIF((!(rxbd->rx_bd_flags & RX_BD_FLAGS_END)),
> (kgdb) list
> 3965
> 3966    /* The mbuf is stored with the last rx_bd entry of a packet. */
> 3967    if (sc->rx_mbuf_ptr[sw_chain_cons] != NULL) {
> 3968
> 3969            /* Validate that this is the last rx_bd. */
> 3970            DBRUNIF((!(rxbd->rx_bd_flags & RX_BD_FLAGS_END)),
> 3971                    BCE_PRINTF("%s(%d): Unexpected mbuf found in rx_bd[0x%04X]!\n",
> 3972                    __FILE__, __LINE__, sw_chain_cons);
> 3973                    bce_breakpoint(sc));
> 3974
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"