From: Tom Judge
Date: Wed, 11 Jul 2007 10:11:49 +0100
To: Tom Judge
Cc: Sepherosa Ziehau, freebsd-net, David Christensen
Subject: Re: Problems with BCE network adapter (Dell PE2950)
Message-ID: <46949ED5.6000706@tomjudge.com>
In-Reply-To: <4693C88F.8050204@tomjudge.com>

Tom Judge wrote:
> Tom Judge wrote:
>> Tom Judge wrote:
>>> Tom Judge wrote:
>>>> David Christensen wrote:
>>>>>> Sorry for the top post, please try the following patch:
>>>>>> http://people.freebsd.org/~sephe/if_bce.c.diff
>>>>>>
>>>>>> This is probably the cause; I noticed it when bce(4) was ported to
>>>>>> DragonFly.
>>>>>>
>>>>>
>>>>> Thanks Sephe, I think you're on to something. I have some
>>>>> debug code in the driver to simulate mbuf allocation
>>>>> failures and when I enable that I start receiving the same
>>>>> error messages Tom reported (along with various kernel
>>>>> panics), but when I include your change the system seems
>>>>> to keep humming along. I'll certainly add your code into an update
>>>>> shortly.
>>>>>
>>>>> Dave
>>>>>
>>>>
>>>> I'm not going to have a chance to test this patch until next week
>>>> but I will let you know what the results are.
>>>>
>>>> Tom
>>>
>>>
>>> So here goes, after 2 days of testing we have come up with the
>>> following data.
>>>
>>> The configuration
>>>
>>> [PE[12]950] ----> [PowerConnect 5324]
>>>
>>> The system is running 8192 byte Jumbo Frames.
>>>
>>> sultan# ifconfig bce0
>>> bce0: flags=8847 mtu 8192
>>> options=3b
>>> inet 172.31.0.28 netmask 0xffffff00 broadcast 172.31.0.255
>>> inet 172.31.0.163 netmask 0xffffffff broadcast 172.31.0.163
>>> ether 00:19:b9:e4:4d:cc
>>> media: Ethernet autoselect (1000baseTX )
>>> status: active
>>>
>>>
>>> After applying both David's and Sephe's patches I have yet to get a
>>> system into a state where it is stable with jumbo frames enabled; the
>>> systems crash almost immediately after the switch changes the port
>>> state (Spanning tree) from LEARNING to FORWARDING. The output from
>>> this crash can be found attached as crash-1.txt.gz.
>>>
>>> If the frame size is left at 1500 then the interface seems stable,
>>> however I can't fully test this as the interface is connected to a
>>> GigE only network with an mtu of 8192.
>>>
>>> If BCE_DEBUG is removed from if_bcereg.h then the system just exhibits
>>> the original problem and may or may not crash.
>>>
>>> The next test was to try the kernel with BCE_DEBUG and with the
>>> following extra patch (so that the driver does not jump to the
>>> breakpoint when an unexpected mbuf is found in the rx buffer).
>>>
>>> --- if_bce.c (revision 62)
>>> +++ if_bce.c (revision 66)
>>> @@ -4050,7 +4050,8 @@
>>> DBRUNIF((!(rxbd->rx_bd_flags &
>>> RX_BD_FLAGS_END)),
>>> BCE_PRINTF("%s(%d): Unexpected mbuf
>>> found in rx_bd[0x%04X]!\n",
>>> __FILE__, __LINE__, sw_chain_cons);
>>> - bce_breakpoint(sc));
>>> + bce_dump_mbuf(sc, m));
>>> +// bce_breakpoint(sc));
>>>
>>> /*
>>> * ToDo: If the received packet is small enough
>>>
>>>
>>> With this patch the system boots and does not crash straight away,
>>> however it is almost completely unusable. The output with this
>>> kernel can be found attached as crash-2.txt.gz. This also causes the
>>> following new error message:
>>>
>>> fgrep -n leak crash-2.txt
>>> 3194:bce0: /usr/src/sys/dev/bce/if_bce.c(3842): Memory leak! Lost 114
>>> mbufs from rx chain!
>>>
>>> Has no one else come across this problem, or are Jumbo frames not
>>> widely used?
>>>
>>> Tom
>>>
>> It would seem that the crash can be reproduced just by increasing the
>> MTU above 1500 (tested in single user mode).
>>
>
>
> Ok so I think I have fixed the problem with the rx_bd tracking. I have
> ported rboyer's patch to NetBSD's bnx driver to FreeBSD (patch
> attached). The patch seems to get rid of two problems:
>
> 1) Unexpected mbuf in rx_bd
> 2) Too many free rx_bd's
>
>
> However I am still faced with the problem of frames with missing
> ethernet headers:
> bce0: /usr/src/sys/dev/bce/if_bce.c(4128): Unusual frame size found.
> Min(60), Actual(0), Max(9022)
> bce0: mbuf: vaddr = 0xFFFFFF00:7B69AC00, m_len = 9216, m_flags = ( M_EXT
> M_PKTHDR ) m_data = 0xFFFFFFFF:86F76000
> 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> bce0: - m_pkthdr: flags = ( ) csum_flags = ( )
> bce0: - m_ext: vaddr = 0xFFFFFFFF:86F76000, ext_size = 9216, type =
> EXT_JUMBO9
> bce0: discard frame w/o leading ethernet header (len 4294967292 pkt len
> 4294967292)
> bce0: /usr/src/sys/dev/bce/if_bce.c(4128): Unusual frame size found.
> Min(60), Actual(0), Max(9022)
> bce0: mbuf: vaddr = 0xFFFFFF00:5EB48B00, m_len = 9216, m_flags = ( M_EXT
> M_PKTHDR ) m_data = 0xFFFFFFFF:86F73000
> 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> bce0: - m_pkthdr: flags = ( ) csum_flags = ( )
> bce0: - m_ext: vaddr = 0xFFFFFFFF:86F73000, ext_size = 9216, type =
> EXT_JUMBO9
> bce0: discard frame w/o leading ethernet header (len 4294967292 pkt len
> 4294967292)
> bce0: /usr/src/sys/dev/bce/if_bce.c(4128): Unusual frame size found.
> Min(60), Actual(27745), Max(9022)
> bce0: mbuf: vaddr = 0xFFFFFF00:5E9DDC00, m_len = 9216, m_flags = ( M_EXT
> M_PKTHDR ) m_data = 0xFFFFFFFF:86EF8000
> 0x00: 2C 6F 75 3D 50 65 72 73 6F 6E 61 6C 2C 6F 75 3D
> 0x10: 47 72 6F 75 70 73 2C 6F 3D 4D 69 6E 74 65 6C 30
> 0x20: 28 30 11 04 02 63 6E 31 0B 04 09 63 62 75 74 74
> 0x30: 72 6F 73 65 30 13 04 09 67 69 64 4E 75 6D 62 65
> 0x40: 72 31 06 04 04 31 31 30 38 30 5E 02 01 02 64 59
> 0x50: 04 31 63 6E 3D 6D 63 61 68 6D 2C 6F 75 3D 4C 6F
> 0x60: 6E 64 6F 6E 2C 6F 75 3D 50 65 72 73 6F 6E 61 6C
> 0x70: 2C 6F 75 3D 47 72 6F 75 70 73 2C 6F 3D 4D 69 6E
> bce0: - m_pkthl er0ror
> 9 67 69 64 4E 75 6D 62 65 72 31 06 04 04 31 30
> 0x70: 33 37 30 62 02 01 02 64 5D 04 33 63 6E 3D 72 63
> bce0: - m_pkthdr: flags = ( ) csum_flags = ( )
> bce0: - m_ext: vaddr = 0xFFFFFFFF:86E8C000, ext_size = 9216, type =
> EXT_JUMBO9
> bce0: /usr/src/sys/dev/bce/if_bce.c(4081): Unexpected mbuf found in
> rx_bd[0x002A]!
> bce0: /usr/src/sys/dev/bce/if_bce.c(4128): Unusual frame size found.
> Min(60), Actual(28515), Max(9022)
> bce0: mbuf: vaddr = 0xFFFFFF00:5AB4C800, m_len = 9216, m_flags = ( M_EXT
> M_PKTHDR ) m_data = 0xFFFFFFFF:86F28000
> 0x00: 30 0E 04 02 63 6E 31 08 04 06 63 6F 68 61 72 61
> 0x10: 30 13 04 09 67 69 64 4E 75 6D 62 65 72 31 06 04
> 0x20: 04 31 30 37 32 30 65 02 01 02 64 60 04 35 63 6E
> 0x30: 3D 6A 70 69 65 6B 61 72 73 2C 6F 75 3D 43 68 69
> 0x40: 63 61 67 6F 2C 6F 75 3D 50 65 72 73 6F 6E 61 6C
> 0x50: 2C 6F 75 3D 47 72 6F 75 70 73 2C 6F 3D 4D 69 6E
> 0x60: 74 65 6C 30 27 30 10 04 02 63 6E 31 0A 04 08 6A
> 0x70: 70 69 65 6B 61 72 73 30 13 04 09 67 69 64 4E 75
> bce0: - m_pkthdr: flags = ( ) csum_flags = ( )
> bce0: - m_ext: vaddr = 0xFFFFFFFF:86F28000, ext_size = 9216, type =
> EXT_JUMBO9
> bce0: /usr/src/sys/dev/bce/if_bce.c(4081): Unexpected mbuf found in
> rx_bd[0x002E]!
> bce0: /usr/src/sys/dev/bce/if_bce.c(4128): Unusual frame size found.
> Min(60), Actual(28460), Max(9022)
> bce0: mbuf: vaddr = 0xFFFFFF00:5EB9F200, m_len = 9216, m_flags = ( M_EXT
> M_PKTHDR ) m_data = 0xFFFFFFFF:86F70000
> 0x00: 04 32 63 6E 3D 69 6E 65 73 73 2C 6F 75 3D 43 68
> 0x10: 69 63 61 67 6F 2C 6F 75 3D 50 65 72 73 6F 6E 61
> 0x20: 6C 2C 6F 75 3D 47 72 6F 75 70 73 2C 6F 3D 4D 69
> 0x30: 6E 74 65 6C 30 24 30 0D 04 02 63 6E 31 07 04 05
> 0x40: 69 6E 65 73 73 30 13 04 09 67 69 64 4E 75 6D 62
> 0x50: 65 72 31 06 04 04 31 31 34 32 30 67 02 01 02 64
> 0x60: 62 04 36 63 6E 3D 70 6D 63 6E 61 6D 61 72 61 2C
> 0x70: 6F 75 3D 43 68 69 63 61 67 6F 2C 6F 75 3D 50 65
> bce0: - m_pkthdr: flags = ( ) csum_flags = ( )
> bce0: - m_ext: vaddr = 0xFFFFFFFF:86F70000, ext_size = 9216, type =
> EXT_JUMBO9
> bce0: /usr/src/sys/dev/bce/if_bce.c(4081): Unexpected mbuf found in
> rx_bd[0x0032]!
> bce0: /usr/src/sys/dev/bce/if_bce.c(4128): Unusual frame size found.
> Min(60), Actual(28787), Max(9022)
> bce0: mbuf: vaddr = 0xFFFFFF00:5AB4CA00, m_len = 9216, m_flags = ( M_EXT
> M_PKTHDR ) m_data = 0xFFFFFFFF:86F6D000
> 0x00: 02 01 02 64 57 04 30 63 6E 3D 73 70 79 65 2C 6F
> 0x10: 75 3D 4C 6F 6E 64 6F 6E 2C 6F 75 3D 50 65 72 73
> 0x20: 6F 6E 61 6C 2C 6F 75 3D 47 72 6F 75 70 73 2C 6F
> 0x30: 3D 4D 69 6E 74 65 6C 30 23 30 0C 04 02 63 6E 31
> 0x40: 06 04 04 73 70 79 65 30 13 04 09 67 69 64 4E 75
> 0x50: 6D 62 65 72 31 06 04 04 31 32 30 39 30 59 02 01
> 0x60: 02 64 54 04 2F 63 6E 3D 71 61 2C 6F 75 3D 43 68
> 0x70: 69 63 61 67 6F 2C 6F 75 3D 50 65 72 73 6F 6E 61
> bce0: - m_pkthdr: flags = ( ) csum_flags = ( )
> bce0: - m_ext: vaddr = 0xFFFFFFFF:86F6D000, ext_size = 9216, type =
> EXT_JUMBO9
> bce0: /usr/src/sys/dev/bce/if_bce.c(4128): Unusual frame size found.
> Min(60), Actual(12855), Max(9022)
> bce0: mbuf: vaddr =0 67 02 01 02 64
> 0x60: 62 04 36 63 6E 3D 70 6D 63 6E 61 6D 61 72 61 2C
> 0x70: 6F 75 3D 43 68 69 63 61 67 6F 2C 6F 75 3D 50 65
> bce0: - m_pkthdr: flags = ( ) csum_flags = ( )
> bce0: - m_ext: vaddr = 0xFFFFFFFF:86F70000, ext_size = 9216, type =
> EXT_JUMBO9
>
>
>
> if_bnx.c - 1.4 -> 1.5 LOG:
>
> RX buffers are malloced memory of 9216 bytes. This can require from 1 to
> 4 DMA memory segments, depending on how the buffer is laid out in memory.
> When receiving a packet, we allocate a new buffer to replace the one we've
> used. It can need more segments than the one it replaces, leading to
> corruption of the RX descriptors, and a panic in bus_dmamap_sync()
> (DIAGNOSTIC kernels) or possibly memory corruption.
>
> Fix:
> - bce_get_buf() allocates as many buffers as possible, checking the number
> of free RX descriptors. Because one receive buffer is not guaranteed to
> be replaced on receive, call bce_get_buf() from bce_tick() too.
> This also improves error handling from bce_get_buf().
> - use MCLGET() instead of MEXTMALLOC() if we're running with the standard
> ethernet MTU. This gives us more receive buffers and wastes less memory.
>
>
> Seem to be moving in the right direction slowly.
>
>

It seems I missed the rx_bd error; it is still present with this patch.

Tom
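
For anyone following the descriptor accounting described in the if_bnx.c log, here is a minimal sketch of the idea in plain C. It is not code from the attached patch or from either driver: segs_needed() and refill() are hypothetical names, 4096-byte pages are assumed, and it takes the worst case that no two pages of a buffer are physically contiguous (so it yields 3 or 4 segments for a 9216-byte buffer, where the log allows as few as 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: worst-case number of DMA segments for a buffer,
 * assuming every page it touches needs its own segment.
 */
static unsigned
segs_needed(uintptr_t addr, size_t len, size_t page_size)
{
	assert(page_size > 0 && len > 0);
	uintptr_t first = addr / page_size;		/* first page touched */
	uintptr_t last = (addr + len - 1) / page_size;	/* last page touched */
	return (unsigned)(last - first + 1);
}

/*
 * Hypothetical refill loop: post buffers only while enough free rx
 * descriptors remain for the next one.  Returns the number of buffers
 * posted and decrements *free_bd by the descriptors consumed.
 */
static unsigned
refill(unsigned *free_bd, const uintptr_t *addrs, unsigned naddrs,
    size_t len, size_t page_size)
{
	unsigned posted = 0;

	for (unsigned i = 0; i < naddrs; i++) {
		unsigned need = segs_needed(addrs[i], len, page_size);

		if (need > *free_bd)
			break;	/* stop instead of overrunning the rx chain */
		*free_bd -= need;
		posted++;
	}
	return posted;
}
```

With 7 free descriptors and buffers starting at page offsets 0 and 4095, the first buffer takes 3 descriptors and the second takes 4, so a third buffer is not posted. That is the case the log describes: a replacement buffer can need more descriptors than the one it replaces, so the refill has to check the free count rather than assume a one-for-one swap.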