Date: Tue, 23 Jul 2013 14:10:30 +0200
From: Sébastien RICCIO <sr@swisscenter.com>
To: Steven Hartland <killing@multiplay.co.uk>, freebsd-net@freebsd.org
Subject: Re: FreeBSD 9.1 and BCM57711 issues (broadcom 10ge ethernet card)
Message-ID: <51EE72B6.6020905@swisscenter.com>
In-Reply-To: <7D8CE344ACD04EC8B8C7DC813D1EA3F6@multiplay.co.uk>
References: <51EE68E9.5010805@swisscenter.com> <7D8CE344ACD04EC8B8C7DC813D1EA3F6@multiplay.co.uk>
Hi Steve,

Not yet. As the goal was to prepare a production-ready system, I went with the latest stable release... well, at least I thought that 9.1 was the latest stable. I'm going to try it. But that would mean that the driver in the 9.2-PRERELEASE kernel has been modified. I'll take a look at the change logs, if I can find them :)

Thanks for your help.

Cheers,
Sébastien

On 23.07.2013 13:56, Steven Hartland wrote:
> Have you tried a more recent version e.g. 9.2-PRERELEASE or 9/stable?
>
> Regards
> Steve
>
> ----- Original Message -----
> From: "Sébastien RICCIO" <sr@swisscenter.com>
> To: <freebsd-net@freebsd.org>
> Sent: Tuesday, July 23, 2013 12:28 PM
> Subject: FreeBSD 9.1 and BCM57711 issues (broadcom 10ge ethernet card)
>
>
> Hi freebsd-net!
>
> We recently installed FreeBSD 9.1 64-bit on a Dell PowerEdge R510 system
> in which we have two BCM57711 cards (for a total of four 10Gbit interfaces).
>
> We're planning to use it as a storage filer with ZFS/NFS.
>
> For now, in testing, the filer is connected via two 10GbE interfaces to a
> 10GbE Dell PowerConnect switch that serves some Linux clients, also using
> 10GbE cards.
>
> We ran into a lot of trouble trying to get anything working out of
> this setup.
>
> --
>
> First issue:
>
> Without any special tweaking, when we read from or write to the NFS
> server from a client, the network card crashes and becomes unusable.
> In the logs I can see:
>
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ---------- Begin crash dump ----------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------ Idle Check ------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CFC: AC > 1 - LCID 39 CID_CAM 0x7 Value is 0xc
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: VOQ_0, VOQ credit is not equal to initial credit. Values are 0xf8 0x140
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: P0 Byte credit is not equal to initial credit. Values are 0x5a1c 0x8000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING CCM: XX protection CAM is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING XCM: XX protection CAM is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING BRB1: BRB is not empty. Value is 0x3
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING TCM: FIC0_INIT_CRD is not 64. Value is 0x30
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR TSEM: interrupt status 0 is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CSEM: interrupt status 0 is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR XSEM: interrupt status 0 is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: bxe_idle_chk(): Failed with 4 error(s) and 0 warning(s)!
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------------------------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------ Idle Check ------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CFC: AC > 1 - LCID 39 CID_CAM 0x7 Value is 0xc
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: VOQ_0, VOQ credit is not equal to initial credit. Values are 0xf8 0x140
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING QM: P0 Byte credit is not equal to initial credit. Values are 0x5a1c 0x8000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING CCM: XX protection CAM is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING XCM: XX protection CAM is not empty. Value is 0x1
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING BRB1: BRB is not empty. Value is 0x4
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING TCM: FIC0_INIT_CRD is not 64. Value is 0x30
> Jul 19 11:49:26 filer-01-a kernel: bxe0: WARNING PRS: TCM current credit is not 0. Value is 0x10
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR TSEM: interrupt status 0 is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR CSEM: interrupt status 0 is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ERROR XSEM: interrupt status 0 is not 0. Value is 0x10000
> Jul 19 11:49:26 filer-01-a kernel: bxe0: bxe_idle_chk(): Failed with 4 error(s) and 0 warning(s)!
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ------------------------------------------------------------------------
> Jul 19 11:49:26 filer-01-a kernel: bxe0: ---------- End crash dump ----------
>
> Even a reboot of the system is not enough. After rebooting, I can't
> even ping any hosts on the network. It seems to leave the card in a
> bogus state that requires a complete power cycle to get the cards back
> in business.
>
> We found out that disabling tso4, txcsum and rxcsum on the cards
> prevents this from happening.
>
> So, although I don't think it's a real fix, let's say we have a
> workaround with a setting in rc.conf like this:
> ifconfig_bxe0="inet 10.50.50.11 netmask 255.255.255.0 mtu 9000 -tso4 -txcsum -rxcsum"
>
> --
>
> Second issue:
>
> Issuing an ifconfig mtu 9000 command on the interfaces randomly
> produces this error:
>
> Jul 19 09:47:03 filer-01-a kernel: bxe0: /usr/src/sys/dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot fill fp[04] RX chain.
> Jul 19 09:47:03 filer-01-a kernel: bxe0: /usr/src/sys/dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
> Jul 19 09:47:12 filer-01-a kernel: bxe3: /usr/src/sys/dev/bxe/if_bxe.c(10934): Memory allocation failure! Cannot fill fp[04] RX chain.
> Jul 19 09:47:12 filer-01-a kernel: bxe3: /usr/src/sys/dev/bxe/if_bxe.c(3921): NIC initialization failed, aborting!
>
> That sounds quite bad, and I can't reproduce it with an MTU of 1500.
> (But does it make sense to use an MTU of 1500 on a 10gig local
> network...?)
>
> --
>
> Third issue:
>
> Part 1)
>
> We've tried aggregating two interfaces (each with an MTU of 9000)
> using lagg, like this:
>
> ifconfig bxe0 up -tso4 -txcsum -rxcsum mtu 9000
> ifconfig bxe2 up -tso4 -txcsum -rxcsum mtu 9000
> ifconfig lagg0 create
> ifconfig lagg0 up laggproto failover laggport bxe0 laggport bxe2 10.50.50.11/24
>
> This instantly crashes the kernel and causes a machine reboot. The log
> says:
>
> Jul 19 09:47:12 filer-01-a kernel:
> Jul 19 09:47:12 filer-01-a kernel:
> Jul 19 09:47:12 filer-01-a kernel: Fatal trap 12: page fault while in kernel mode
> Jul 19 09:47:12 filer-01-a kernel: cpuid = 0; apic id = 20
> Jul 19 09:47:12 filer-01-a kernel: fault virtual address = 0x6d
> Jul 19 09:47:12 filer-01-a kernel: fault code = supervisor read data, page not present
> Jul 19 09:47:12 filer-01-a kernel: instruction pointer = 0x20:0xffffffff808d5879
> Jul 19 09:47:12 filer-01-a kernel: stack pointer = 0x28:0xffffff80003227f0
> --*** BOOOM REBOOT ***--
> Jul 19 09:49:49 filer-01-a syslogd: kernel boot file is /boot/kernel/kernel
>
> /var/crash/core.txt.0 returns:
>
> Unread portion of the kernel message buffer:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 5; apic id = 33
> fault virtual address = 0x6d
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff808d5879
> stack pointer = 0x28:0xffffff80003227f0
> frame pointer = 0x28:0xffffff8000322820
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 12 (swi6: task queue)
> trap number = 12
> panic: page fault
> cpuid = 5
> KDB: stack backtrace:
> #0 0xffffffff809208a6 at kdb_backtrace+0x66
> #1 0xffffffff808ea8be at panic+0x1ce
> #2 0xffffffff80bd8240 at trap_fatal+0x290
> #3 0xffffffff80bd857d at trap_pfault+0x1ed
> #4 0xffffffff80bd8b9e at trap+0x3ce
> #5 0xffffffff80bc315f at calltrap+0x8
> #6 0xffffffff8045da8c at bxe_free_buf_rings+0x4c
> #7 0xffffffff8046c0d5 at bxe_init_locked+0x125
> #8 0xffffffff80470cfe at bxe_ioctl+0x4fe
> #9 0xffffffff8099d08f at if_setlladdr+0x1ff
> #10 0xffffffff8174c94a at lagg_port_setlladdr+0x8a
> #11 0xffffffff8092cf55 at taskqueue_run_locked+0x85
> #12 0xffffffff8092d0da at taskqueue_run+0x3a
> #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
> #14 0xffffffff808c0076 at ithread_loop+0xa6
> #15 0xffffffff808bb9ef at fork_exit+0x11f
> #16 0xffffffff80bc368e at fork_trampoline+0xe
> Uptime: 39m41s
> Dumping 1505 out of 32735 MB:..2%..11%..21%..31%..41%..52%..61%..71%..81%..91%
>
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/if_lagg.ko...Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/if_lagg.ko
> #0 doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> 224 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) #0 doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> #1 0xffffffff808ea3a1 in kern_reboot (howto=260)
> at /usr/src/sys/kern/kern_shutdown.c:448
> #2 0xffffffff808ea897 in panic (fmt=0x1 <Address 0x1 out of bounds>)
> at /usr/src/sys/kern/kern_shutdown.c:636
> #3 0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
> )
> at /usr/src/sys/amd64/amd64/trap.c:857
> #4 0xffffffff80bd857d in trap_pfault (frame=0xffffff8000322740, usermode=0)
> at /usr/src/sys/amd64/amd64/trap.c:773
> #5 0xffffffff80bd8b9e in trap (frame=0xffffff8000322740)
> at /usr/src/sys/amd64/amd64/trap.c:456
> #6 0xffffffff80bc315f in calltrap ()
> at /usr/src/sys/amd64/amd64/exception.S:228
> #7 0xffffffff808d5879 in free (addr=0xffffff80083e5000, mtp=0xffffffff81198ba0) at uma_int.h:413
> #8 0xffffffff8045da8c in bxe_free_buf_rings (sc=0xffffff8000c1c000)
> at /usr/src/sys/dev/bxe/if_bxe.c:3787
> #9 0xffffffff8046c0d5 in bxe_init_locked (sc=0x0, load_mode=0)
> at /usr/src/sys/dev/bxe/if_bxe.c:4063
> #10 0xffffffff80470cfe in bxe_ioctl (ifp=0xfffffe000ec59000, command=Variable "command" is not available.
> )
> at /usr/src/sys/dev/bxe/if_bxe.c:9668
> #11 0xffffffff8099d08f in if_setlladdr (ifp=0xfffffe000ec59000, lladdr=0xfffffe00125da4c8 "", len=6) at /usr/src/sys/net/if.c:3304
> #12 0xffffffff8174c94a in lagg_port_setlladdr (arg=Variable "arg" is not available.
> )
> at /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:495
> #13 0xffffffff8092cf55 in taskqueue_run_locked (queue=0xfffffe000e833980)
> at /usr/src/sys/kern/subr_taskqueue.c:308
> #14 0xffffffff8092d0da in taskqueue_run (queue=0xfffffe000e833980)
> at /usr/src/sys/kern/subr_taskqueue.c:322
> #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is not available.
> )
> at /usr/src/sys/kern/kern_intr.c:1262
> #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe000e66c140)
> at /usr/src/sys/kern/kern_intr.c:1275
> #17 0xffffffff808bb9ef in fork_exit (
> callout=0xffffffff808bffd0 <ithread_loop>, arg=0xfffffe000e66c140,
> frame=0xffffff8000322c40) at /usr/src/sys/kern/kern_fork.c:992
> #18 0xffffffff80bc368e in fork_trampoline ()
> at /usr/src/sys/amd64/amd64/exception.S:602
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000001 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x0000000000000000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000005 in ?? ()
> #44 0xffffffff81244180 in tdq_cpu ()
> #45 0xfffffe000e698000 in ?? ()
> #46 0x0000000000000000 in ?? ()
> #47 0xffffff8000322b30 in ?? ()
> #48 0xffffff8000322ad8 in ?? ()
> #49 0xfffffe000e6728e0 in ?? ()
> #50 0xffffffff8091352e in sched_switch (td=0x0, newtd=0xfffffe000e66c140, flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1921
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
> Okay, I guess it has something to do with the MTU of 9000 again, but
> this time it completely panics the kernel. This is no good.
>
>
> Part 2) Trying bonding with the normal MTU of 1500
>
> ifconfig bxe0 up -tso4 -txcsum -rxcsum mtu 1500
> ifconfig bxe2 up -tso4 -txcsum -rxcsum mtu 1500
> ifconfig lagg0 create
> ifconfig lagg0 up laggproto failover laggport bxe0 laggport bxe2 10.50.50.11/24
>
> This time: no error messages, no crash. Yiha!
>
> But no. Even though everything seems to be correct, the bonding is not
> working. We can't ping any host on the network.
> Also, lagg0 reports "no carrier".
>
> See:
>
> bxe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
> 	ether 00:10:18:98:35:f8
> 	inet6 fe80::210:18ff:fe98:35f8%bxe0 prefixlen 64 scopeid 0x3
> 	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> 	media: Ethernet autoselect (10Gbase-SR <full-duplex>)
> 	status: active
> bxe2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
> 	ether 00:10:18:98:35:f8
> 	inet6 fe80::210:18ff:fe95:eaa0%bxe2 prefixlen 64 scopeid 0x5
> 	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> 	media: Ethernet autoselect (10Gbase-SR <full-duplex>)
> 	status: active
> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM>
> 	ether 00:10:18:98:35:f8
> 	inet6 fe80::7a2b:cbff:fe1a:eab1%lagg0 prefixlen 64 scopeid 0x14
> 	inet 10.50.50.11 netmask 0xffffff00 broadcast 10.50.50.255
> 	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> 	media: Ethernet autoselect
> 	status: no carrier
> 	laggproto failover lagghash l2,l3,l4
> 	laggport: bxe2 flags=0<>
> 	laggport: bxe0 flags=1<MASTER>
>
> Please note that prior to installing FreeBSD, the machine was running
> Debian 7 GNU/Linux 64-bit, where we had the cards bonded and set to an
> MTU of 9000 without any crash or stability issues.
> So it looks to me like there is something really wrong with the
> Broadcom driver on FreeBSD 9.1, at least with the NICs used in Dell
> servers.
>
> Given that Broadcom themselves don't supply drivers for FreeBSD, is
> there any possible fix?
>
> Thanks for your attention and your help.
>
> Cheers,
> Sébastien
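For anyone reproducing this setup, the runtime ifconfig commands quoted above can also be expressed persistently in /etc/rc.conf. The fragment below is only a sketch: it assumes the same interface names (bxe0, bxe2), the address 10.50.50.11/24 and the -tso4/-txcsum/-rxcsum workaround from the report, and it uses the standard cloned_interfaces and ifconfig_<interface> rc.conf variables. It mirrors the Part 2 failover configuration that was reported to come up with "no carrier", so it does not address any of the bxe(4) driver problems described here.

```shell
# /etc/rc.conf fragment (sketch, assumptions as stated above).
# Bring up both physical ports with TSO and checksum offload disabled,
# as in the workaround described in the report.
ifconfig_bxe0="up -tso4 -txcsum -rxcsum mtu 1500"
ifconfig_bxe2="up -tso4 -txcsum -rxcsum mtu 1500"

# Create lagg0 at boot and attach both ports in failover mode;
# the first laggport listed becomes the master port.
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport bxe0 laggport bxe2 10.50.50.11/24"
```

After editing rc.conf, `service netif restart` (or a reboot) applies the configuration; `ifconfig lagg0` then shows the port flags and carrier status as in the output above.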