Date: Tue, 7 Sep 2010 17:29:17 -0700
From: Pyun YongHyeon <pyunyh@gmail.com>
To: "Mahlon E. Smith" <mahlon@martini.nu>, Jeremy Chadwick <freebsd@jdc.parodius.com>, freebsd-stable@freebsd.org
Subject: Re: Network memory allocation failures
Message-ID: <20100908002917.GO1439@michelle.cdnetworks.com>
In-Reply-To: <20100907233257.GA94092@martini.nu>
References: <20100907210813.GI49065@martini.nu> <20100907222403.GA18595@icarus.home.lan> <20100907233257.GA94092@martini.nu>

On Tue, Sep 07, 2010 at 04:32:57PM -0700, Mahlon E. Smith wrote:
> On Tue, Sep 07, 2010, Jeremy Chadwick wrote:
> >
> > This could be a bce(4) bug, meaning the "failed to allocate memory"
> > message could be indicating DMA failure or something else from the card,
> > and not necessarily related to mbufs.
> >
> > There are also changes/fixes to bce(4) that are in RELENG_8 (8.1-STABLE)
> > that aren't in 8.1-RELEASE, but I don't know if those are responsible
> > for your problem.
>
> Hmm, well -- I'm definitely not opposed to jumping to -STABLE if it
> might fix it.
> >
> > Please provide output from the following:
> >
> > * uname -a (if desired, XXX out hostname)
>
> FreeBSD jessage 8.1-RELEASE FreeBSD 8.1-RELEASE #2: Fri Aug 20 14:30:31 PDT 2010 root@jessage:/usr/src/sys/amd64/compile/R810 amd64
>
> Custom kernel, with additions to GENERIC (nothing removed):
>
> device carp
> device snp
> options HZ=1000
> options DEVICE_POLLING

bce(4) does not support polling(4), so you can remove the HZ and
DEVICE_POLLING options entirely. In fact, there is no reason to use
polling(4) at all on intelligent controllers like bce(4); polling(4) is
mainly intended for dumb controllers that lack efficient interrupt
moderation.

> options ALTQ
> options ALTQ_CBQ
> options ALTQ_PRIQ
> options SC_DISABLE_REBOOT
> options PANIC_REBOOT_WAIT_TIME=5
>
> ALTQ and friends not actually active on the machine. I was fighting a
> different battle when running GENERIC, so I can't honestly recall if this
> problem existed then -- I'll make sure it is still happening under
> GENERIC for a baseline, to eliminate any potential weirdness with
> DEVICE_POLLING or the HZ timing.
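
For the kernel configuration change suggested above, here is a minimal sh(1) sketch that comments out the `options HZ=1000` and `options DEVICE_POLLING` lines instead of deleting them. The function name `drop_polling_opts` is hypothetical, and the config path in the usage note is an assumption inferred from the `/usr/src/sys/amd64/compile/R810` build directory shown in the uname output.

```sh
#!/bin/sh
# Hypothetical helper: prefix "#" to lines that start with "options HZ=" or
# "options DEVICE_POLLING" in the kernel config file given as $1, since
# bce(4) does not support polling(4).
# Writes through a temp file rather than using "sed -i", whose syntax
# differs between FreeBSD sed and GNU sed.
drop_polling_opts() {
	conf="$1"
	tmp="${conf}.tmp.$$"
	sed -e 's/^\([[:space:]]*options[[:space:]]\{1,\}HZ=\)/#\1/' \
	    -e 's/^\([[:space:]]*options[[:space:]]\{1,\}DEVICE_POLLING\)/#\1/' \
	    "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
	# Copy back with cat so the original file keeps its owner and mode.
	cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, assuming the config lives at `/usr/src/sys/amd64/conf/R810`, run `drop_polling_opts /usr/src/sys/amd64/conf/R810`, then rebuild from `/usr/src` with `make buildkernel KERNCONF=R810` and `make installkernel KERNCONF=R810`.
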
> >
> > * vmstat -i
>
> interrupt                  total   rate
> irq19: ehci0             1547103      0
> irq21: uhci1 uhci3+           29      0
> irq23: atapci0                35      0
> irq32: mfi0             68104468     43
> cpu0: timer           3093305346   1986
> irq256: bce0            46587008     29
> cpu19: timer          3103614834   1992
> cpu1: timer           3093298527   1986
> cpu4: timer           3093297557   1986
> cpu10: timer          3089824707   1983
> cpu12: timer          3097896788   1989
> cpu16: timer          3097897232   1989
> cpu22: timer          3103615267   1992
> cpu2: timer           3093297601   1986
> cpu5: timer           3093298349   1986
> cpu3: timer           3093298637   1986
> cpu6: timer           3089823402   1983
> cpu18: timer          3103614571   1992
> cpu13: timer          3097897961   1989
> cpu20: timer          3103615299   1992
> cpu23: timer          3103614783   1992
> cpu9: timer           3089821582   1983
> cpu17: timer          3097898138   1989
> cpu11: timer          3089821712   1983
> cpu14: timer          3097897190   1989
> cpu7: timer           3089821360   1983
> cpu21: timer          3103615012   1992
> cpu15: timer          3097898081   1989
> cpu8: timer           3089824487   1983
> Total                74424047066  47788
> >
> > * ifconfig -a (if desired, XXX out IPs and MACs)
>
> bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
> 	ether 00:25:64:fd:0b:24
> 	inet 10.5.2.69 netmask 0xfffffc00 broadcast 10.5.3.255
> 	media: Ethernet autoselect (1000baseT <full-duplex>)
> 	status: active
> bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
> 	ether 00:25:64:fd:0b:26
> 	media: Ethernet autoselect (none)
> 	status: no carrier
> bce2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
> 	ether 00:25:64:fd:0b:28
> 	media: Ethernet autoselect (none)
> 	status: no carrier
> bce3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
> 	ether 00:25:64:fd:0b:2a
> 	media: Ethernet autoselect (none)
> 	status: no carrier
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
> 	options=3<RXCSUM,TXCSUM>
> 	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
> 	inet6 ::1 prefixlen 128
> 	inet 127.0.0.1 netmask 0xff000000
> 	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
> vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	ether 0a:00:27:00:00:00
> >
> > * netstat -inbd (if desired, XXX out MACs)
>
> Name    Mtu Network      Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes Coll Drop
> bce0   1500 <Link#1>     00:25:64:fd:0b:24 14467627     0     0 6346549588 11846499     0 4646920777    0    0
> bce0   1500 10.5.0.0/22  10.5.2.69          1987644     -     -  371635478   415087     -   74168123    -    -
> bce1*  1500 <Link#2>     00:25:64:fd:0b:26        0     0     0          0        0     0          0    0    0
> bce2*  1500 <Link#3>     00:25:64:fd:0b:28        0     0     0          0        0     0          0    0    0
> bce3*  1500 <Link#4>     00:25:64:fd:0b:2a        0     0     0          0        0     0          0    0    0
> lo0   16384 <Link#5>                          25561     0     0   47338756    25561     0   47338756    0    0
> lo0   16384 fe80:5::1/64 fe80:5::1                0     -     -          0        0     -          0    -    -
> lo0   16384 ::1/128      ::1                      0     -     -          0        0     -          0    -    -
> lo0   16384 127.0.0.0/8  127.0.0.1            25561     -     -   47338756    25561     -   47338756    -    -
> vboxn  1500 <Link#6>     0a:00:27:00:00:00        0     0     0          0        0     0          0    0    0
>
> >
> > * pciconf -lvc (only the bceX entry please)
>
> bce0@pci0:1:0:0:	class=0x020000 card=0x02d41028 chip=0x163914e4 rev=0x20 hdr=0x00
>     vendor     = 'Broadcom Corporation'
>     device     = 'NetXtreme II Gigabit Ethernet (BCM5709)'
>     class      = network
>     subclass   = ethernet
>     cap 01[48] = powerspec 3 supports D0 D3 current D0
>     cap 03[50] = VPD
>     cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>     cap 11[a0] = MSI-X supports 9 messages in map 0x10
>     cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x2(x4)
> >
> > Also check dmesg to see if there's any error messages that correlate
> > when the problem occurs.
>
> All quiet on that front.
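
The diagnostics requested in the quoted exchange can be gathered in one pass. This is a minimal sketch, not a standard tool: the function names `collect_bce_diag` and `bce_pciconf_only` are hypothetical, and `bce_pciconf_only` assumes that in `pciconf -lvc` output each device entry starts at column 0 while its detail lines are indented, as in the output above.

```sh
#!/bin/sh
# Hypothetical helper: from "pciconf -lvc" output on stdin, print only the
# bceN entries together with their indented detail lines.
bce_pciconf_only() {
	awk '/^[^ \t]/ { keep = ($0 ~ /^bce[0-9]+@/) } keep'
}

# Hypothetical helper: run the requested commands on a FreeBSD host and
# write all of their output to one file (default: ./bce-diag.txt).
collect_bce_diag() {
	out="${1:-bce-diag.txt}"
	{
		echo "### uname -a";      uname -a
		echo "### vmstat -i";     vmstat -i
		echo "### ifconfig -a";   ifconfig -a
		echo "### netstat -inbd"; netstat -inbd
		echo "### pciconf -lvc (bce entries only)"
		pciconf -lvc | bce_pciconf_only
		echo "### dmesg";         dmesg
	} > "$out" 2>&1
}
```

Hostnames, IPs and MACs still need to be masked by hand before posting, as requested in the quoted message.
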

Based on your outputs, I don't see anything abnormal in bce(4). Why do
you think bce(4) is the cause of the problem? The output of
"sysctl dev.bce.0" shows more detailed MAC statistics, which may reveal
whether the controller saw some kind of memory-related failure.
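
To act on that suggestion, the sysctl(8) output can be filtered down to counters that look failure-related and are non-zero. This is a sketch: the function name `bce_nonzero_failures` is hypothetical, and matching on names containing "fail", "error" or "drop" is an assumption, not a list of the exact counters bce(4) exports.

```sh
#!/bin/sh
# Hypothetical helper: read "name: value" lines (e.g. from
# "sysctl dev.bce.0") on stdin and print only entries whose name contains
# "fail", "error" or "drop" (case-insensitive) and whose value is a
# non-zero integer.
bce_nonzero_failures() {
	awk -F': ' '
		tolower($1) ~ /fail|error|drop/ && $2 ~ /^[0-9]+$/ && $2 + 0 > 0 {
			print
		}
	'
}
```

Typical use would be `sysctl dev.bce.0 | bce_nonzero_failures`, repeated while the allocation failures are occurring, to see which counters grow.
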