Date: Wed, 30 Mar 2011 10:31:45 -0700
From: YongHyeon PYUN <pyunyh@gmail.com>
To: Yamagi Burmeister <lists@yamagi.org>
Cc: freebsd-net@freebsd.org, yongari@freebsd.org
Subject: Re: Kernel memory corruption(?) with age(4)
Message-ID: <20110330173145.GB8601@michelle.cdnetworks.com>
In-Reply-To: <alpine.BSF.2.00.1103301620110.17846@saya.home.yamagi.org>
References: <alpine.BSF.2.00.1103301620110.17846@saya.home.yamagi.org>
On Wed, Mar 30, 2011 at 04:22:23PM +0200, Yamagi Burmeister wrote:
> Hi,
> I recently got four roughly two-year-old Asus M3A-H/HDMI mainboards with
> an integrated Attansic L2 ethernet controller. This NIC is supported by
> age(4) and recognized by FreeBSD:
>
> ----
>
> age0: <Attansic Technology Corp, L1 Gigabit Ethernet>
>     mem 0xfeac0000-0xfeafffff irq 18 at device 0.0 on pci2
> age0: 1280 Tx FIFO, 2364 Rx FIFO
> age0: Using 1 MSI messages.
> age0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
> miibus0: <MII bus> on age0
> atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus0
> atphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX,
>     1000baseT-FDX-master, auto
> age0: Ethernet address: 00:23:54:31:a0:12
> age0: [FILTER]
>
> ----
>
> age0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>     options=c319b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,
>         WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>
>     ether 00:23:54:31:a0:12
>     inet6 fe80::223:54ff:fe31:a012%age0 prefixlen 64 scopeid 0x1
>     nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>     media: Ethernet autoselect (none)
>     status: no carrier
>
> ----
>
> All four boxes are unstable if the Attansic NIC is in use; none of them
> survived more than 60 minutes of ~20mb/s network traffic. I managed to
> get some coredumps and extracted the backtraces. Since every time one of
> the boxes panicked I got a different panic message and a different
> backtrace with a different subsystem involved, I suspected broken
> hardware. I plugged an em(4) NIC into the PCI slot and wasn't able to
> reproduce the problem; in fact the boxes ran rock solid for several days.
> Next I set up Windows 7, installed the Attansic vendor driver and did
> another run. All went smoothly, no crash for nearly 24 hours.
>
> My guess is kernel memory corruption by age(4), which would explain all
> the different backtraces and the different panic messages. This problem
> is reproducible in at least FreeBSD 7.4 and 8.2 and with TSO4 enabled
> and disabled. I'm willing to debug this, but I really don't know how. So
> any help or a pointer in the right direction would be appreciated.
>

AFAIK this is the first report of possible memory corruption
triggered by age(4). I'm still not sure whether it's caused by
age(4), but you can disable RX checksum offloading and see whether
that makes any difference. Since I no longer have access to the
hardware, it would be even better if you could tell me which traffic
pattern triggered the issue.
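
For reference, a minimal sketch of that test, assuming the interface
is age0 as in the quoted dmesg output and that the commands are run
as root. The -rxcsum option is the standard ifconfig(8) way to turn
off RX checksum offloading at runtime; the setting is not persistent
across reboots.

```shell
# Disable RX checksum offloading on age0 (assumed interface name).
ifconfig age0 -rxcsum

# Check the result: RXCSUM should no longer be listed in options=<...>.
ifconfig age0
```

If the panics stop with RXCSUM off, that would narrow the problem
down to the RX checksum offload path; if they continue, the offload
setting is probably unrelated.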