Date: Wed, 30 Mar 2011 13:28:58 -0700
From: YongHyeon PYUN <pyunyh@gmail.com>
To: Yamagi Burmeister <lists@yamagi.org>
Cc: freebsd-net@freebsd.org, yongari@freebsd.org
Subject: Re: Kernel memory corruption(?) with age(4)
Message-ID: <20110330202858.GC8601@michelle.cdnetworks.com>
In-Reply-To: <alpine.BSF.2.00.1103302137330.1646@maka.home.yamagi.org>
References: <alpine.BSF.2.00.1103301620110.17846@saya.home.yamagi.org>
 <20110330173145.GB8601@michelle.cdnetworks.com>
 <alpine.BSF.2.00.1103302137330.1646@maka.home.yamagi.org>

On Wed, Mar 30, 2011 at 09:50:12PM +0200, Yamagi Burmeister wrote:
> On Wed, 30 Mar 2011, YongHyeon PYUN wrote:
>
> >On Wed, Mar 30, 2011 at 04:22:23PM +0200, Yamagi Burmeister wrote:
> >
> >>All four boxes are unstable if the Attansic NIC is in use; none of them
> >>survived more than 60 minutes of ~20mb/s network traffic. I managed to
> >>get some coredumps and extracted the backtraces. Since every time one of
> >>the boxes panicked I got a different panic message and a different
> >>backtrace with a different subsystem involved, I suspected broken
> >>hardware. I plugged an em(4) NIC into the PCI slot and wasn't able to
> >>reproduce the problem; in fact the boxes ran rock solid for several
> >>days. Next I set up Windows 7, installed the Attansic vendor driver
> >>and did another run. All went smoothly, no crash for nearly 24 hours.
> >>
> >>My guess is kernel memory corruption by age(4), which would explain all
> >>the different backtraces and the different panic messages. This problem
> >>is reproducible in at least FreeBSD 7.4 and 8.2, with TSO4 enabled
> >>and disabled. I'm willing to debug this, but I really don't know how. So
> >>any help or a pointer in the right direction would be appreciated.
> >>
> >
> >AFAIK this is the first report of possible memory corruption
> >triggered by age(4). I'm still not sure whether it's caused by
> >age(4), but you can disable RX checksum offloading and see whether
> >that makes any difference.
> >Since I no longer have access to the hardware, it would be even
> >better if you could tell me which traffic pattern triggered the
> >issue.
>
> Okay, I did test runs with RX checksum, TX checksum and both disabled.
> In all three cases the crash occurs within about 20 minutes. I'm also
> not sure that age(4) is the problem, but it definitely has something to
> do with it, since with another NIC driver the same scenario is
> rock solid...
>

OK.

> The workload: It's an NFS3 server (FreeBSD's non-experimental
> implementation), serving and receiving files of about 250 to 500
> megabytes at about 20mb/s. The clients are FreeBSD 7 and 8 systems and
> are mounting the shares via TCP. The connection is 1000mbit/s via a
> "dumb" gigabit switch.
>

That's too broad to narrow down the issue. :-(
I'm not sure, but your box seems to have more than 4GB of memory. Could
you limit the available memory to 3GB via loader.conf and test it
again?
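
For anyone who wants to try the memory-limit test suggested above, here is a minimal sketch of the loader.conf entry. It assumes the standard hw.physmem loader tunable documented in loader(8) is what is meant; the message itself does not name the variable, so treat that as an assumption.

```sh
# /boot/loader.conf
# Assumption: hw.physmem is the tunable intended by "limit the available
# memory to 3GB"; the original message does not name it.
# Caps the physical memory the kernel will use at 3 GB from the next boot.
hw.physmem="3G"
```

The setting only takes effect after a reboot. Running `sysctl hw.physmem` afterwards should report a value at or slightly below 3 GB if the limit was applied. Remove the line again once the test run is finished.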
