Date: Tue, 7 Sep 2010 15:24:03 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: "Mahlon E. Smith" <mahlon@martini.nu>
Cc: Yong-Hyeon PYUN <pyunyh@gmail.com>, freebsd-stable@freebsd.org
Subject: Re: Network memory allocation failures
Message-ID: <20100907222403.GA18595@icarus.home.lan>
In-Reply-To: <20100907210813.GI49065@martini.nu>
References: <20100907210813.GI49065@martini.nu>

On Tue, Sep 07, 2010 at 02:08:13PM -0700, Mahlon E. Smith wrote:
> I picked up a couple of Dell R810 monsters a couple of months ago.  96G
> of RAM, 24 core.  With the aid of this list, got 8.1-RELEASE on there,
> and they are trucking along merrily as VirtualBox hosts.
>
> I'm seeing memory allocation errors when sending data over the network.
> It is random at best, however I can reproduce it pretty reliably.
>
> Sending 100M to a remote machine.  Note the 2nd scp attempt worked.
> Most small files can make it through unmolested.
>
> obb# dd if=/dev/random of=100M-test bs=1M count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes transferred in 2.881689 secs (36387551 bytes/sec)
> obb# rsync -av 100M-test skin:/tmp/
> sending incremental file list
> 100M-test
> Write failed: Cannot allocate memory
> rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
> rsync: connection unexpectedly closed (28 bytes received so far) [sender]
> rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7]
> obb# scp 100M-test skin:/tmp/
> 100M-test 52% 52MB 52.1MB/s 00:00 ETAWrite failed: Cannot allocate memory
> lost connection
> obb# scp 100M-test skin:/tmp/
> 100M-test 100% 100MB 50.0MB/s 00:02
> obb# scp 100M-test skin:/tmp/
> 100M-test 0% 0 0.0KB/s --:-- ETAWrite failed: Cannot allocate memory
> lost connection
>
> Fetching a file, however, works.
>
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test 100% 100MB 20.0MB/s 00:05
> ...
>
>
> I've ruled out bad hardware (mainly due to the behavior being
> *identical* on the sister machine, in a completely different data
> center.)  It's a broadcom (bce) NIC.

This could be a bce(4) bug, meaning the "Cannot allocate memory"
message could be indicating DMA failure or something else from the
card, and not necessarily related to mbufs.  There are also
changes/fixes to bce(4) that are in RELENG_8 (8.1-STABLE) that aren't
in 8.1-RELEASE, but I don't know if those are responsible for your
problem.

Please provide output from the following:

* uname -a  (if desired, XXX out hostname)
* vmstat -i
* ifconfig -a  (if desired, XXX out IPs and MACs)
* netstat -inbd  (if desired, XXX out MACs)
* pciconf -lvc  (only the bceX entry please)

Also check dmesg to see if there are any error messages that correlate
with when the problem occurs.

I'm also CC'ing Yong-Hyeon PYUN who might have some ideas.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
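
For anyone following along, the outputs requested above could be gathered into a single report with something like the sketch below. It is only a convenience wrapper: `run_section` is a hypothetical helper name, not a FreeBSD tool, and the script does no redaction or filtering. Hostnames, IPs, and MACs still have to be XXX'd out by hand, and the pciconf output trimmed down to the bceX entry, before posting.

```sh
#!/bin/sh
# Sketch: print each requested diagnostic under its own header on stdout.
# run_section is a hypothetical helper, not part of FreeBSD.

run_section() {
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        # Keep going even if a command fails; record its exit status.
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(command not found: $1)"
    fi
    echo
}

run_section uname -a
run_section vmstat -i
run_section ifconfig -a
run_section netstat -inbd
run_section pciconf -lvc   # prints every device; keep only the bceX entry
run_section dmesg          # look for errors logged when the failure occurs
```

Assuming it is saved as collect.sh (an arbitrary name), running `sh collect.sh > report.txt` right after a failed transfer leaves everything in one file that can be edited and then pasted into a reply.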