Date: Wed, 06 Jul 2011 12:23:39 +1000
From: "Peter Ross" <Peter.Ross@bogen.in-berlin.de>
To: "Jeremy Chadwick" <freebsd@jdc.parodius.com>
Cc: freebsd-stable List <freebsd-stable@freebsd.org>, Scott Sipe <cscotts@gmail.com>
Subject: Re: scp: Write Failed: Cannot allocate memory
Message-ID: <20110706122339.61453nlqra1vqsrv@webmail.in-berlin.de>

Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:

> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>> I'm running virtualbox 3.2.12_1 if that has anything to do with it.
>>
>> sysctl vfs.zfs.arc_max: 6200000000
>>
>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is hovering
>> right around that value, sometimes above, sometimes below (that's
>> as it should be, right?). I don't think that it dies when crossing
>> over arc_max. I can run the same scp 10 times and it might fail 1-3
>> times, with no correlation to the arcstats.size being above/below
>> arc_max that I can see.
>>
>> Scott
>>
>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>
>>> Hi all,
>>>
>>> just as an addition: an upgrade to last Friday's FreeBSD-Stable
>>> and to VirtualBox 4.0.8 does not fix the problem.
>>>
>>> I will experiment a bit more tomorrow after hours and grab some statistics.
>>>
>>> Regards
>>> Peter
>>>
>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>
>>>> Hi all,
>>>>
>>>> I noticed a similar problem last week. It is also very similar to
>>>> one reported last year:
>>>>
>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>
>>>> My server is a Dell T410 server with the same bge card (the same
>>>> pciconf -lvc output as described by Mahlon:
>>>>
>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>
>>>> Yours, Scott, is a em(4)..
>>>>
>>>> Another similarity: In all cases we are using VirtualBox. I just
>>>> want to mention it, in case it matters. I am still running
>>>> VirtualBox 3.2.
>>>>
>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>> vfs.zfs.arc_max then, but I could catch one or two cases then the
>>>> value was still below.
>>>>
>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it does not help.
>>>>
>>>> BTW: It looks as ARC only gives back the memory when I destroy
>>>> the ZFS (a cloned snapshot containing virtual machines). Even if
>>>> nothing happens for hours the buffer isn't released..
>>>>
>>>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>
>>>> I am happy to give information gathered on old/new kernel if it helps.
>>>>
>>>> Regards
>>>> Peter
>>>>
>>>> Quoting "Scott Sipe" <cscotts@gmail.com>:
>>>>
>>>>>
>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>
>>>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>>>> I'm running 8.2-RELEASE and am having new problems with scp. When scping
>>>>>>>> files to a ZFS directory on the FreeBSD server -- most notably large files
>>>>>>>> -- the transfer frequently dies after just a few seconds. In my last test, I
>>>>>>>> tried to scp an 800mb file to the FreeBSD system and the transfer died after
>>>>>>>> 200mb. It completely copied the next 4 times I tried, and then died again on
>>>>>>>> the next attempt.
>>>>>>>>
>>>>>>>> On the client side:
>>>>>>>>
>>>>>>>> "Connection to home closed by remote host.
>>>>>>>> lost connection"
>>>>>>>>
>>>>>>>> In /var/log/auth.log:
>>>>>>>>
>>>>>>>> Jul 1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate memory
>>>>>>>>
>>>>>>>> I've never seen this before and have used scp before to transfer large files
>>>>>>>> without problems. This computer has been used in production for months and
>>>>>>>> has a current uptime of 36 days. I have not been able to notice any problems
>>>>>>>> copying files to the server via samba or netatalk, or any problems in
>>>>>>>> apache.
>>>>>>>>
>>>>>>>> Uname:
>>>>>>>>
>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST
>>>>>>>> 2011 root@xeon:/usr/obj/usr/src/sys/GENERIC amd64
>>>>>>>>
>>>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>>>>
>>>>>>>> I have not restarted the sshd daemon or rebooted the computer.
>>>>>>>>
>>>>>>>> Am glad to provide any other information or test anything else.
>>>>>>>>
>>>>>>>> {snip vmstat -z and dmesg}
>>>>>>>
>>>>>>> You didn't provide details about your networking setup (rc.conf,
>>>>>>> ifconfig -a, etc.). netstat -m would be useful too.
>>>>>>>
>>>>>>> Next, please see this thread circa September 2010, titled "Network
>>>>>>> memory allocation failures":
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>
>>>>>>> The user in that thread is using rsync, which relies on scp by default.
>>>>>>> I believe this problem is similar, if not identical, to yours.
>>>>>>>
>>>>>>
>>>>>> Please also provide your output of ( /usr/bin/limits -a ) for the server
>>>>>> end and the client.
>>>>>>
>>>>>> I am not quite sure I agree with the need for ifconfig -a but some
>>>>>> information about the networking driver your using for the interface
>>>>>> would be helpful, uptime of the boxes. And configuration of the pool.
>>>>>> e.g. ( zpool status -a ;zfs get all <poolname> ) You should probably
>>>>>> prop this information up somewhere so you can reference by URL whenever
>>>>>> needed.
>>>>>>
>>>>>> rsync(1) does not rely on scp(1) whatsoever but rsync(1) can be made to
>>>>>> use ssh(1) instead of rsh(1) and I believe that is what Jeremy is
>>>>>> stating here but correct me if I am wrong. It does use ssh(1) by
>>>>>> default.
>>>>>>
>>>>>> Its a possiblity as well that if using tmpfs(5) or mdmfs(8) for /tmp
>>>>>> type filesystems that rsync(1) may be just filling up your temp ram area
>>>>>> and causing the connection abort which would be expected.
>>>>>> ( df -h ) would
>>>>>> help here.
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday were 3
>>>>> different OSX computers (over gigabit). The FreeBSD server has
>>>>> 12gb of ram and no bce adapter. For what it's worth, the server
>>>>> is backed up remotely every night with rsync (remote FreeBSD
>>>>> uses rsync to pull) to an offsite (slow cable connection)
>>>>> FreeBSD computer, and I have not seen any errors in the nightly
>>>>> rsync.
>>>>>
>>>>> Sorry for the omission of networking info, here's the output of
>>>>> the requested commands and some that popped up in the other
>>>>> thread:
>>>>>
>>>>> http://www.cap-press.com/misc/
>>>>>
>>>>> In rc.conf: ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>
>>>>> Scott
>
> Just to make it crystal clear to everyone:
>
> There is no correlation between this problem and use of ZFS. People are
> attempting to correlate "cannot allocate memory" messages with "anything
> on the system that uses memory". The VM is much more complex than that.
>
> Given the nature of this problem, it's much more likely the issue is
> "somewhere" within a networking layer within FreeBSD, whether it be
> driver-level or some sort of intermediary layer.
>
> Two people who have this issue in this thread are both using VirtualBox.
> Can one, or both, of you remove VirtualBox from the configuration
> entirely (kernel, etc. -- not sure what is required) and then see if the
> issue goes away?

On the machine in question I can only do it after hours so I will do
it tonight.

I was _successfully_ sending the file over the loopback interface using

cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"

I did it, btw, with the IPv6 localhost address first (accidentally), and
then using IPv4. Both worked.

It always fails if I am sending it through the bce(4) interface, even
if my target is the VirtualBox bridged to the bce card (so it does not
"leave" the computer physically).

Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and kldstat output.

I have another box where I do not see that problem. It copies files
happily over the net using ssh.

It is an older HP ML 150 with 3GB RAM only but with a bge(4) driver
instead. It runs the same last week's RELENG_8. I installed VirtualBox
and enabled vboxnet (so it loads the kernel modules). But I do not run
VirtualBox on it (because it hasn't enough RAM).

Regards
Peter

DellT410one# uname -a
FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Jun 30 17:07:18 EST 2011 root@DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC amd64

DellT410one# ifconfig -a
bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
	ether 84:2b:2b:68:64:e4
	inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255
	inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255
	inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255
	inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255
	inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255
	inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255
	inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255
	inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
	ether 84:2b:2b:68:64:e5
	media: Ethernet autoselect
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=3<RXCSUM,TXCSUM>
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
	inet6 ::1 prefixlen 128
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 0a:00:27:00:00:00

DellT410one# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.50.201     UGS         0    52195   bce0
127.0.0.1          link#11            UH          0        6    lo0
192.168.50.0/24    link#1             U           0  1118212   bce0
192.168.50.219     link#1             UHS         0     9670    lo0
192.168.50.220     link#1             UHS         0     8347    lo0
192.168.50.221     link#1             UHS         0   103024    lo0
192.168.50.223     link#1             UHS         0    43614    lo0
192.168.50.224     link#1             UHS         0     8358    lo0
192.168.50.225     link#1             UHS         0     8438    lo0
192.168.50.226     link#1             UHS         0     8338    lo0
192.168.50.227     link#1             UHS         0     8333    lo0
192.168.165.0/24   192.168.50.200     UGS         0     3311   bce0
192.168.166.0/24   192.168.50.200     UGS         0      699   bce0
192.168.167.0/24   192.168.50.200     UGS         0     3012   bce0
192.168.168.0/24   192.168.50.200     UGS         0      552   bce0

Internet6:
Destination        Gateway            Flags      Netif Expire
::1                ::1                UH          lo0
fe80::%lo0/64      link#11            U           lo0
fe80::1%lo0        link#11            UHS         lo0
ff01::%lo0/32      fe80::1%lo0        U           lo0
ff02::%lo0/32      fe80::1%lo0        U           lo0

DellT410one# kldstat
Id Refs Address            Size     Name
 1   19 0xffffffff80100000 dbf5d0   kernel
 2    3 0xffffffff80ec0000 4c358    vboxdrv.ko
 3    1 0xffffffff81012000 131998   zfs.ko
 4    1 0xffffffff81144000 1ff1     opensolaris.ko
 5    2 0xffffffff81146000 2940     vboxnetflt.ko
 6    2 0xffffffff81149000 8e38     netgraph.ko
 7    1 0xffffffff81152000 153c     ng_ether.ko
 8    1 0xffffffff81154000 e70      vboxnetadp.ko

DellT410one# pciconf -lv
..
bce0@pci0:1:0:0:	class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    class      = network
    subclass   = ethernet
bce1@pci0:1:0:1:	class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    class      = network
    subclass   = ethernet
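
For anyone repeating the comparison above: the "fails 1-3 times out of
10" observation and the loopback-versus-bce test are easier to compare
if each transfer is run a fixed number of times and the failures are
counted. A minimal sketch, assuming a POSIX sh; count_failures is a
hypothetical helper name, not something from this thread or from the
base system:

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and report how many runs failed.
# usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's own output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed/$runs failed"
}
```

For example,

count_failures 10 sh -c 'cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"'

would repeat the loopback test ten times. Replacing localhost with a
host that is reached through the bce interface would repeat the failing
case. This assumes key-based ssh login, so that no password prompt
interrupts the loop.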