Date: Wed, 06 Jul 2011 18:21:41 +1000 From: "Peter Ross" <Peter.Ross@bogen.in-berlin.de> To: "Jeremy Chadwick" <freebsd@jdc.parodius.com> Cc: Yong-Hyeon Pyun <pyunyh@gmail.com>, "Vogel, Jack" <jack.vogel@intel.com>, freebsd-stable List <freebsd-stable@freebsd.org>, davidch@freebsd.org, Scott Sipe <cscotts@gmail.com> Subject: Re: scp: Write Failed: Cannot allocate memory Message-ID: <20110706182141.13056plxp148y61h@webmail.in-berlin.de> In-Reply-To: <20110706173242.23404ffbhkxz6mqi@webmail.in-berlin.de> References: <20110706122339.61453nlqra1vqsrv@webmail.in-berlin.de> <20110706023234.GA72048@icarus.home.lan> <20110706130753.182053f3ellasn0p@webmail.in-berlin.de> <20110706032425.GA72757@icarus.home.lan> <20110706135412.15276i0fxavg09k4@webmail.in-berlin.de> <20110706041504.GA73698@icarus.home.lan> <20110706143129.10696235ldx9bjmp@webmail.in-berlin.de> <20110706173242.23404ffbhkxz6mqi@webmail.in-berlin.de>
next in thread | previous in thread | raw e-mail | index | archive | help
Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>: > Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>: > >> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >> >>> On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote: >>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >>>> >>>>> On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote: >>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >>>>>> >>>>>>> On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote: >>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >>>>>>>> >>>>>>>>> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote: >>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with i= t. >>>>>>>>>> >>>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000 >>>>>>>>>> >>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is >>>>>>>>>> hovering right around that value, sometimes above, sometimes >>>>>>>>>> below (that's as it should be, right?). I don't think that it >>>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times >>>>>>>>>> and it might fail 1-3 times, with no correlation to the >>>>>>>>>> arcstats.size being above/below arc_max that I can see. >>>>>>>>>> >>>>>>>>>> Scott >>>>>>>>>> >>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> just as an addition: an upgrade to last Friday's >>>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the >>>>>>>>>>> problem. >>>>>>>>>>> >>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab >>>>>>>> some statistics. >>>>>>>>>>> >>>>>>>>>>> Regards >>>>>>>>>>> Peter >>>>>>>>>>> >>>>>>>>>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>: >>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> I noticed a similar problem last week. It is also very >>>>>>>>>>>> similar to one reported last year: >>>>>>>>>>>> >>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-Septembe= r/058708.html >>>>>>>>>>>> >>>>>>>>>>>> My server is a Dell T410 server with the same bge card (the >>>>>>>>>>>> same pciconf -lvc output as described by Mahlon: >>>>>>>>>>>> >>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-Septembe= r/058711.html >>>>>>>>>>>> >>>>>>>>>>>> Yours, Scott, is a em(4).. >>>>>>>>>>>> >>>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I >>>>>>>>>>>> just want to mention it, in case it matters. I am still >>>>>>>>>>>> running VirtualBox 3.2. >>>>>>>>>>>> >>>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching >>>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases >>>>>>>>>>>> then the value was still below. >>>>>>>>>>>> >>>>>>>>>>>> I added vfs.zfs.prefetch_disable=3D1 to sysctl.conf but it >>>>>> does not help. >>>>>>>>>>>> >>>>>>>>>>>> BTW: It looks as ARC only gives back the memory when I >>>>>>>>>>>> destroy the ZFS (a cloned snapshot containing virtual >>>>>>>>>>>> machines). Even if nothing happens for hours the buffer >>>>>>>>>>>> isn't released.. >>>>>>>>>>>> >>>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading. >>>>>>>>>>>> >>>>>>>>>>>> I am happy to give information gathered on old/new kernel =20 >>>>>>>>>>>> if it helps. >>>>>>>>>>>> >>>>>>>>>>>> Regards >>>>>>>>>>>> Peter >>>>>>>>>>>> >>>>>>>>>>>> Quoting "Scott Sipe" <cscotts@gmail.com>: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wro= te: >>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote: >>>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems >>>>>>>>>>>>>>>> with scp. When scping >>>>>>>>>>>>>>>> files to a ZFS directory on the FreeBSD server -- >>>>>>>>>>>>>>>> most notably large files >>>>>>>>>>>>>>>> -- the transfer frequently dies after just a few >>>>>>>>>>>>>>>> seconds. In my last test, I >>>>>>>>>>>>>>>> tried to scp an 800mb file to the FreeBSD system and >>>>>>>>>>>>>>>> the transfer died after >>>>>>>>>>>>>>>> 200mb. It completely copied the next 4 times I >>>>>>>>>>>>>>>> tried, and then died again on >>>>>>>>>>>>>>>> the next attempt. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On the client side: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> "Connection to home closed by remote host. >>>>>>>>>>>>>>>> lost connection" >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In /var/log/auth.log: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jul 1 14:54:42 freebsd sshd[18955]: fatal: Write >>>>>>>>>>>>>>>> failed: Cannot allocate >>>>>>>>>>>>>>>> memory >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I've never seen this before and have used scp before >>>>>>>>>>>>>>>> to transfer large files >>>>>>>>>>>>>>>> without problems. This computer has been used in >>>>>>>>>>>>>>>> production for months and >>>>>>>>>>>>>>>> has a current uptime of 36 days. I have not been >>>>>>>>>>>>>>>> able to notice any problems >>>>>>>>>>>>>>>> copying files to the server via samba or netatalk, or >>>>>>>> any problems in >>>>>>>>>>>>>>>> apache. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Uname: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat >>>>>>>>>>>>>>>> Feb 19 01:02:54 EST >>>>>>>>>>>>>>>> 2011 root@xeon:/usr/obj/usr/src/sys/GENERIC amd64 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the comput= er. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Am glad to provide any other information or test =20 >>>>>>>>>>>>>>>> anything else. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> {snip vmstat -z and dmesg} >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You didn't provide details about your networking setup =20 >>>>>>>>>>>>>>> (rc.conf, >>>>>>>>>>>>>>> ifconfig -a, etc.). netstat -m would be useful too. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Next, please see this thread circa September 2010, =20 >>>>>>>>>>>>>>> titled "Network >>>>>>>>>>>>>>> memory allocation failures": >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-Septe= mber/thread.html#58708 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The user in that thread is using rsync, which relies on >>>>>>>> scp by default. >>>>>>>>>>>>>>> I believe this problem is similar, if not identical, to your= s. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a ) >>>>>>>> for the server >>>>>>>>>>>>>> end and the client. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig =20 >>>>>>>>>>>>>> -a but some >>>>>>>>>>>>>> information about the networking driver your using for =20 >>>>>>>>>>>>>> the interface >>>>>>>>>>>>>> would be helpful, uptime of the boxes. And configuration >>>>>> of the pool. >>>>>>>>>>>>>> e.g. ( zpool status -a ;zfs get all <poolname> ) You =20 >>>>>>>>>>>>>> should probably >>>>>>>>>>>>>> prop this information up somewhere so you can reference by >>>>>>>> URL whenever >>>>>>>>>>>>>> needed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever but rsync(1) >>>>>>>> can be made to >>>>>>>>>>>>>> use ssh(1) instead of rsh(1) and I believe that is what =20 >>>>>>>>>>>>>> Jeremy is >>>>>>>>>>>>>> stating here but correct me if I am wrong. It does use ssh(1)= by >>>>>>>>>>>>>> default. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Its a possiblity as well that if using tmpfs(5) or =20 >>>>>>>>>>>>>> mdmfs(8) for /tmp >>>>>>>>>>>>>> type filesystems that rsync(1) may be just filling up your >>>>>>>> temp ram area >>>>>>>>>>>>>> and causing the connection abort which would be >>>>>>>>>>>>>> expected. ( df -h ) would >>>>>>>>>>>>>> help here. >>>>>>>>>>>>> >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> >>>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday >>>>>>>>>>>>> were 3 different OSX computers (over gigabit). The FreeBSD >>>>>>>>>>>>> server has 12gb of ram and no bce adapter. For what it's >>>>>>>>>>>>> worth, the server is backed up remotely every night with >>>>>>>>>>>>> rsync (remote FreeBSD uses rsync to pull) to an offsite >>>>>>>>>>>>> (slow cable connection) FreeBSD computer, and I have not >>>>>>>>>>>>> seen any errors in the nightly rsync. >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry for the omission of networking info, here's the >>>>>>>>>>>>> output of the requested commands and some that popped up >>>>>>>>>>>>> in the other thread: >>>>>>>>>>>>> >>>>>>>>>>>>> http://www.cap-press.com/misc/ >>>>>>>>>>>>> >>>>>>>>>>>>> In rc.conf: ifconfig_em1=3D"inet 10.1.1.1 netmask 255.255.0.0= " >>>>>>>>>>>>> >>>>>>>>>>>>> Scott >>>>>>>>> >>>>>>>>> Just to make it crystal clear to everyone: >>>>>>>>> >>>>>>>>> There is no correlation between this problem and use of ZFS. =20 >>>>>>>>> People are >>>>>>>>> attempting to correlate "cannot allocate memory" messages =20 >>>>>>>>> with "anything >>>>>>>>> on the system that uses memory". The VM is much more =20 >>>>>>>>> complex than that. >>>>>>>>> >>>>>>>>> Given the nature of this problem, it's much more likely the issue = is >>>>>>>>> "somewhere" within a networking layer within FreeBSD, whether it b= e >>>>>>>>> driver-level or some sort of intermediary layer. >>>>>>>>> >>>>>>>>> Two people who have this issue in this thread are both using =20 >>>>>>>>> VirtualBox. >>>>>>>>> Can one, or both, of you remove VirtualBox from the configuration >>>>>>>>> entirely (kernel, etc. -- not sure what is required) and =20 >>>>>>>>> then see if the >>>>>>>>> issue goes away? >>>>>>>> >>>>>>>> On the machine in question I only can do it after hours so I will d= o >>>>>>>> it tonight. >>>>>>>> >>>>>>>> I was _successfully_ sending the file over the loopback =20 >>>>>>>> interface using >>>>>>>> >>>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null= " >>>>>>>> >>>>>>>> I did it, btw, with the IPv6 localhost address first (accidently), >>>>>>>> and then using IPv4. Both worked. >>>>>>>> >>>>>>>> It always fails if I am sending it through the bce(4) interface, >>>>>>>> even if my target is the VirtualBox bridged to the bce card (so it >>>>>>>> does not "leave" the computer physically). >>>>>>>> >>>>>>>> Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and >>>>>>>> kldstat output. >>>>>>>> >>>>>>>> I have another box where I do not see that problem. It copies files >>>>>>>> happily over the net using ssh. >>>>>>>> >>>>>>>> It is an an older HP ML 150 with 3GB RAM only but with a bge(4) >>>>>>>> driver instead. It runs the same last week's RELENG_8. I installed >>>>>>>> VirtualBox and enabled vboxnet (so it loads the kernel modules). Bu= t >>>>>>>> I do not run VirtualBox on it (because it hasn't enough RAM). >>>>>>>> >>>>>>>> Regards >>>>>>>> Peter >>>>>>>> >>>>>>>> DellT410one# uname -a >>>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Ju= n >>>>>>>> 30 17:07:18 EST 2011 >>>>>>>> root@DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC amd64 >>>>>>>> DellT410one# ifconfig -a >>>>>>>> bce0: flags=3D8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> >>>>>>>> metric 0 mtu 1500 >>>>>>>> =09options=3Dc01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,= VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE> >>>>>>>> =09ether 84:2b:2b:68:64:e4 >>>>>>>> =09inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255 >>>>>>>> =09media: Ethernet autoselect (1000baseT <full-duplex>) >>>>>>>> =09status: active >>>>>>>> bce1: flags=3D8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500 >>>>>>>> =09options=3Dc01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,= VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE> >>>>>>>> =09ether 84:2b:2b:68:64:e5 >>>>>>>> =09media: Ethernet autoselect >>>>>>>> lo0: flags=3D8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384 >>>>>>>> =09options=3D3<RXCSUM,TXCSUM> >>>>>>>> =09inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb >>>>>>>> =09inet6 ::1 prefixlen 128 >>>>>>>> =09inet 127.0.0.1 netmask 0xff000000 >>>>>>>> =09nd6 options=3D3<PERFORMNUD,ACCEPT_RTADV> >>>>>>>> vboxnet0: flags=3D8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 15= 00 >>>>>>>> =09ether 0a:00:27:00:00:00 >>>>>>>> DellT410one# netstat -rn >>>>>>>> Routing tables >>>>>>>> >>>>>>>> Internet: >>>>>>>> Destination Gateway Flags Refs Use =20 >>>>>>>> Netif Expire >>>>>>>> default 192.168.50.201 UGS 0 52195 bce0 >>>>>>>> 127.0.0.1 link#11 UH 0 6 lo0 >>>>>>>> 192.168.50.0/24 link#1 U 0 1118212 bce0 >>>>>>>> 192.168.50.219 link#1 UHS 0 9670 lo0 >>>>>>>> 192.168.50.220 link#1 UHS 0 8347 lo0 >>>>>>>> 192.168.50.221 link#1 UHS 0 103024 lo0 >>>>>>>> 192.168.50.223 link#1 UHS 0 43614 lo0 >>>>>>>> 192.168.50.224 link#1 UHS 0 8358 lo0 >>>>>>>> 192.168.50.225 link#1 UHS 0 8438 lo0 >>>>>>>> 192.168.50.226 link#1 UHS 0 8338 lo0 >>>>>>>> 192.168.50.227 link#1 UHS 0 8333 lo0 >>>>>>>> 192.168.165.0/24 192.168.50.200 UGS 0 3311 bce0 >>>>>>>> 192.168.166.0/24 192.168.50.200 UGS 0 699 bce0 >>>>>>>> 192.168.167.0/24 192.168.50.200 UGS 0 3012 bce0 >>>>>>>> 192.168.168.0/24 192.168.50.200 UGS 0 552 bce0 >>>>>>>> >>>>>>>> Internet6: >>>>>>>> Destination Gateway >>>>>>>> Flags Netif Expire >>>>>>>> ::1 ::1 UH >>>>>>>> lo0 >>>>>>>> fe80::%lo0/64 link#11 U >>>>>>>> lo0 >>>>>>>> fe80::1%lo0 link#11 UHS >>>>>>>> lo0 >>>>>>>> ff01::%lo0/32 fe80::1%lo0 U >>>>>>>> lo0 >>>>>>>> ff02::%lo0/32 fe80::1%lo0 U >>>>>>>> lo0 >>>>>>>> DellT410one# kldstat >>>>>>>> Id Refs Address Size Name >>>>>>>> 1 19 0xffffffff80100000 dbf5d0 kernel >>>>>>>> 2 3 0xffffffff80ec0000 4c358 vboxdrv.ko >>>>>>>> 3 1 0xffffffff81012000 131998 zfs.ko >>>>>>>> 4 1 0xffffffff81144000 1ff1 opensolaris.ko >>>>>>>> 5 2 0xffffffff81146000 2940 vboxnetflt.ko >>>>>>>> 6 2 0xffffffff81149000 8e38 netgraph.ko >>>>>>>> 7 1 0xffffffff81152000 153c ng_ether.ko >>>>>>>> 8 1 0xffffffff81154000 e70 vboxnetadp.ko >>>>>>>> DellT410one# pciconf -lv >>>>>>>> .. >>>>>>>> bce0@pci0:1:0:0: class=3D0x020000 card=3D0x028d1028 >>>>>>>> chip=3D0x163b14e4 rev=3D0x20 hdr=3D0x00 >>>>>>>> vendor =3D 'Broadcom Corporation' >>>>>>>> class =3D network >>>>>>>> subclass =3D ethernet >>>>>>>> bce1@pci0:1:0:1: class=3D0x020000 card=3D0x028d1028 >>>>>>>> chip=3D0x163b14e4 rev=3D0x20 hdr=3D0x00 >>>>>>>> vendor =3D 'Broadcom Corporation' >>>>>>>> class =3D network >>>>>>>> subclass =3D ethernet >>>>>>> >>>>>>> Could you please provide "pciconf -lvcb" output instead, =20 >>>>>>> specific to the >>>>>>> bce chips? Thanks. >>>>>> >>>>>> Her it is: >>>>>> >>>>>> bce0@pci0:1:0:0: class=3D0x020000 card=3D0x028d1028 >>>>>> chip=3D0x163b14e4 rev=3D0x20 hdr=3D0x00 >>>>>> vendor =3D 'Broadcom Corporation' >>>>>> class =3D network >>>>>> subclass =3D ethernet >>>>>> bar [10] =3D type Memory, range 64, base 0xda000000, size >>>>>> 33554432, enabled >>>>>> cap 01[48] =3D powerspec 3 supports D0 D3 current D0 >>>>>> cap 03[50] =3D VPD >>>>>> cap 05[58] =3D MSI supports 16 messages, 64 bit enabled with 1 mess= age >>>>>> cap 11[a0] =3D MSI-X supports 9 messages in map 0x10 >>>>>> cap 10[ac] =3D PCI-Express 2 endpoint max data 256(512) link x4(x4) >>>>>> ecap 0003[100] =3D Serial 1 842b2bfffe6864e4 >>>>>> ecap 0001[110] =3D AER 1 0 fatal 0 non-fatal 1 corrected >>>>>> ecap 0004[150] =3D unknown 1 >>>>>> ecap 0002[160] =3D VC 1 max VC0 >>>>> >>>>> Thanks Peter. >>>>> >>>>> Adding Yong-Hyeon and David to the discussion, since they've both work= ed >>>>> on the bce(4) driver in recent months (most of the changes made recent= ly >>>>> are only in HEAD), and also adding Jack Vogel of Intel who maintains >>>>> em(4). Brief history for the devs: >>>>> >>>>> The issue is described "Network memory allocation failures" and was >>>>> reported last year, but two users recently (Scott and Peter) have >>>>> reported the issue again: >>>>> >>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/threa= d.html#58708 >>>>> >>>>> And was mentioned again by Scott here, which also contains some >>>>> technical details: >>>>> >>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.htm= l >>>>> >>>>> What's interesting is that Scott's issue is identical in form but he's >>>>> using em(4), which isn't known to behave like this. Both individuals >>>>> are using VirtualBox, though we're not sure at this point if that is t= he >>>>> piece which is causing the anomaly. >>>>> >>>>> Relevant details of Scott's system (em-based): >>>>> >>>>> http://www.cap-press.com/misc/ >>>>> >>>>> Relevant details of Peter's system (bce-based): >>>>> >>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.htm= l >>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.htm= l >>>>> >>>>> I think the biggest complexity right now is figuring out how/why scp >>>>> fails intermittently in this nature. The errno probably "trickles dow= n" >>>>> to userland from the kernel, but the condition regarding why it happen= s >>>>> is unknown. >>>> >>>> BTW: I also saw 2 of the errors coming from a BIND9 running in a >>>> jail on that box. >>>> >>>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages >>>> Apr 13 05:17:41 bind named[23534]: internal_send: >>>> 192.168.50.145#65176: Cannot allocate memory >>>> Jun 21 23:30:44 bind named[39864]: internal_send: >>>> 192.168.50.251#36155: Cannot allocate memory >>>> Jun 24 15:28:00 bind named[39864]: internal_send: >>>> 192.168.50.251#28651: Cannot allocate memory >>>> Jun 28 12:57:52 bind named[2462]: internal_send: >>>> 192.168.165.154#1201: Cannot allocate memory >>>> >>>> My initial guess: it happens sooner or later somehow - whether it is >>>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a >>>> lot of traffic over a longer period (a nameserver gets asked again >>>> and again). >>> >>> Scott, are you also using jails? If both of you are: is there any >>> possibility you can remove use of those? I'm not sure how VirtualBox >>> fits into the picture (jails + VirtualBox that is), but I can imagine >>> jails having different environmental constraints that might cause this. >>> >>> Basically the troubleshooting process here is to remove pieces of the >>> puzzle until you figure out which piece is causing the issue. I don't >>> want to get the NIC driver devs all spun up for something that, for >>> example, might be an issue with the jail implementation. >> >> I understand this. As said, I do some afterhours debugging tonight. >> >> The scp/ssh problems are happening _outside_ the jails. The bind =20 >> runs _inside_ the jail. >> >> I wanted to use the _host_ system to send VirtualBox virtual disks =20 >> and filesystems used by jails to archive them and/or having them =20 >> available on other FreeBSD systems (as a cold standby solution). > > I just switched off the VirtualBox (without removing the kernel modules). > > The copy succeeds now. > > Well, it could be a VirtualBox related problem, or is the server =20 > just relieved to have 2GB more memory at hands now? > > Do you have a quick idea to "emulate" the 2GB memory load usually =20 > delivered by VirtualBox? Well, managed that (using lookbusy) Interestingly I could copy a large file (30GB) without problems, as =20 soon as I switched off the VirtualBox. As said, the kernel modules =20 weren't unloaded, they are still there. The copy crashes seconds after I started the VirtualBox. According to =20 vmstat and top I had more free memory (ca. 1.5GB) as I had without =20 VirtualBox and lookbusy (ca. 350MB). So, it looks (to me, at least) as I have a VirtualBox related problem, =20 somehow. Any ideas? I am happy to play a bit more to get it sorted although it =20 has some limits (it is running the company mailserver, after all) Regards Peter
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110706182141.13056plxp148y61h>