From owner-freebsd-stable@FreeBSD.ORG Mon Jul 11 01:59:51 2011
Message-ID: <20110711115947.51686v4930s7ze37@webmail.in-berlin.de>
Date: Mon, 11 Jul 2011 11:59:47 +1000
From: "Peter Ross"
To: "Scott Sipe"
Cc: Yong-Hyeon Pyun, freebsd-stable List, davidch@freebsd.org, Jeremy Chadwick, "Vogel, Jack"
Subject: Re: scp: Write Failed: Cannot allocate memory

Quoting "Scott Sipe" :

> On Wed, Jul 6, 2011 at 4:21 AM, Peter Ross wrote:
>
>> Quoting "Peter Ross" :
>>
>> Quoting "Peter Ross" :
>>>
>>> Quoting "Jeremy Chadwick" :
>>>>
>>>> On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>>>>>
>>>>>> Quoting "Jeremy Chadwick" :
>>>>>>
>>>>>> On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>>>>>
>>>>>>>> Quoting "Jeremy Chadwick" :
>>>>>>>>
>>>>>>>> On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>>>>>
>>>>>>>>>> Quoting "Jeremy Chadwick" :
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with it.
>>>>>>>>>>>>
>>>>>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000
>>>>>>>>>>>>
>>>>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is hovering right around that value, sometimes above, sometimes below (that's as it should be, right?). I don't think that it dies when crossing over arc_max. I can run the same scp 10 times and it might fail 1-3 times, with no correlation to the arcstats.size being above/below arc_max that I can see.
>>>>>>>>>>>>
>>>>>>>>>>>> Scott
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> just as an addition: an upgrade to last Friday's FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the problem.
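Scott's check above, whether failures line up with kstat.zfs.misc.arcstats.size being above or below vfs.zfs.arc_max, can be made explicit with a small tally. This is a hypothetical sketch: it assumes you record one sample per scp run (the ARC size read with `sysctl -n kstat.zfs.misc.arcstats.size` while the copy ran, plus whether the copy failed). `Sample` and `failure_rates` are illustrative names, not part of any FreeBSD tool.

```python
from dataclasses import dataclass

# vfs.zfs.arc_max as quoted by Scott; substitute your own value.
ARC_MAX = 6_200_000_000


@dataclass
class Sample:
    """One scp run (hypothetical record type for this sketch)."""
    arc_size: int  # kstat.zfs.misc.arcstats.size read while the copy ran
    failed: bool   # True if the run died with "Cannot allocate memory"


def failure_rates(samples, arc_max=ARC_MAX):
    """Return (rate_above, rate_below): the fraction of failed runs among
    samples with the ARC above arc_max, and among those at or below it.
    A bucket with no samples yields None."""
    above = [s.failed for s in samples if s.arc_size > arc_max]
    below = [s.failed for s in samples if s.arc_size <= arc_max]

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    return rate(above), rate(below)
```

Roughly equal rates in both buckets would match Scott's impression that there is no correlation. With only about ten runs, though, neither bucket says much on its own.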
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab some statistics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quoting "Peter Ross" :
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I noticed a similar problem last week. It is also very similar to one reported last year:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My server is a Dell T410 server with the same bge card (the same pciconf -lvc output as described by Mahlon):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yours, Scott, is an em(4).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another similarity: in all cases we are using VirtualBox. I just want to mention it, in case it matters. I am still running VirtualBox 3.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching vfs.zfs.arc_max then, but I could catch one or two cases where the value was still below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it does not help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BTW: It looks as if ARC only gives back the memory when I destroy the ZFS (a cloned snapshot containing virtual machines). Even if nothing happens for hours the buffer isn't released.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am happy to give information gathered on old/new kernel if it helps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting "Scott Sipe" :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems with scp. When scping files to a ZFS directory on the FreeBSD server -- most notably large files -- the transfer frequently dies after just a few seconds. In my last test, I tried to scp an 800mb file to the FreeBSD system and the transfer died after 200mb. It completely copied the next 4 times I tried, and then died again on the next attempt.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On the client side:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "Connection to home closed by remote host.
>>>>>>>>>>>>>>>>>> lost connection"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In /var/log/auth.log:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jul 1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate memory
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've never seen this before and have used scp before to transfer large files without problems. This computer has been used in production for months and has a current uptime of 36 days. I have not been able to notice any problems copying files to the server via samba or netatalk, or any problems in apache.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Uname:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST 2011 root@xeon:/usr/obj/usr/src/sys/GENERIC amd64
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the computer.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am glad to provide any other information or test anything else.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> {snip vmstat -z and dmesg}
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You didn't provide details about your networking setup (rc.conf, ifconfig -a, etc.). netstat -m would be useful too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Next, please see this thread circa September 2010, titled "Network memory allocation failures":
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The user in that thread is using rsync, which relies on scp by default. I believe this problem is similar, if not identical, to yours.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a ) for the server end and the client.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig -a, but some information about the networking driver you're using for the interface would be helpful, as would the uptime of the boxes and the configuration of the pool, e.g.
>>>>>>>>>>>>>>>> ( zpool status -a ; zfs get all ). You should probably put this information up somewhere so you can reference it by URL whenever needed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever, but rsync(1) can be made to use ssh(1) instead of rsh(1), and I believe that is what Jeremy is stating here, but correct me if I am wrong. It does use ssh(1) by default.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's a possibility as well that, if you are using tmpfs(5) or mdmfs(8) for /tmp-type filesystems, rsync(1) may be just filling up your temp RAM area and causing the connection abort, which would be expected. ( df -h ) would help here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday were 3 different OSX computers (over gigabit). The FreeBSD server has 12gb of ram and no bce adapter. For what it's worth, the server is backed up remotely every night with rsync (remote FreeBSD uses rsync to pull) to an offsite (slow cable connection) FreeBSD computer, and I have not seen any errors in the nightly rsync.
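Jeremy's request for netstat -m above targets the kernel's mbuf allocator. One way to compare two snapshots, for example just before and just after a failed copy, is to pull out the "requests for ... denied" counters. This sketch assumes the FreeBSD 8.x line shape `a/b/c requests for mbufs denied (mbufs/clusters/mbuf+clusters)`, so check it against your own output. The sample text in the test is illustrative and was not taken from Scott's or Peter's machines. Whether these counters actually move when scp fails is exactly what such a comparison would show.

```python
import re

# Assumed netstat -m line shape, e.g.
#   "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
# Captures the slash-separated counters and the resource name.
DENIED_RE = re.compile(r"^\s*(\d+(?:/\d+)*) requests for (.+?) denied\b")


def denied_counters(netstat_m_output):
    """Map each resource in a 'requests for ... denied' line to its
    list of integer counters (hypothetical helper)."""
    result = {}
    for line in netstat_m_output.splitlines():
        m = DENIED_RE.match(line)
        if m:
            result[m.group(2)] = [int(n) for n in m.group(1).split("/")]
    return result


def grew(before, after):
    """Return the resources whose denied counters increased between snapshots."""
    return sorted(
        name
        for name, counts in after.items()
        if any(a > b for a, b in zip(counts, before.get(name, [0] * len(counts))))
    )
```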
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry for the omission of networking info; here's the output of the requested commands and some that popped up in the other thread:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.cap-press.com/misc/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In rc.conf: ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Scott
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>> Just to make it crystal clear to everyone:
>>>>>>>>>>>
>>>>>>>>>>> There is no correlation between this problem and use of ZFS. People are attempting to correlate "cannot allocate memory" messages with "anything on the system that uses memory". The VM is much more complex than that.
>>>>>>>>>>>
>>>>>>>>>>> Given the nature of this problem, it's much more likely the issue is "somewhere" within a networking layer within FreeBSD, whether it be driver-level or some sort of intermediary layer.
>>>>>>>>>>>
>>>>>>>>>>> Two people who have this issue in this thread are both using VirtualBox. Can one, or both, of you remove VirtualBox from the configuration entirely (kernel, etc. -- not sure what is required) and then see if the issue goes away?
>>>>>>>>>>
>>>>>>>>>> On the machine in question I can only do it after hours, so I will do it tonight.
>>>>>>>>>>
>>>>>>>>>> I was _successfully_ sending the file over the loopback interface using
>>>>>>>>>>
>>>>>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"
>>>>>>>>>>
>>>>>>>>>> I did it, btw, with the IPv6 localhost address first (accidentally), and then using IPv4. Both worked.
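Peter's loopback check pipes the file into `ssh localhost "cat > /dev/null"`. The sketch below does the same from Python. Unlike `cat`, it reports how many bytes had been written when a write(2) failed, which helps relate a failure to a point in the transfer (Scott's copy died after about 200mb). `stream_file` and the 64 KiB chunk size are arbitrary illustrative choices. To mirror Peter's command, pass `["ssh", "localhost", "cat > /dev/null"]` as `argv`. The sketch only sees errors on its own pipe to the child; a failure inside ssh or the remote sshd shows up only in the returned exit status.

```python
import os
import subprocess

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size for this sketch


def stream_file(path, argv):
    """Feed the file at `path` to the stdin of `argv`, like `cat path | argv`.

    Returns (bytes_written, exit_status). If a write(2) to the pipe fails,
    the OSError is re-raised with the byte offset reached added to its text.
    Failures inside the child process only appear in the exit status.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, bufsize=0)
    fd = proc.stdin.fileno()
    sent = 0
    try:
        with open(path, "rb") as src:
            while chunk := src.read(CHUNK_SIZE):
                view = memoryview(chunk)
                while view:  # os.write may accept only part of the buffer
                    try:
                        n = os.write(fd, view)
                    except OSError as exc:
                        raise OSError(
                            exc.errno,
                            f"{os.strerror(exc.errno)} after {sent} bytes",
                        ) from exc
                    sent += n
                    view = view[n:]
    finally:
        proc.stdin.close()
        status = proc.wait()
    return sent, status
```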
>>>>>>>>>>
>>>>>>>>>> It always fails if I am sending it through the bce(4) interface, even if my target is the VirtualBox bridged to the bce card (so it does not "leave" the computer physically).
>>>>>>>>>>
>>>>>>>>>> Below are the uname -a, ifconfig -a, netstat -rn, pciconf -lv and kldstat outputs.
>>>>>>>>>>
>>>>>>>>>> I have another box where I do not see that problem. It copies files happily over the net using ssh.
>>>>>>>>>>
>>>>>>>>>> It is an older HP ML 150 with only 3GB RAM, and with a bge(4) driver instead. It runs the same RELENG_8 from last week. I installed VirtualBox and enabled vboxnet (so it loads the kernel modules), but I do not run VirtualBox on it (because it doesn't have enough RAM).
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> DellT410one# uname -a
>>>>>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Jun 30 17:07:18 EST 2011 root@DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC amd64
>>>>>>>>>> DellT410one# ifconfig -a
>>>>>>>>>> bce0: flags=8943<...MULTICAST> metric 0 mtu 1500
>>>>>>>>>>         options=c01bb<...VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>         ether 84:2b:2b:68:64:e4
>>>>>>>>>>         inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>         media: Ethernet autoselect (1000baseT <...>)
>>>>>>>>>>         status: active
>>>>>>>>>> bce1: flags=8802<...> metric 0 mtu 1500
>>>>>>>>>>         options=c01bb<...VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>         ether 84:2b:2b:68:64:e5
>>>>>>>>>>         media: Ethernet autoselect
>>>>>>>>>> lo0: flags=8049<...> metric 0 mtu 16384
>>>>>>>>>>         options=3<...>
>>>>>>>>>>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
>>>>>>>>>>         inet6 ::1 prefixlen 128
>>>>>>>>>>         inet 127.0.0.1 netmask 0xff000000
>>>>>>>>>>         nd6 options=3<...>
>>>>>>>>>> vboxnet0: flags=8802<...> metric 0 mtu 1500
>>>>>>>>>>         ether 0a:00:27:00:00:00
>>>>>>>>>> DellT410one# netstat -rn
>>>>>>>>>> Routing tables
>>>>>>>>>>
>>>>>>>>>> Internet:
>>>>>>>>>> Destination        Gateway            Flags    Refs      Use  Netif Expire
>>>>>>>>>> default            192.168.50.201     UGS         0    52195   bce0
>>>>>>>>>> 127.0.0.1          link#11            UH          0        6    lo0
>>>>>>>>>> 192.168.50.0/24    link#1             U           0  1118212   bce0
>>>>>>>>>> 192.168.50.219     link#1             UHS         0     9670    lo0
>>>>>>>>>> 192.168.50.220     link#1             UHS         0     8347    lo0
>>>>>>>>>> 192.168.50.221     link#1             UHS         0   103024    lo0
>>>>>>>>>> 192.168.50.223     link#1             UHS         0    43614    lo0
>>>>>>>>>> 192.168.50.224     link#1             UHS         0     8358    lo0
>>>>>>>>>> 192.168.50.225     link#1             UHS         0     8438    lo0
>>>>>>>>>> 192.168.50.226     link#1             UHS         0     8338    lo0
>>>>>>>>>> 192.168.50.227     link#1             UHS         0     8333    lo0
>>>>>>>>>> 192.168.165.0/24   192.168.50.200     UGS         0     3311   bce0
>>>>>>>>>> 192.168.166.0/24   192.168.50.200     UGS         0      699   bce0
>>>>>>>>>> 192.168.167.0/24   192.168.50.200     UGS         0     3012   bce0
>>>>>>>>>> 192.168.168.0/24   192.168.50.200     UGS         0      552   bce0
>>>>>>>>>>
>>>>>>>>>> Internet6:
>>>>>>>>>> Destination                       Gateway                       Flags      Netif Expire
>>>>>>>>>> ::1                               ::1                           UH          lo0
>>>>>>>>>> fe80::%lo0/64                     link#11                       U           lo0
>>>>>>>>>> fe80::1%lo0                       link#11                       UHS         lo0
>>>>>>>>>> ff01::%lo0/32                     fe80::1%lo0                   U           lo0
>>>>>>>>>> ff02::%lo0/32                     fe80::1%lo0                   U           lo0
>>>>>>>>>> DellT410one# kldstat
>>>>>>>>>> Id Refs Address            Size     Name
>>>>>>>>>>  1   19 0xffffffff80100000 dbf5d0   kernel
>>>>>>>>>>  2    3 0xffffffff80ec0000 4c358    vboxdrv.ko
>>>>>>>>>>  3    1 0xffffffff81012000 131998   zfs.ko
>>>>>>>>>>  4    1 0xffffffff81144000 1ff1     opensolaris.ko
>>>>>>>>>>  5    2 0xffffffff81146000 2940     vboxnetflt.ko
>>>>>>>>>>  6    2 0xffffffff81149000 8e38     netgraph.ko
>>>>>>>>>>  7    1 0xffffffff81152000 153c     ng_ether.ko
>>>>>>>>>>  8    1 0xffffffff81154000 e70      vboxnetadp.ko
>>>>>>>>>> DellT410one# pciconf -lv
>>>>>>>>>> ..
>>>>>>>>>> bce0@pci0:1:0:0: class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>     vendor     = 'Broadcom Corporation'
>>>>>>>>>>     class      = network
>>>>>>>>>>     subclass   = ethernet
>>>>>>>>>> bce1@pci0:1:0:1: class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>     vendor     = 'Broadcom Corporation'
>>>>>>>>>>     class      = network
>>>>>>>>>>     subclass   = ethernet
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you please provide "pciconf -lvcb" output instead, specific to the bce chips? Thanks.
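The `chip=` and `card=` fields in the pciconf output above each pack two 16-bit PCI IDs into one 32-bit value. The low half is the vendor (or subvendor) ID and the high half is the device (or subdevice) ID. A small decoder makes it easy to check whether two reports involve the same chip, as Peter did against Mahlon's output. In this sketch the `VENDORS` table lists only the two IDs relevant here (0x14e4 Broadcom, 0x1028 Dell), and `split_id` and `decode_pciconf` are illustrative names.

```python
import re

# Only the two vendor IDs that appear in this thread; extend as needed.
VENDORS = {0x14E4: "Broadcom", 0x1028: "Dell"}


def split_id(value):
    """Split a pciconf 32-bit id into (vendor, device): low and high 16 bits."""
    return value & 0xFFFF, value >> 16


def decode_pciconf(line):
    """Decode the chip= and card= fields of one `pciconf -l` line
    (hypothetical helper for this sketch)."""
    fields = {k: int(v, 16) for k, v in re.findall(r"(\w+)=0x([0-9a-fA-F]+)", line)}
    vendor, device = split_id(fields["chip"])
    subvendor, subdevice = split_id(fields["card"])
    return {
        "vendor": VENDORS.get(vendor, hex(vendor)),
        "device": hex(device),
        "subvendor": VENDORS.get(subvendor, hex(subvendor)),
        "subdevice": hex(subdevice),
    }
```

For both bce0 and bce1 this gives Broadcom device 0x163b on a Dell (0x1028) board.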
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here it is:
>>>>>>>>
>>>>>>>> bce0@pci0:1:0:0: class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>     vendor     = 'Broadcom Corporation'
>>>>>>>>     class      = network
>>>>>>>>     subclass   = ethernet
>>>>>>>>     bar   [10] = type Memory, range 64, base 0xda000000, size 33554432, enabled
>>>>>>>>     cap 01[48] = powerspec 3 supports D0 D3 current D0
>>>>>>>>     cap 03[50] = VPD
>>>>>>>>     cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>>>>>>>>     cap 11[a0] = MSI-X supports 9 messages in map 0x10
>>>>>>>>     cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
>>>>>>>>     ecap 0003[100] = Serial 1 842b2bfffe6864e4
>>>>>>>>     ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>>>>     ecap 0004[150] = unknown 1
>>>>>>>>     ecap 0002[160] = VC 1 max VC0
>>>>>>>>
>>>>>>>
>>>>>>> Thanks Peter.
>>>>>>>
>>>>>>> Adding Yong-Hyeon and David to the discussion, since they've both worked on the bce(4) driver in recent months (most of the changes made recently are only in HEAD), and also adding Jack Vogel of Intel who maintains em(4). Brief history for the devs:
>>>>>>>
>>>>>>> The issue is described as "Network memory allocation failures" and was reported last year, but two users recently (Scott and Peter) have reported the issue again:
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>
>>>>>>> And it was mentioned again by Scott here, which also contains some technical details:
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>>>>>>>
>>>>>>> What's interesting is that Scott's issue is identical in form but he's using em(4), which isn't known to behave like this. Both individuals
Both individual= s >>>>>>> are using VirtualBox, though we're not sure at this point if that is >>>>>>> the >>>>>>> piece which is causing the anomaly. >>>>>>> >>>>>>> Relevant details of Scott's system (em-based): >>>>>>> >>>>>>> http://www.cap-press.com/misc/ >>>>>>> >>>>>>> Relevant details of Peter's system (bce-based): >>>>>>> >>>>>>> http://lists.freebsd.org/**pipermail/freebsd-stable/2011-** >>>>>>> July/063221.html >>>>>>> http://lists.freebsd.org/**pipermail/freebsd-stable/2011-** >>>>>>> July/063223.html >>>>>>> >>>>>>> I think the biggest complexity right now is figuring out how/why scp >>>>>>> fails intermittently in this nature. The errno probably "trickles >>>>>>> down" >>>>>>> to userland from the kernel, but the condition regarding why it >>>>>>> happens >>>>>>> is unknown. >>>>>>> >>>>>> >>>>>> BTW: I also saw 2 of the errors coming from a BIND9 running in a >>>>>> jail on that box. >>>>>> >>>>>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/**message= s >>>>>> Apr 13 05:17:41 bind named[23534]: internal_send: >>>>>> 192.168.50.145#65176: Cannot allocate memory >>>>>> Jun 21 23:30:44 bind named[39864]: internal_send: >>>>>> 192.168.50.251#36155: Cannot allocate memory >>>>>> Jun 24 15:28:00 bind named[39864]: internal_send: >>>>>> 192.168.50.251#28651: Cannot allocate memory >>>>>> Jun 28 12:57:52 bind named[2462]: internal_send: >>>>>> 192.168.165.154#1201: Cannot allocate memory >>>>>> >>>>>> My initial guess: it happens sooner or later somehow - whether it is >>>>>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a >>>>>> lot of traffic over a longer period (a nameserver gets asked again >>>>>> and again). >>>>>> >>>>> >>>>> Scott, are you also using jails? If both of you are: is there any >>>>> possibility you can remove use of those? I'm not sure how VirtualBox >>>>> fits into the picture (jails + VirtualBox that is), but I can imagine >>>>> jails having different environmental constraints that might cause this= . 
>>>>>
>>>>> Basically the troubleshooting process here is to remove pieces of the puzzle until you figure out which piece is causing the issue. I don't want to get the NIC driver devs all spun up for something that, for example, might be an issue with the jail implementation.
>>>>
>>>> I understand this. As said, I will do some after-hours debugging tonight.
>>>>
>>>> The scp/ssh problems are happening _outside_ the jails. The bind runs _inside_ the jail.
>>>>
>>>> I wanted to use the _host_ system to send VirtualBox virtual disks and filesystems used by jails, to archive them and/or have them available on other FreeBSD systems (as a cold standby solution).
>>>
>>> I just switched off the VirtualBox (without removing the kernel modules).
>>>
>>> The copy succeeds now.
>>>
>>> Well, it could be a VirtualBox related problem, or is the server just relieved to have 2GB more memory at hand now?
>>>
>>> Do you have a quick idea to "emulate" the 2GB memory load usually delivered by VirtualBox?
>>
>> Well, managed that (using lookbusy).
>>
>> Interestingly, I could copy a large file (30GB) without problems as soon as I switched off the VirtualBox. As said, the kernel modules weren't unloaded; they are still there.
>>
>> The copy crashes seconds after I start the VirtualBox. According to vmstat and top I had more free memory (ca. 1.5GB) than I had without VirtualBox but with lookbusy running (ca. 350MB).
>>
>> So it looks (to me, at least) as if I have a VirtualBox related problem, somehow.
>>
>> Any ideas? I am happy to play a bit more to get it sorted, although that has some limits (it is running the company mailserver, after all).
>>
>> Regards
>> Peter
>>
>
> This is it -- I'm seeing the exact same thing.
>
> Scp dies reliably with VirtualBox running. Quit VirtualBox and I was able to scp about 30 large files with no errors. Once I started VirtualBox an in-progress scp died within seconds.
>
> Ditto that the kernel modules merely being loaded don't seem to make a difference; it's VirtualBox actually running.
>
> virtualbox-ose-3.2.12_1

Hi,

I wonder whether anyone has new ideas.

I am puzzled that it happens when VirtualBox guests are running, while loading or unloading the VirtualBox kernel modules doesn't seem to have an effect.

Should I describe the case on the -emulation mailing list to get some ideas from the engineers working on VirtualBox? I do not want to create too much noise, so I would like to know your thoughts on it first.

I experimented a little bit with the ssh code and know which write(2) in /usr/src/crypto/openssh/roaming_common.c (in function roaming_write) returns the ENOMEM (an error it should never return, according to the manpage ;-) but unfortunately I am lost tracking it further down into the kernel. To be frank, I do not know enough about it.

Are there any memory stats inside the kernel that could help?

Thank you for all ideas
Peter
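On the userland side, the errno that "trickles down" from the kernel is all a program like sshd gets to see: write(2) returns -1 and errno holds the reason. The "Write failed: Cannot allocate memory" line in auth.log is the strerror(3) text for ENOMEM. The hypothetical helpers below do the same for a Python OSError and append the symbolic errno name, so ENOMEM can be told apart from, say, ENOBUFS or EPIPE in your own test logs. This is an instrumentation sketch, not OpenSSH code.

```python
import errno
import os


def describe_write_error(exc):
    """Render a failed write as "Write failed: <strerror text> (<errno name>)"
    (hypothetical helper; the format imitates the sshd log line)."""
    name = errno.errorcode.get(exc.errno, str(exc.errno))
    return f"Write failed: {os.strerror(exc.errno)} ({name})"


def write_or_describe(fd, data):
    """Write all of `data` to `fd`. Return None on success, or the
    description of the first failed write(2)."""
    view = memoryview(data)
    while view:
        try:
            n = os.write(fd, view)
        except OSError as exc:
            return describe_write_error(exc)
        view = view[n:]
    return None
```

For example, once the read end of a pipe has been closed, `write_or_describe` returns a string ending in "(EPIPE)", because Python ignores SIGPIPE by default. A socket write failing the way Peter traced would end in "(ENOMEM)" instead.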