Date:      Mon, 11 Jul 2011 11:59:47 +1000
From:      "Peter Ross" <Peter.Ross@bogen.in-berlin.de>
To:        "Scott Sipe" <cscotts@gmail.com>
Cc:        Yong-Hyeon Pyun <pyunyh@gmail.com>, freebsd-stable List <freebsd-stable@freebsd.org>, davidch@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>, "Vogel, Jack" <jack.vogel@intel.com>
Subject:   Re: scp: Write Failed: Cannot allocate memory
Message-ID:  <20110711115947.51686v4930s7ze37@webmail.in-berlin.de>
In-Reply-To: <CA%2B30O_O8b8O29rc6BLnnGVTY3cWzpuKQ1q8FTG1idJKM5ykrvA@mail.gmail.com>
References:  <20110706122339.61453nlqra1vqsrv@webmail.in-berlin.de> <20110706023234.GA72048@icarus.home.lan> <20110706130753.182053f3ellasn0p@webmail.in-berlin.de> <20110706032425.GA72757@icarus.home.lan> <20110706135412.15276i0fxavg09k4@webmail.in-berlin.de> <20110706041504.GA73698@icarus.home.lan> <20110706143129.10696235ldx9bjmp@webmail.in-berlin.de> <20110706173242.23404ffbhkxz6mqi@webmail.in-berlin.de> <20110706182141.13056plxp148y61h@webmail.in-berlin.de> <CA%2B30O_O8b8O29rc6BLnnGVTY3cWzpuKQ1q8FTG1idJKM5ykrvA@mail.gmail.com>

Quoting "Scott Sipe" <cscotts@gmail.com>:

> On Wed, Jul 6, 2011 at 4:21 AM, Peter Ross
> <Peter.Ross@bogen.in-berlin.de> wrote:
>
>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>
>>  Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>
>>>  Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>
>>>>  On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>>>>>
>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>
>>>>>>  On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>>>>>
>>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>>>
>>>>>>>>  On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>>>>>
>>>>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>>>>>
>>>>>>>>>>  On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with
>>>>>>>>>>>> it.
>>>>>>>>>>>>
>>>>>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000
>>>>>>>>>>>>
>>>>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>>>>>>> hovering right around that value, sometimes above, sometimes
>>>>>>>>>>>> below (that's as it should be, right?). I don't think that it
>>>>>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times
>>>>>>>>>>>> and it might fail 1-3 times, with no correlation to the
>>>>>>>>>>>> arcstats.size being above/below arc_max that I can see.
>>>>>>>>>>>>
>>>>>>>>>>>> Scott
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>  Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> just as an addition: an upgrade to last Friday's
>>>>>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab
>>>>>>>>>>>>>
>>>>>>>>>>>> some statistics.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I noticed a similar problem last week. It is also very
>>>>>>>>>>>>>> similar to one reported last year:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My server is a Dell T410 server with the same bce card (the
>>>>>>>>>>>>>> same pciconf -lvc output as described by Mahlon):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yours, Scott, is a em(4)..
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>>>>>>> just want to mention it, in case it matters. I am still
>>>>>>>>>>>>>> running VirtualBox 3.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>>>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases
>>>>>>>>>>>>>> where the value was still below.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it
>>>>>>>>>>>>>> does not help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BTW: It looks as if ARC only gives back the memory when I
>>>>>>>>>>>>>> destroy the ZFS (a cloned snapshot containing virtual
>>>>>>>>>>>>>> machines). Even if nothing happens for hours the buffer
>>>>>>>>>>>>>> isn't released.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am happy to give information gathered on old/new kernel if it
>>>>>>>>>>>>>> helps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting "Scott Sipe" <cscotts@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems
>>>>>>>>>>>>>>>>>> with scp. When scping
>>>>>>>>>>>>>>>>>> files to a ZFS directory on the FreeBSD server --
>>>>>>>>>>>>>>>>>> most notably large files
>>>>>>>>>>>>>>>>>> -- the transfer frequently dies after just a few
>>>>>>>>>>>>>>>>>> seconds. In my last test, I
>>>>>>>>>>>>>>>>>> tried to scp an 800mb file to the FreeBSD system and
>>>>>>>>>>>>>>>>>> the transfer died after
>>>>>>>>>>>>>>>>>> 200mb. It completely copied the next 4 times I
>>>>>>>>>>>>>>>>>> tried, and then died again on
>>>>>>>>>>>>>>>>>> the next attempt.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On the client side:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "Connection to home closed by remote host.
>>>>>>>>>>>>>>>>>> lost connection"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In /var/log/auth.log:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write
>>>>>>>>>>>>>>>>>> failed: Cannot allocate
>>>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've never seen this before and have used scp before
>>>>>>>>>>>>>>>>>> to transfer large files
>>>>>>>>>>>>>>>>>> without problems. This computer has been used in
>>>>>>>>>>>>>>>>>> production for months and
>>>>>>>>>>>>>>>>>> has a current uptime of 36 days. I have not been
>>>>>>>>>>>>>>>>>> able to notice any problems
>>>>>>>>>>>>>>>>>> copying files to the server via samba or netatalk, or
>>>>>>>>>>>>>>>>>> any problems in apache.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Uname:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat
>>>>>>>>>>>>>>>>>> Feb 19 01:02:54 EST
>>>>>>>>>>>>>>>>>> 2011     root@xeon:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the
>>>>>>>>>>>>>>>>>> computer.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Am glad to provide any other information or test anything
>>>>>>>>>>>>>>>>>> else.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> {snip vmstat -z and dmesg}
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You didn't provide details about your networking setup
>>>>>>>>>>>>>>>>> (rc.conf,
>>>>>>>>>>>>>>>>> ifconfig -a, etc.).  netstat -m would be useful too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Next, please see this thread circa September 2010, titled
>>>>>>>>>>>>>>>>> "Network
>>>>>>>>>>>>>>>>> memory allocation failures":
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The user in that thread is using rsync, which relies on
>>>>>>>>>>>>>>>>> scp by default.  I believe this problem is similar, if
>>>>>>>>>>>>>>>>> not identical, to yours.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a )
>>>>>>>>>>>>>>>> for the server end and the client.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig -a but
>>>>>>>>>>>>>>>> some information about the networking driver you're using for
>>>>>>>>>>>>>>>> the interface would be helpful, as would the uptime of the
>>>>>>>>>>>>>>>> boxes and the configuration of the pool,
>>>>>>>>>>>>>>>> e.g. ( zpool status -a ;zfs get all <poolname> ). You should
>>>>>>>>>>>>>>>> probably prop this information up somewhere so you can
>>>>>>>>>>>>>>>> reference it by URL whenever needed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever but rsync(1)
>>>>>>>>>>>>>>>> can be made to use ssh(1) instead of rsh(1) and I believe
>>>>>>>>>>>>>>>> that is what Jeremy is stating here but correct me if I am
>>>>>>>>>>>>>>>> wrong. It does use ssh(1) by default.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's a possibility as well that if using tmpfs(5) or mdmfs(8)
>>>>>>>>>>>>>>>> for /tmp type filesystems that rsync(1) may be just filling
>>>>>>>>>>>>>>>> up your temp ram area and causing the connection abort, which
>>>>>>>>>>>>>>>> would be expected. ( df -h ) would help here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday
>>>>>>>>>>>>>>> were 3 different OSX computers (over gigabit). The FreeBSD
>>>>>>>>>>>>>>> server has 12gb of ram and no bce adapter. For what it's
>>>>>>>>>>>>>>> worth, the server is backed up remotely every night with
>>>>>>>>>>>>>>> rsync (remote FreeBSD uses rsync to pull) to an offsite
>>>>>>>>>>>>>>> (slow cable connection) FreeBSD computer, and I have not
>>>>>>>>>>>>>>> seen any errors in the nightly rsync.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry for the omission of networking info, here's the
>>>>>>>>>>>>>>> output of the requested commands and some that popped up
>>>>>>>>>>>>>>> in the other thread:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://www.cap-press.com/misc/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In rc.conf:  ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Scott
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>> Just to make it crystal clear to everyone:
>>>>>>>>>>>
>>>>>>>>>>> There is no correlation between this problem and use of ZFS.
>>>>>>>>>>> People are attempting to correlate "cannot allocate memory"
>>>>>>>>>>> messages with "anything on the system that uses memory".  The VM
>>>>>>>>>>> is much more complex than that.
>>>>>>>>>>>
>>>>>>>>>>> Given the nature of this problem, it's much more likely the issue
>>>>>>>>>>> is "somewhere" within a networking layer within FreeBSD, whether
>>>>>>>>>>> it be driver-level or some sort of intermediary layer.
>>>>>>>>>>>
>>>>>>>>>>> Two people who have this issue in this thread are both using
>>>>>>>>>>> VirtualBox.  Can one, or both, of you remove VirtualBox from the
>>>>>>>>>>> configuration entirely (kernel, etc. -- not sure what is required)
>>>>>>>>>>> and then see if the issue goes away?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On the machine in question I only can do it after hours so I will
>>>>>>>>>> do
>>>>>>>>>> it tonight.
>>>>>>>>>>
>>>>>>>>>> I was _successfully_ sending the file over the loopback interface
>>>>>>>>>> using
>>>>>>>>>>
>>>>>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"
>>>>>>>>>>
>>>>>>>>>> I did it, btw, with the IPv6 localhost address first (accidentally),
>>>>>>>>>> and then using IPv4. Both worked.
>>>>>>>>>>
>>>>>>>>>> It always fails if I am sending it through the bce(4) interface,
>>>>>>>>>> even if my target is the VirtualBox bridged to the bce card (so it
>>>>>>>>>> does not "leave" the computer physically).
>>>>>>>>>>
>>>>>>>>>> Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and
>>>>>>>>>> kldstat output.
>>>>>>>>>>
>>>>>>>>>> I have another box where I do not see that problem. It copies files
>>>>>>>>>> happily over the net using ssh.
>>>>>>>>>>
>>>>>>>>>> It is an older HP ML 150 with only 3GB RAM but with a bge(4)
>>>>>>>>>> driver instead. It runs the same last week's RELENG_8. I installed
>>>>>>>>>> VirtualBox and enabled vboxnet (so it loads the kernel modules).
>>>>>>>>>> But I do not run VirtualBox on it (because it hasn't enough RAM).
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> DellT410one# uname -a
>>>>>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu
>>>>>>>>>> Jun
>>>>>>>>>> 30 17:07:18 EST 2011
>>>>>>>>>> root@DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>> DellT410one# ifconfig -a
>>>>>>>>>> bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>        options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>        ether 84:2b:2b:68:64:e4
>>>>>>>>>>        inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>        media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>>>>>>>        status: active
>>>>>>>>>> bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>        options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>        ether 84:2b:2b:68:64:e5
>>>>>>>>>>        media: Ethernet autoselect
>>>>>>>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>>>>>>>>        options=3<RXCSUM,TXCSUM>
>>>>>>>>>>        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
>>>>>>>>>>        inet6 ::1 prefixlen 128
>>>>>>>>>>        inet 127.0.0.1 netmask 0xff000000
>>>>>>>>>>        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>>>>>>>>>> vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>        ether 0a:00:27:00:00:00
>>>>>>>>>> DellT410one# netstat -rn
>>>>>>>>>> Routing tables
>>>>>>>>>>
>>>>>>>>>> Internet:
>>>>>>>>>> Destination        Gateway            Flags    Refs      Use  Netif Expire
>>>>>>>>>> default            192.168.50.201     UGS         0    52195   bce0
>>>>>>>>>> 127.0.0.1          link#11            UH          0        6    lo0
>>>>>>>>>> 192.168.50.0/24    link#1             U           0  1118212   bce0
>>>>>>>>>> 192.168.50.219     link#1             UHS         0     9670    lo0
>>>>>>>>>> 192.168.50.220     link#1             UHS         0     8347    lo0
>>>>>>>>>> 192.168.50.221     link#1             UHS         0   103024    lo0
>>>>>>>>>> 192.168.50.223     link#1             UHS         0    43614    lo0
>>>>>>>>>> 192.168.50.224     link#1             UHS         0     8358    lo0
>>>>>>>>>> 192.168.50.225     link#1             UHS         0     8438    lo0
>>>>>>>>>> 192.168.50.226     link#1             UHS         0     8338    lo0
>>>>>>>>>> 192.168.50.227     link#1             UHS         0     8333    lo0
>>>>>>>>>> 192.168.165.0/24   192.168.50.200     UGS         0     3311   bce0
>>>>>>>>>> 192.168.166.0/24   192.168.50.200     UGS         0      699   bce0
>>>>>>>>>> 192.168.167.0/24   192.168.50.200     UGS         0     3012   bce0
>>>>>>>>>> 192.168.168.0/24   192.168.50.200     UGS         0      552   bce0
>>>>>>>>>>
>>>>>>>>>> Internet6:
>>>>>>>>>> Destination                       Gateway                       Flags      Netif Expire
>>>>>>>>>> ::1                               ::1                           UH          lo0
>>>>>>>>>> fe80::%lo0/64                     link#11                       U           lo0
>>>>>>>>>> fe80::1%lo0                       link#11                       UHS         lo0
>>>>>>>>>> ff01::%lo0/32                     fe80::1%lo0                   U           lo0
>>>>>>>>>> ff02::%lo0/32                     fe80::1%lo0                   U           lo0
>>>>>>>>>> DellT410one# kldstat
>>>>>>>>>> Id Refs Address            Size     Name
>>>>>>>>>> 1   19 0xffffffff80100000 dbf5d0   kernel
>>>>>>>>>> 2    3 0xffffffff80ec0000 4c358    vboxdrv.ko
>>>>>>>>>> 3    1 0xffffffff81012000 131998   zfs.ko
>>>>>>>>>> 4    1 0xffffffff81144000 1ff1     opensolaris.ko
>>>>>>>>>> 5    2 0xffffffff81146000 2940     vboxnetflt.ko
>>>>>>>>>> 6    2 0xffffffff81149000 8e38     netgraph.ko
>>>>>>>>>> 7    1 0xffffffff81152000 153c     ng_ether.ko
>>>>>>>>>> 8    1 0xffffffff81154000 e70      vboxnetadp.ko
>>>>>>>>>> DellT410one# pciconf -lv
>>>>>>>>>> ..
>>>>>>>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>>>  class      = network
>>>>>>>>>>  subclass   = ethernet
>>>>>>>>>> bce1@pci0:1:0:1:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>>>  class      = network
>>>>>>>>>>  subclass   = ethernet
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could you please provide "pciconf -lvcb" output instead, specific to
>>>>>>>>> the bce chips?  Thanks.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Here it is:
>>>>>>>>
>>>>>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>  class      = network
>>>>>>>>  subclass   = ethernet
>>>>>>>>  bar   [10] = type Memory, range 64, base 0xda000000, size 33554432, enabled
>>>>>>>>  cap 01[48] = powerspec 3  supports D0 D3  current D0
>>>>>>>>  cap 03[50] = VPD
>>>>>>>>  cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>>>>>>>>  cap 11[a0] = MSI-X supports 9 messages in map 0x10
>>>>>>>>  cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
>>>>>>>> ecap 0003[100] = Serial 1 842b2bfffe6864e4
>>>>>>>> ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>>>> ecap 0004[150] = unknown 1
>>>>>>>> ecap 0002[160] = VC 1 max VC0
>>>>>>>>
>>>>>>>
>>>>>>> Thanks Peter.
>>>>>>>
>>>>>>> Adding Yong-Hyeon and David to the discussion, since they've both
>>>>>>> worked
>>>>>>> on the bce(4) driver in recent months (most of the changes made
>>>>>>> recently
>>>>>>> are only in HEAD), and also adding Jack Vogel of Intel who maintains
>>>>>>> em(4).  Brief history for the devs:
>>>>>>>
>>>>>>> The issue is described "Network memory allocation failures" and was
>>>>>>> reported last year, but two users recently (Scott and Peter) have
>>>>>>> reported the issue again:
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>
>>>>>>> And was mentioned again by Scott here, which also contains some
>>>>>>> technical details:
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>>>>>>>
>>>>>>> What's interesting is that Scott's issue is identical in form but he's
>>>>>>> using em(4), which isn't known to behave like this.  Both individuals
>>>>>>> are using VirtualBox, though we're not sure at this point if that is
>>>>>>> the piece which is causing the anomaly.
>>>>>>>
>>>>>>> Relevant details of Scott's system (em-based):
>>>>>>>
>>>>>>> http://www.cap-press.com/misc/
>>>>>>>
>>>>>>> Relevant details of Peter's system (bce-based):
>>>>>>>
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.html
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.html
>>>>>>>
>>>>>>> I think the biggest complexity right now is figuring out how/why scp
>>>>>>> fails intermittently in this nature.  The errno probably "trickles
>>>>>>> down"
>>>>>>> to userland from the kernel, but the condition regarding why it
>>>>>>> happens
>>>>>>> is unknown.
>>>>>>>
>>>>>>
>>>>>> BTW: I also saw 2 of the errors coming from a BIND9 running in a
>>>>>> jail on that box.
>>>>>>
>>>>>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages
>>>>>> Apr 13 05:17:41 bind named[23534]: internal_send: 192.168.50.145#65176: Cannot allocate memory
>>>>>> Jun 21 23:30:44 bind named[39864]: internal_send: 192.168.50.251#36155: Cannot allocate memory
>>>>>> Jun 24 15:28:00 bind named[39864]: internal_send: 192.168.50.251#28651: Cannot allocate memory
>>>>>> Jun 28 12:57:52 bind named[2462]: internal_send: 192.168.165.154#1201: Cannot allocate memory
>>>>>>
>>>>>> My initial guess: it happens sooner or later somehow - whether it is
>>>>>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a
>>>>>> lot of traffic over a longer period (a nameserver gets asked again
>>>>>> and again).
>>>>>>
>>>>>
>>>>> Scott, are you also using jails?  If both of you are: is there any
>>>>> possibility you can remove use of those?  I'm not sure how VirtualBox
>>>>> fits into the picture (jails + VirtualBox that is), but I can imagine
>>>>> jails having different environmental constraints that might cause this.
>>>>>
>>>>> Basically the troubleshooting process here is to remove pieces of the
>>>>> puzzle until you figure out which piece is causing the issue.  I don't
>>>>> want to get the NIC driver devs all spun up for something that, for
>>>>> example, might be an issue with the jail implementation.
>>>>>
>>>>
>>>> I understand this. As said, I do some afterhours debugging tonight.
>>>>
>>>> The scp/ssh problems are happening _outside_ the jails. The bind runs
>>>> _inside_ the jail.
>>>>
>>>> I wanted to use the _host_ system to send VirtualBox virtual disks and
>>>> filesystems used by jails to archive them and/or having them available on
>>>> other FreeBSD systems (as a cold standby solution).
>>>>
>>>
>>> I just switched off the VirtualBox (without removing the kernel modules).
>>>
>>> The copy succeeds now.
>>>
>>> Well, it could be a VirtualBox related problem, or is the server just
>>> relieved to have 2GB more memory at hands now?
>>>
>>> Do you have a quick idea to "emulate" the 2GB memory load usually
>>> delivered by VirtualBox?
>>>
>>
>> Well, managed that (using lookbusy)
>>
>> Interestingly I could copy a large file (30GB) without problems, as soon as
>> I switched off the VirtualBox. As said, the kernel modules weren't unloaded,
>> they are still there.
>>
>> The copy crashes seconds after I started the VirtualBox. According to
>> vmstat and top I had more free memory (ca. 1.5GB) than I had without
>> VirtualBox and lookbusy (ca. 350MB).
>>
>> So, it looks (to me, at least) as if I have a VirtualBox related problem,
>> somehow.
>>
>> Any ideas? I am happy to play a bit more to get it sorted although it has
>> some limits (it is running the company mailserver, after all)
>>
>> Regards
>> Peter
>>
>
> This is it -- I'm seeing the exact same thing.
>
> Scp dies reliably with VirtualBox running. Quit VirtualBox and I was able to
> scp about 30 large files with no errors. Once I started VirtualBox an
> in-progress scp died within seconds.
>
> Ditto that the Kernel modules merely being loaded don't seem to make a
> difference, it's VirtualBox actually running.
>
> virtualbox-ose-3.2.12_1

Hi,

I wonder whether anyone has new ideas.

I am puzzled that it happens when VirtualBox VMs are running, while
loading or unloading the VirtualBox kernel modules doesn't seem to
have an effect.

Should I describe the case on the -emulation mailing list to get some
ideas from the engineers working on VirtualBox?

I do not want to create too much noise, so I would like to know your
thoughts on it first.

I experimented a little bit with the ssh code and know which write(2) in
/usr/src/crypto/openssh/roaming_common.c (in function roaming_write)
returns the ENOMEM (an error it should never return, according to the
man page ;-)

but unfortunately I am at a loss to track it further down in the kernel.
I do not know enough about it, to be frank.

Are there any memory stats inside the kernel that could help?
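
For collecting such statistics around a failing copy, a minimal sketch
follows. It assumes that netstat -m and vmstat -z (both asked for earlier
in this thread) are the counters of interest; the function names snapshot
and watch_pid, the log path and the one-second interval are made up for
illustration, and nothing here has been run on the affected box.

```sh
#!/bin/sh
# Sketch only: log timestamped statistics while a transfer is running,
# so that a failed copy can be matched against the nearest snapshot.

# snapshot LOGFILE CMD...
# Append the output of each command to LOGFILE under a timestamped header.
snapshot() {
    log=$1
    shift
    for cmd in "$@"; do
        {
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') $cmd"
            sh -c "$cmd" 2>&1
        } >> "$log"
    done
}

# watch_pid PID INTERVAL LOGFILE CMD...
# Take a snapshot every INTERVAL seconds while PID is alive, plus one
# final snapshot right after it has gone away.
watch_pid() {
    pid=$1
    interval=$2
    wlog=$3
    shift 3
    while kill -0 "$pid" 2>/dev/null; do
        snapshot "$wlog" "$@"
        sleep "$interval"
    done
    snapshot "$wlog" "$@"
}

# Example (hypothetical file and host names, not run here):
#   scp bigfile otherhost:/tmp/ &
#   watch_pid $! 1 /var/tmp/memstats.log "netstat -m" "vmstat -z"
```

The last snapshot in the log is the one taken just after the copy ended,
so comparing it with the preceding ones shows whether any mbuf or zone
counter moved at the time of the failure.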

Thank you for all ideas
Peter



