Date:      Wed, 06 Jul 2011 18:21:41 +1000
From:      "Peter Ross" <Peter.Ross@bogen.in-berlin.de>
To:        "Jeremy Chadwick" <freebsd@jdc.parodius.com>
Cc:        Yong-Hyeon Pyun <pyunyh@gmail.com>, "Vogel, Jack" <jack.vogel@intel.com>, freebsd-stable List <freebsd-stable@freebsd.org>, davidch@freebsd.org, Scott Sipe <cscotts@gmail.com>
Subject:   Re: scp: Write Failed: Cannot allocate memory
Message-ID:  <20110706182141.13056plxp148y61h@webmail.in-berlin.de>
In-Reply-To: <20110706173242.23404ffbhkxz6mqi@webmail.in-berlin.de>
References:  <20110706122339.61453nlqra1vqsrv@webmail.in-berlin.de> <20110706023234.GA72048@icarus.home.lan> <20110706130753.182053f3ellasn0p@webmail.in-berlin.de> <20110706032425.GA72757@icarus.home.lan> <20110706135412.15276i0fxavg09k4@webmail.in-berlin.de> <20110706041504.GA73698@icarus.home.lan> <20110706143129.10696235ldx9bjmp@webmail.in-berlin.de> <20110706173242.23404ffbhkxz6mqi@webmail.in-berlin.de>

Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:

> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>
>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>
>>> On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>
>>>>> On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>
>>>>>>> On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>>>
>>>>>>>>> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with it.
>>>>>>>>>>
>>>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000
>>>>>>>>>>
>>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>>>>> hovering right around that value, sometimes above, sometimes
>>>>>>>>>> below (that's as it should be, right?). I don't think that it
>>>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times
>>>>>>>>>> and it might fail 1-3 times, with no correlation to the
>>>>>>>>>> arcstats.size being above/below arc_max that I can see.
>>>>>>>>>>
>>>>>>>>>> Scott
>>>>>>>>>>
>>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> just as an addition: an upgrade to last Friday's
>>>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>>>>> problem.
>>>>>>>>>>>
>>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab
>>>>>>>>>>> some statistics.
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I noticed a similar problem last week. It is also very
>>>>>>>>>>>> similar to one reported last year:
>>>>>>>>>>>>
>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>>>>
>>>>>>>>>>>> My server is a Dell T410 server with the same bce card (the
>>>>>>>>>>>> same pciconf -lvc output as described by Mahlon):
>>>>>>>>>>>>
>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>>>>
>>>>>>>>>>>> Yours, Scott, is an em(4).
>>>>>>>>>>>>
>>>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>>>>> just want to mention it, in case it matters. I am still
>>>>>>>>>>>> running VirtualBox 3.2.
>>>>>>>>>>>>
>>>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases
>>>>>>>>>>>> when the value was still below it.
>>>>>>>>>>>>
>>>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it
>>>>>>>>>>>> does not help.
>>>>>>>>>>>>
>>>>>>>>>>>> BTW: It looks as if ARC only gives back the memory when I
>>>>>>>>>>>> destroy the ZFS dataset (a cloned snapshot containing virtual
>>>>>>>>>>>> machines). Even if nothing happens for hours the buffer
>>>>>>>>>>>> isn't released.
>>>>>>>>>>>>
>>>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>>>>>>>>>
>>>>>>>>>>>> I am happy to give information gathered on old/new kernel
>>>>>>>>>>>> if it helps.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>> Quoting "Scott Sipe" <cscotts@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems
>>>>>>>>>>>>>>>> with scp. When scping
>>>>>>>>>>>>>>>> files to a ZFS directory on the FreeBSD server --
>>>>>>>>>>>>>>>> most notably large files
>>>>>>>>>>>>>>>> -- the transfer frequently dies after just a few
>>>>>>>>>>>>>>>> seconds. In my last test, I
>>>>>>>>>>>>>>>> tried to scp an 800mb file to the FreeBSD system and
>>>>>>>>>>>>>>>> the transfer died after
>>>>>>>>>>>>>>>> 200mb. It completely copied the next 4 times I
>>>>>>>>>>>>>>>> tried, and then died again on
>>>>>>>>>>>>>>>> the next attempt.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On the client side:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "Connection to home closed by remote host.
>>>>>>>>>>>>>>>> lost connection"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In /var/log/auth.log:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write
>>>>>>>>>>>>>>>> failed: Cannot allocate
>>>>>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've never seen this before and have used scp before
>>>>>>>>>>>>>>>> to transfer large files
>>>>>>>>>>>>>>>> without problems. This computer has been used in
>>>>>>>>>>>>>>>> production for months and
>>>>>>>>>>>>>>>> has a current uptime of 36 days. I have not been
>>>>>>>>>>>>>>>> able to notice any problems
>>>>>>>>>>>>>>>> copying files to the server via samba or netatalk, or
>>>>>>>>>>>>>>>> any problems in apache.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Uname:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat
>>>>>>>>>>>>>>>> Feb 19 01:02:54 EST
>>>>>>>>>>>>>>>> 2011     root@xeon:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the computer.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am glad to provide any other information or test anything else.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> {snip vmstat -z and dmesg}
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You didn't provide details about your networking setup (rc.conf,
>>>>>>>>>>>>>>> ifconfig -a, etc.).  netstat -m would be useful too.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Next, please see this thread circa September 2010, titled "Network
>>>>>>>>>>>>>>> memory allocation failures":
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The user in that thread is using rsync, which relies on
>>>>>>>>>>>>>>> scp by default.
>>>>>>>>>>>>>>> I believe this problem is similar, if not identical, to yours.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a )
>>>>>>>>>>>>>> for the server end and the client.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig -a but some
>>>>>>>>>>>>>> information about the networking driver you're using for the interface
>>>>>>>>>>>>>> would be helpful, uptime of the boxes. And configuration of the pool.
>>>>>>>>>>>>>> e.g. ( zpool status -a ;zfs get all <poolname> ) You should probably
>>>>>>>>>>>>>> prop this information up somewhere so you can reference by URL whenever
>>>>>>>>>>>>>> needed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever but rsync(1) can be made to
>>>>>>>>>>>>>> use ssh(1) instead of rsh(1) and I believe that is what Jeremy is
>>>>>>>>>>>>>> stating here but correct me if I am wrong. It does use ssh(1) by
>>>>>>>>>>>>>> default.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's also possible that if you use tmpfs(5) or mdmfs(8) for /tmp-type
>>>>>>>>>>>>>> filesystems, rsync(1) may just be filling up your temp RAM area
>>>>>>>>>>>>>> and causing the connection abort, which would be expected. ( df -h ) would
>>>>>>>>>>>>>> help here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday
>>>>>>>>>>>>> were 3 different OSX computers (over gigabit). The FreeBSD
>>>>>>>>>>>>> server has 12gb of ram and no bce adapter. For what it's
>>>>>>>>>>>>> worth, the server is backed up remotely every night with
>>>>>>>>>>>>> rsync (remote FreeBSD uses rsync to pull) to an offsite
>>>>>>>>>>>>> (slow cable connection) FreeBSD computer, and I have not
>>>>>>>>>>>>> seen any errors in the nightly rsync.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the omission of networking info, here's the
>>>>>>>>>>>>> output of the requested commands and some that popped up
>>>>>>>>>>>>> in the other thread:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://www.cap-press.com/misc/
>>>>>>>>>>>>>
>>>>>>>>>>>>> In rc.conf:  ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>>>>>>>>>
>>>>>>>>>>>>> Scott
>>>>>>>>>
>>>>>>>>> Just to make it crystal clear to everyone:
>>>>>>>>>
>>>>>>>>> There is no correlation between this problem and use of ZFS.  People are
>>>>>>>>> attempting to correlate "cannot allocate memory" messages with "anything
>>>>>>>>> on the system that uses memory".  The VM is much more complex than that.
>>>>>>>>>
>>>>>>>>> Given the nature of this problem, it's much more likely the issue is
>>>>>>>>> "somewhere" within a networking layer within FreeBSD, whether it be
>>>>>>>>> driver-level or some sort of intermediary layer.
>>>>>>>>>
>>>>>>>>> Two people who have this issue in this thread are both using VirtualBox.
>>>>>>>>> Can one, or both, of you remove VirtualBox from the configuration
>>>>>>>>> entirely (kernel, etc. -- not sure what is required) and then see if the
>>>>>>>>> issue goes away?
>>>>>>>>
>>>>>>>> On the machine in question I can only do it after hours, so I will do
>>>>>>>> it tonight.
>>>>>>>>
>>>>>>>> I was _successfully_ sending the file over the loopback interface using
>>>>>>>>
>>>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"
>>>>>>>>
>>>>>>>> I did it, btw, with the IPv6 localhost address first (accidentally),
>>>>>>>> and then using IPv4. Both worked.
>>>>>>>>
>>>>>>>> It always fails if I am sending it through the bce(4) interface,
>>>>>>>> even if my target is the VirtualBox bridged to the bce card (so it
>>>>>>>> does not "leave" the computer physically).
>>>>>>>>
>>>>>>>> Below is the uname -a, ifconfig -a, netstat -rn, pciconf -lv and
>>>>>>>> kldstat output.
>>>>>>>>
>>>>>>>> I have another box where I do not see that problem. It copies files
>>>>>>>> happily over the net using ssh.
>>>>>>>>
>>>>>>>> It is an older HP ML 150 with only 3GB RAM, but with a bge(4)
>>>>>>>> driver instead. It runs the same RELENG_8 from last week. I installed
>>>>>>>> VirtualBox and enabled vboxnet (so it loads the kernel modules), but
>>>>>>>> I do not run VirtualBox on it (because it hasn't enough RAM).
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> DellT410one# uname -a
>>>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Jun
>>>>>>>> 30 17:07:18 EST 2011
>>>>>>>> root@DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>> DellT410one# ifconfig -a
>>>>>>>> bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>> 	ether 84:2b:2b:68:64:e4
>>>>>>>> 	inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>> 	media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>>>>> 	status: active
>>>>>>>> bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>> 	ether 84:2b:2b:68:64:e5
>>>>>>>> 	media: Ethernet autoselect
>>>>>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>>>>>> 	options=3<RXCSUM,TXCSUM>
>>>>>>>> 	inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
>>>>>>>> 	inet6 ::1 prefixlen 128
>>>>>>>> 	inet 127.0.0.1 netmask 0xff000000
>>>>>>>> 	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>>>>>>>> vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>> 	ether 0a:00:27:00:00:00
>>>>>>>> DellT410one# netstat -rn
>>>>>>>> Routing tables
>>>>>>>>
>>>>>>>> Internet:
>>>>>>>> Destination        Gateway            Flags    Refs      Use  Netif Expire
>>>>>>>> default            192.168.50.201     UGS         0    52195   bce0
>>>>>>>> 127.0.0.1          link#11            UH          0        6    lo0
>>>>>>>> 192.168.50.0/24    link#1             U           0  1118212   bce0
>>>>>>>> 192.168.50.219     link#1             UHS         0     9670    lo0
>>>>>>>> 192.168.50.220     link#1             UHS         0     8347    lo0
>>>>>>>> 192.168.50.221     link#1             UHS         0   103024    lo0
>>>>>>>> 192.168.50.223     link#1             UHS         0    43614    lo0
>>>>>>>> 192.168.50.224     link#1             UHS         0     8358    lo0
>>>>>>>> 192.168.50.225     link#1             UHS         0     8438    lo0
>>>>>>>> 192.168.50.226     link#1             UHS         0     8338    lo0
>>>>>>>> 192.168.50.227     link#1             UHS         0     8333    lo0
>>>>>>>> 192.168.165.0/24   192.168.50.200     UGS         0     3311   bce0
>>>>>>>> 192.168.166.0/24   192.168.50.200     UGS         0      699   bce0
>>>>>>>> 192.168.167.0/24   192.168.50.200     UGS         0     3012   bce0
>>>>>>>> 192.168.168.0/24   192.168.50.200     UGS         0      552   bce0
>>>>>>>>
>>>>>>>> Internet6:
>>>>>>>> Destination                       Gateway                       Flags      Netif Expire
>>>>>>>> ::1                               ::1                           UH         lo0
>>>>>>>> fe80::%lo0/64                     link#11                       U          lo0
>>>>>>>> fe80::1%lo0                       link#11                       UHS        lo0
>>>>>>>> ff01::%lo0/32                     fe80::1%lo0                   U          lo0
>>>>>>>> ff02::%lo0/32                     fe80::1%lo0                   U          lo0
>>>>>>>> DellT410one# kldstat
>>>>>>>> Id Refs Address            Size     Name
>>>>>>>> 1   19 0xffffffff80100000 dbf5d0   kernel
>>>>>>>> 2    3 0xffffffff80ec0000 4c358    vboxdrv.ko
>>>>>>>> 3    1 0xffffffff81012000 131998   zfs.ko
>>>>>>>> 4    1 0xffffffff81144000 1ff1     opensolaris.ko
>>>>>>>> 5    2 0xffffffff81146000 2940     vboxnetflt.ko
>>>>>>>> 6    2 0xffffffff81149000 8e38     netgraph.ko
>>>>>>>> 7    1 0xffffffff81152000 153c     ng_ether.ko
>>>>>>>> 8    1 0xffffffff81154000 e70      vboxnetadp.ko
>>>>>>>> DellT410one# pciconf -lv
>>>>>>>> ..
>>>>>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>   vendor     = 'Broadcom Corporation'
>>>>>>>>   class      = network
>>>>>>>>   subclass   = ethernet
>>>>>>>> bce1@pci0:1:0:1:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>   vendor     = 'Broadcom Corporation'
>>>>>>>>   class      = network
>>>>>>>>   subclass   = ethernet
>>>>>>>
>>>>>>> Could you please provide "pciconf -lvcb" output instead, specific to the
>>>>>>> bce chips?  Thanks.
>>>>>>
>>>>>> Here it is:
>>>>>>
>>>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>   vendor     = 'Broadcom Corporation'
>>>>>>   class      = network
>>>>>>   subclass   = ethernet
>>>>>>   bar   [10] = type Memory, range 64, base 0xda000000, size 33554432, enabled
>>>>>>   cap 01[48] = powerspec 3  supports D0 D3  current D0
>>>>>>   cap 03[50] = VPD
>>>>>>   cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>>>>>>   cap 11[a0] = MSI-X supports 9 messages in map 0x10
>>>>>>   cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
>>>>>> ecap 0003[100] = Serial 1 842b2bfffe6864e4
>>>>>> ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>> ecap 0004[150] = unknown 1
>>>>>> ecap 0002[160] = VC 1 max VC0
>>>>>
>>>>> Thanks Peter.
>>>>>
>>>>> Adding Yong-Hyeon and David to the discussion, since they've both worked
>>>>> on the bce(4) driver in recent months (most of the changes made recently
>>>>> are only in HEAD), and also adding Jack Vogel of Intel who maintains
>>>>> em(4).  Brief history for the devs:
>>>>>
>>>>> The issue is described "Network memory allocation failures" and was
>>>>> reported last year, but two users recently (Scott and Peter) have
>>>>> reported the issue again:
>>>>>
>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>
>>>>> And was mentioned again by Scott here, which also contains some
>>>>> technical details:
>>>>>
>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>>>>>
>>>>> What's interesting is that Scott's issue is identical in form but he's
>>>>> using em(4), which isn't known to behave like this.  Both individuals
>>>>> are using VirtualBox, though we're not sure at this point if that is the
>>>>> piece which is causing the anomaly.
>>>>>
>>>>> Relevant details of Scott's system (em-based):
>>>>>
>>>>> http://www.cap-press.com/misc/
>>>>>
>>>>> Relevant details of Peter's system (bce-based):
>>>>>
>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.html
>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.html
>>>>>
>>>>> I think the biggest complexity right now is figuring out how/why scp
>>>>> fails intermittently in this manner.  The errno probably "trickles down"
>>>>> to userland from the kernel, but the condition regarding why it happens
>>>>> is unknown.
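To make the errno remark above concrete: "Cannot allocate memory" is simply the C library's text for ENOMEM, so sshd's "Write failed" line (and the named internal_send lines quoted below) only tell us that some system call returned ENOMEM, not which kernel allocation failed. A minimal Python sketch, assuming nothing beyond the standard errno/os modules; report_write_failure is a hypothetical helper that only imitates the shape of the sshd log line and is not OpenSSH code:

```python
import errno
import os


def report_write_failure(err: OSError) -> str:
    """Hypothetical helper: format an OSError like the sshd log line.

    It only imitates the "Write failed: <strerror text>" shape seen in
    /var/log/auth.log; it is not taken from OpenSSH.
    """
    return "Write failed: " + os.strerror(err.errno)


def simulate_enomem_write() -> str:
    """Pretend a write() was rejected by the kernel with ENOMEM.

    No socket is involved: the OSError is constructed directly, which is
    enough to show where the wording of the log message comes from.
    """
    err = OSError(errno.ENOMEM, os.strerror(errno.ENOMEM))
    return report_write_failure(err)


if __name__ == "__main__":
    # The exact text comes from the platform's C library; FreeBSD and
    # glibc both word ENOMEM as "Cannot allocate memory".
    print(simulate_enomem_write())
```

The point of the sketch is that the userland message carries no more information than the errno value itself, so the interesting question remains which kernel path produces ENOMEM.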
>>>>
>>>> BTW: I also saw 2 of the errors coming from a BIND9 running in a
>>>> jail on that box.
>>>>
>>>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages
>>>> Apr 13 05:17:41 bind named[23534]: internal_send: 192.168.50.145#65176: Cannot allocate memory
>>>> Jun 21 23:30:44 bind named[39864]: internal_send: 192.168.50.251#36155: Cannot allocate memory
>>>> Jun 24 15:28:00 bind named[39864]: internal_send: 192.168.50.251#28651: Cannot allocate memory
>>>> Jun 28 12:57:52 bind named[2462]: internal_send: 192.168.165.154#1201: Cannot allocate memory
>>>>
>>>> My initial guess: it happens sooner or later somehow - whether it is
>>>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a
>>>> lot of traffic over a longer period (a nameserver gets asked again
>>>> and again).
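The guess above could be checked by lining the errors up against traffic. A minimal Python sketch of the fgrep step that also extracts the date and, for named(8), the peer address; it assumes the standard syslog layout ("Mon DD HH:MM:SS host prog[pid]: message") seen in the quoted log, and enomem_events/per_day are hypothetical helper names:

```python
import re
from collections import Counter

# Assumed syslog layout: "Mon DD HH:MM:SS host prog[pid]: message".
# The optional "internal_send: peer#port:" part matches the named(8) lines.
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<prog>[^\[:\s]+)(?:\[\d+\])?:\s+"
    r"(?:internal_send:\s+(?P<peer>[^#\s]+)#\d+:\s+)?"
    r"(?P<msg>.*)$"
)


def enomem_events(lines):
    """Yield (month, day, time, peer) for each "Cannot allocate memory" line.

    Like `fgrep -i allocate`, but keeps only lines that parse as syslog
    entries; peer is None when the line carries no internal_send address.
    """
    for line in lines:
        if "cannot allocate memory" not in line.lower():
            continue
        m = LINE_RE.match(line.strip())
        if m:
            yield m["month"], int(m["day"]), m["time"], m["peer"]


def per_day(lines):
    """Count matching events per (month, day)."""
    return Counter((mon, day) for mon, day, _t, _p in enomem_events(lines))
```

For example, per_day(open("/jails/bind/20110315/var/log/messages")) would tally errors like the ones quoted above by day, which can then be compared with the times of the large scp/ssh copies. The same function accepts /var/log/auth.log, where the sshd "Write failed" lines parse with peer set to None.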
>>>
>>> Scott, are you also using jails?  If both of you are: is there any
>>> possibility you can remove use of those?  I'm not sure how VirtualBox
>>> fits into the picture (jails + VirtualBox that is), but I can imagine
>>> jails having different environmental constraints that might cause this.
>>>
>>> Basically the troubleshooting process here is to remove pieces of the
>>> puzzle until you figure out which piece is causing the issue.  I don't
>>> want to get the NIC driver devs all spun up for something that, for
>>> example, might be an issue with the jail implementation.
>>
>> I understand this. As I said, I will do some after-hours debugging tonight.
>>
>> The scp/ssh problems are happening _outside_ the jails. The bind
>> runs _inside_ the jail.
>>
>> I wanted to use the _host_ system to send VirtualBox virtual disks
>> and filesystems used by jails to archive them and/or make them
>> available on other FreeBSD systems (as a cold standby solution).
>
> I just switched off the VirtualBox (without removing the kernel modules).
>
> The copy succeeds now.
>
> Well, it could be a VirtualBox-related problem, or is the server just
> relieved to have 2GB more memory at hand now?
>
> Do you have a quick idea how to "emulate" the 2GB memory load that
> VirtualBox usually creates?

Well, I managed that (using lookbusy).

Interestingly, I could copy a large file (30GB) without problems as
soon as I switched off VirtualBox. As I said, the kernel modules
weren't unloaded; they are still there.

The copy fails within seconds of starting VirtualBox. According to
vmstat and top, I then had more free memory (ca. 1.5GB) than I had
in the run without VirtualBox but with lookbusy (ca. 350MB).

So it looks (to me, at least) as if I have a VirtualBox-related
problem, somehow.
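Because the failures are intermittent (Scott saw 1-3 failures in 10 runs), counting failures over repeated runs with VirtualBox stopped and then started gives a firmer comparison than single copies. A minimal Python sketch, assuming the transfer is written as a shell command like the cat | ssh test used earlier in this thread; count_failures is a hypothetical helper, and the host name in the comment is a placeholder:

```python
import subprocess
import sys


def count_failures(command: str, runs: int = 10) -> int:
    """Run `command` through /bin/sh `runs` times; return how many runs failed.

    A run counts as failed when the command exits non-zero. For a
    pipeline such as `cat file | ssh host "cat > /dev/null"`, /bin/sh
    reports the exit status of the last command (ssh), which is non-zero
    when the connection is lost.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(["/bin/sh", "-c", command])
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation ("otherhost" is a placeholder):
    #   python count_failures.py \
    #     'cat /zpool/temp/zimbra_oldroot.vdi | ssh otherhost "cat > /dev/null"' 10
    cmd = sys.argv[1]
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    print(f"{count_failures(cmd, n)}/{n} runs failed")
```

Running it once with VirtualBox stopped and once with it started (and optionally once with lookbusy instead) yields directly comparable failure ratios.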

Any ideas? I am happy to experiment a bit more to get it sorted, although
there are limits (it is running the company mail server, after all).

Regards
Peter




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110706182141.13056plxp148y61h>