Date: Wed, 6 Jul 2011 12:22:39 -0400
From: Scott Sipe <cscotts@gmail.com>
To: Peter Ross <Peter.Ross@bogen.in-berlin.de>
Cc: Yong-Hyeon Pyun <pyunyh@gmail.com>, freebsd-stable List <freebsd-stable@freebsd.org>, davidch@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>, "Vogel, Jack" <jack.vogel@intel.com>
Subject: Re: scp: Write Failed: Cannot allocate memory
Message-ID: <CA%2B30O_O8b8O29rc6BLnnGVTY3cWzpuKQ1q8FTG1idJKM5ykrvA@mail.gmail.com>
In-Reply-To: <20110706182141.13056plxp148y61h@webmail.in-berlin.de>
References: <20110706122339.61453nlqra1vqsrv@webmail.in-berlin.de>
 <20110706023234.GA72048@icarus.home.lan>
 <20110706130753.182053f3ellasn0p@webmail.in-berlin.de>
 <20110706032425.GA72757@icarus.home.lan>
 <20110706135412.15276i0fxavg09k4@webmail.in-berlin.de>
 <20110706041504.GA73698@icarus.home.lan>
 <20110706143129.10696235ldx9bjmp@webmail.in-berlin.de>
 <20110706173242.23404ffbhkxz6mqi@webmail.in-berlin.de>
 <20110706182141.13056plxp148y61h@webmail.in-berlin.de>
On Wed, Jul 6, 2011 at 4:21 AM, Peter Ross <Peter.Ross@bogen.in-berlin.de>wrote: > Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de**>: > > Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de**>: >> >> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >>> >>> On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote: >>>> >>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >>>>> >>>>> On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote: >>>>>> >>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >>>>>>> >>>>>>> On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote: >>>>>>>> >>>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>: >>>>>>>>> >>>>>>>>> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote: >>>>>>>>>> >>>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with >>>>>>>>>>> it. >>>>>>>>>>> >>>>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000 >>>>>>>>>>> >>>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is >>>>>>>>>>> hovering right around that value, sometimes above, sometimes >>>>>>>>>>> below (that's as it should be, right?). I don't think that it >>>>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times >>>>>>>>>>> and it might fail 1-3 times, with no correlation to the >>>>>>>>>>> arcstats.size being above/below arc_max that I can see. >>>>>>>>>>> >>>>>>>>>>> Scott >>>>>>>>>>> >>>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote: >>>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> just as an addition: an upgrade to last Friday's >>>>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the >>>>>>>>>>>> problem. >>>>>>>>>>>> >>>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab >>>>>>>>>>>> >>>>>>>>>>> some statistics. 
>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>> Regards >>>>>>>>>>>> Peter >>>>>>>>>>>> >>>>>>>>>>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de**>: >>>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> I noticed a similar problem last week. It is also very >>>>>>>>>>>>> similar to one reported last year: >>>>>>>>>>>>> >>>>>>>>>>>>> http://lists.freebsd.org/**pipermail/freebsd-stable/2010-** >>>>>>>>>>>>> September/058708.html<http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html> >>>>>>>>>>>>> >>>>>>>>>>>>> My server is a Dell T410 server with the same bge card (the >>>>>>>>>>>>> same pciconf -lvc output as described by Mahlon: >>>>>>>>>>>>> >>>>>>>>>>>>> http://lists.freebsd.org/**pipermail/freebsd-stable/2010-** >>>>>>>>>>>>> September/058711.html<http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html> >>>>>>>>>>>>> >>>>>>>>>>>>> Yours, Scott, is a em(4).. >>>>>>>>>>>>> >>>>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I >>>>>>>>>>>>> just want to mention it, in case it matters. I am still >>>>>>>>>>>>> running VirtualBox 3.2. >>>>>>>>>>>>> >>>>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching >>>>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases >>>>>>>>>>>>> then the value was still below. >>>>>>>>>>>>> >>>>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it >>>>>>>>>>>>> >>>>>>>>>>>> does not help. >>>>>>> >>>>>>>> >>>>>>>>>>>>> BTW: It looks as ARC only gives back the memory when I >>>>>>>>>>>>> destroy the ZFS (a cloned snapshot containing virtual >>>>>>>>>>>>> machines). Even if nothing happens for hours the buffer >>>>>>>>>>>>> isn't released.. >>>>>>>>>>>>> >>>>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading. >>>>>>>>>>>>> >>>>>>>>>>>>> I am happy to give information gathered on old/new kernel if it >>>>>>>>>>>>> helps. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Regards >>>>>>>>>>>>> Peter >>>>>>>>>>>>> >>>>>>>>>>>>> Quoting "Scott Sipe" <cscotts@gmail.com>: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems >>>>>>>>>>>>>>>>> with scp. When scping >>>>>>>>>>>>>>>>> files to a ZFS directory on the FreeBSD server -- >>>>>>>>>>>>>>>>> most notably large files >>>>>>>>>>>>>>>>> -- the transfer frequently dies after just a few >>>>>>>>>>>>>>>>> seconds. In my last test, I >>>>>>>>>>>>>>>>> tried to scp an 800mb file to the FreeBSD system and >>>>>>>>>>>>>>>>> the transfer died after >>>>>>>>>>>>>>>>> 200mb. It completely copied the next 4 times I >>>>>>>>>>>>>>>>> tried, and then died again on >>>>>>>>>>>>>>>>> the next attempt. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On the client side: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "Connection to home closed by remote host. >>>>>>>>>>>>>>>>> lost connection" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In /var/log/auth.log: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Jul 1 14:54:42 freebsd sshd[18955]: fatal: Write >>>>>>>>>>>>>>>>> failed: Cannot allocate >>>>>>>>>>>>>>>>> memory >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I've never seen this before and have used scp before >>>>>>>>>>>>>>>>> to transfer large files >>>>>>>>>>>>>>>>> without problems. This computer has been used in >>>>>>>>>>>>>>>>> production for months and >>>>>>>>>>>>>>>>> has a current uptime of 36 days. I have not been >>>>>>>>>>>>>>>>> able to notice any problems >>>>>>>>>>>>>>>>> copying files to the server via samba or netatalk, or >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> any problems in >>>>>>>>> >>>>>>>>>> apache. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Uname: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat >>>>>>>>>>>>>>>>> Feb 19 01:02:54 EST >>>>>>>>>>>>>>>>> 2011 root@xeon:/usr/obj/usr/src/**sys/GENERIC amd64 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the >>>>>>>>>>>>>>>>> computer. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Am glad to provide any other information or test anything >>>>>>>>>>>>>>>>> else. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> {snip vmstat -z and dmesg} >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You didn't provide details about your networking setup >>>>>>>>>>>>>>>> (rc.conf, >>>>>>>>>>>>>>>> ifconfig -a, etc.). netstat -m would be useful too. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Next, please see this thread circa September 2010, titled >>>>>>>>>>>>>>>> "Network >>>>>>>>>>>>>>>> memory allocation failures": >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> http://lists.freebsd.org/**pipermail/freebsd-stable/2010-** >>>>>>>>>>>>>>>> September/thread.html#58708<http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The user in that thread is using rsync, which relies on >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> scp by default. >>>>>>>>> >>>>>>>>>> I believe this problem is similar, if not identical, to yours. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a ) >>>>>>>>>>>>>>> >>>>>>>>>>>>>> for the server >>>>>>>>> >>>>>>>>>> end and the client. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig -a but >>>>>>>>>>>>>>> some >>>>>>>>>>>>>>> information about the networking driver your using for the >>>>>>>>>>>>>>> interface >>>>>>>>>>>>>>> would be helpful, uptime of the boxes. And configuration >>>>>>>>>>>>>>> >>>>>>>>>>>>>> of the pool. >>>>>>> >>>>>>>> e.g. 
( zpool status -a ;zfs get all <poolname> ) You should probably >>>>>>>>>>>>>>> prop this information up somewhere so you can reference by >>>>>>>>>>>>>>> >>>>>>>>>>>>>> URL whenever >>>>>>>>> >>>>>>>>>> needed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever but rsync(1) >>>>>>>>>>>>>>> >>>>>>>>>>>>>> can be made to >>>>>>>>> >>>>>>>>>> use ssh(1) instead of rsh(1) and I believe that is what Jeremy is >>>>>>>>>>>>>>> stating here but correct me if I am wrong. It does use ssh(1) >>>>>>>>>>>>>>> by >>>>>>>>>>>>>>> default. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Its a possiblity as well that if using tmpfs(5) or mdmfs(8) >>>>>>>>>>>>>>> for /tmp >>>>>>>>>>>>>>> type filesystems that rsync(1) may be just filling up your >>>>>>>>>>>>>>> >>>>>>>>>>>>>> temp ram area >>>>>>>>> >>>>>>>>>> and causing the connection abort which would be >>>>>>>>>>>>>>> expected. ( df -h ) would >>>>>>>>>>>>>>> help here. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday >>>>>>>>>>>>>> were 3 different OSX computers (over gigabit). The FreeBSD >>>>>>>>>>>>>> server has 12gb of ram and no bce adapter. For what it's >>>>>>>>>>>>>> worth, the server is backed up remotely every night with >>>>>>>>>>>>>> rsync (remote FreeBSD uses rsync to pull) to an offsite >>>>>>>>>>>>>> (slow cable connection) FreeBSD computer, and I have not >>>>>>>>>>>>>> seen any errors in the nightly rsync. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Sorry for the omission of networking info, here's the >>>>>>>>>>>>>> output of the requested commands and some that popped up >>>>>>>>>>>>>> in the other thread: >>>>>>>>>>>>>> >>>>>>>>>>>>>> http://www.cap-press.com/misc/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> In rc.conf: ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0" >>>>>>>>>>>>>> >>>>>>>>>>>>>> Scott >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> Just to make it crystal clear to everyone: >>>>>>>>>> >>>>>>>>>> There is no correlation between this problem and use of ZFS. >>>>>>>>>> People are >>>>>>>>>> attempting to correlate "cannot allocate memory" messages with >>>>>>>>>> "anything >>>>>>>>>> on the system that uses memory". The VM is much more complex than >>>>>>>>>> that. >>>>>>>>>> >>>>>>>>>> Given the nature of this problem, it's much more likely the issue >>>>>>>>>> is >>>>>>>>>> "somewhere" within a networking layer within FreeBSD, whether it >>>>>>>>>> be >>>>>>>>>> driver-level or some sort of intermediary layer. >>>>>>>>>> >>>>>>>>>> Two people who have this issue in this thread are both using >>>>>>>>>> VirtualBox. >>>>>>>>>> Can one, or both, of you remove VirtualBox from the configuration >>>>>>>>>> entirely (kernel, etc. -- not sure what is required) and then see >>>>>>>>>> if the >>>>>>>>>> issue goes away? >>>>>>>>>> >>>>>>>>> >>>>>>>>> On the machine in question I only can do it after hours so I will >>>>>>>>> do >>>>>>>>> it tonight. >>>>>>>>> >>>>>>>>> I was _successfully_ sending the file over the loopback interface >>>>>>>>> using >>>>>>>>> >>>>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > >>>>>>>>> /dev/null" >>>>>>>>> >>>>>>>>> I did it, btw, with the IPv6 localhost address first (accidently), >>>>>>>>> and then using IPv4. Both worked. >>>>>>>>> >>>>>>>>> It always fails if I am sending it through the bce(4) interface, >>>>>>>>> even if my target is the VirtualBox bridged to the bce card (so it >>>>>>>>> does not "leave" the computer physically). 
>>>>>>>>> >>>>>>>>> Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and >>>>>>>>> kldstat output. >>>>>>>>> >>>>>>>>> I have another box where I do not see that problem. It copies files >>>>>>>>> happily over the net using ssh. >>>>>>>>> >>>>>>>>> It is an an older HP ML 150 with 3GB RAM only but with a bge(4) >>>>>>>>> driver instead. It runs the same last week's RELENG_8. I installed >>>>>>>>> VirtualBox and enabled vboxnet (so it loads the kernel modules). >>>>>>>>> But >>>>>>>>> I do not run VirtualBox on it (because it hasn't enough RAM). >>>>>>>>> >>>>>>>>> Regards >>>>>>>>> Peter >>>>>>>>> >>>>>>>>> DellT410one# uname -a >>>>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu >>>>>>>>> Jun >>>>>>>>> 30 17:07:18 EST 2011 >>>>>>>>> root@DellT410one.vv.fda:/usr/**obj/usr/src/sys/GENERIC amd64 >>>>>>>>> DellT410one# ifconfig -a >>>>>>>>> bce0: flags=8943<UP,BROADCAST,**RUNNING,PROMISC,SIMPLEX,** >>>>>>>>> MULTICAST> >>>>>>>>> metric 0 mtu 1500 >>>>>>>>> options=c01bb<RXCSUM,TXCSUM,** >>>>>>>>> VLAN_MTU,VLAN_HWTAGGING,JUMBO_**MTU,VLAN_HWCSUM,TSO4,VLAN_** >>>>>>>>> HWTSO,LINKSTATE> >>>>>>>>> ether 84:2b:2b:68:64:e4 >>>>>>>>> inet 192.168.50.220 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> inet 192.168.50.221 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> inet 192.168.50.223 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> inet 192.168.50.224 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> inet 192.168.50.225 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> inet 192.168.50.226 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> inet 192.168.50.227 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> inet 192.168.50.219 netmask 0xffffff00 broadcast >>>>>>>>> 192.168.50.255 >>>>>>>>> media: Ethernet autoselect (1000baseT <full-duplex>) >>>>>>>>> status: active >>>>>>>>> bce1: flags=8802<BROADCAST,SIMPLEX,**MULTICAST> metric 0 mtu 1500 
>>>>>>>>> options=c01bb<RXCSUM,TXCSUM,** >>>>>>>>> VLAN_MTU,VLAN_HWTAGGING,JUMBO_**MTU,VLAN_HWCSUM,TSO4,VLAN_** >>>>>>>>> HWTSO,LINKSTATE> >>>>>>>>> ether 84:2b:2b:68:64:e5 >>>>>>>>> media: Ethernet autoselect >>>>>>>>> lo0: flags=8049<UP,LOOPBACK,**RUNNING,MULTICAST> metric 0 mtu >>>>>>>>> 16384 >>>>>>>>> options=3<RXCSUM,TXCSUM> >>>>>>>>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb >>>>>>>>> inet6 ::1 prefixlen 128 >>>>>>>>> inet 127.0.0.1 netmask 0xff000000 >>>>>>>>> nd6 options=3<PERFORMNUD,ACCEPT_**RTADV> >>>>>>>>> vboxnet0: flags=8802<BROADCAST,SIMPLEX,**MULTICAST> metric 0 mtu >>>>>>>>> 1500 >>>>>>>>> ether 0a:00:27:00:00:00 >>>>>>>>> DellT410one# netstat -rn >>>>>>>>> Routing tables >>>>>>>>> >>>>>>>>> Internet: >>>>>>>>> Destination Gateway Flags Refs Use Netif >>>>>>>>> Expire >>>>>>>>> default 192.168.50.201 UGS 0 52195 bce0 >>>>>>>>> 127.0.0.1 link#11 UH 0 6 lo0 >>>>>>>>> 192.168.50.0/24 link#1 U 0 1118212 >>>>>>>>> bce0 >>>>>>>>> 192.168.50.219 link#1 UHS 0 9670 lo0 >>>>>>>>> 192.168.50.220 link#1 UHS 0 8347 lo0 >>>>>>>>> 192.168.50.221 link#1 UHS 0 103024 lo0 >>>>>>>>> 192.168.50.223 link#1 UHS 0 43614 lo0 >>>>>>>>> 192.168.50.224 link#1 UHS 0 8358 lo0 >>>>>>>>> 192.168.50.225 link#1 UHS 0 8438 lo0 >>>>>>>>> 192.168.50.226 link#1 UHS 0 8338 lo0 >>>>>>>>> 192.168.50.227 link#1 UHS 0 8333 lo0 >>>>>>>>> 192.168.165.0/24 192.168.50.200 UGS 0 3311 >>>>>>>>> bce0 >>>>>>>>> 192.168.166.0/24 192.168.50.200 UGS 0 699 >>>>>>>>> bce0 >>>>>>>>> 192.168.167.0/24 192.168.50.200 UGS 0 3012 >>>>>>>>> bce0 >>>>>>>>> 192.168.168.0/24 192.168.50.200 UGS 0 552 >>>>>>>>> bce0 >>>>>>>>> >>>>>>>>> Internet6: >>>>>>>>> Destination Gateway >>>>>>>>> Flags Netif Expire >>>>>>>>> ::1 ::1 UH >>>>>>>>> lo0 >>>>>>>>> fe80::%lo0/64 link#11 U >>>>>>>>> lo0 >>>>>>>>> fe80::1%lo0 link#11 UHS >>>>>>>>> lo0 >>>>>>>>> ff01::%lo0/32 fe80::1%lo0 U >>>>>>>>> lo0 >>>>>>>>> ff02::%lo0/32 fe80::1%lo0 U >>>>>>>>> lo0 >>>>>>>>> DellT410one# kldstat >>>>>>>>> Id Refs Address Size Name >>>>>>>>> 1 
19 0xffffffff80100000 dbf5d0 kernel >>>>>>>>> 2 3 0xffffffff80ec0000 4c358 vboxdrv.ko >>>>>>>>> 3 1 0xffffffff81012000 131998 zfs.ko >>>>>>>>> 4 1 0xffffffff81144000 1ff1 opensolaris.ko >>>>>>>>> 5 2 0xffffffff81146000 2940 vboxnetflt.ko >>>>>>>>> 6 2 0xffffffff81149000 8e38 netgraph.ko >>>>>>>>> 7 1 0xffffffff81152000 153c ng_ether.ko >>>>>>>>> 8 1 0xffffffff81154000 e70 vboxnetadp.ko >>>>>>>>> DellT410one# pciconf -lv >>>>>>>>> .. >>>>>>>>> bce0@pci0:1:0:0: class=0x020000 card=0x028d1028 >>>>>>>>> chip=0x163b14e4 rev=0x20 hdr=0x00 >>>>>>>>> vendor = 'Broadcom Corporation' >>>>>>>>> class = network >>>>>>>>> subclass = ethernet >>>>>>>>> bce1@pci0:1:0:1: class=0x020000 card=0x028d1028 >>>>>>>>> chip=0x163b14e4 rev=0x20 hdr=0x00 >>>>>>>>> vendor = 'Broadcom Corporation' >>>>>>>>> class = network >>>>>>>>> subclass = ethernet >>>>>>>>> >>>>>>>> >>>>>>>> Could you please provide "pciconf -lvcb" output instead, specific to >>>>>>>> the >>>>>>>> bce chips? Thanks. >>>>>>>> >>>>>>> >>>>>>> Her it is: >>>>>>> >>>>>>> bce0@pci0:1:0:0: class=0x020000 card=0x028d1028 >>>>>>> chip=0x163b14e4 rev=0x20 hdr=0x00 >>>>>>> vendor = 'Broadcom Corporation' >>>>>>> class = network >>>>>>> subclass = ethernet >>>>>>> bar [10] = type Memory, range 64, base 0xda000000, size >>>>>>> 33554432, enabled >>>>>>> cap 01[48] = powerspec 3 supports D0 D3 current D0 >>>>>>> cap 03[50] = VPD >>>>>>> cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message >>>>>>> cap 11[a0] = MSI-X supports 9 messages in map 0x10 >>>>>>> cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4) >>>>>>> ecap 0003[100] = Serial 1 842b2bfffe6864e4 >>>>>>> ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected >>>>>>> ecap 0004[150] = unknown 1 >>>>>>> ecap 0002[160] = VC 1 max VC0 >>>>>>> >>>>>> >>>>>> Thanks Peter. 
>>>>>>
>>>>>> Adding Yong-Hyeon and David to the discussion, since they've both worked
>>>>>> on the bce(4) driver in recent months (most of the changes made recently
>>>>>> are only in HEAD), and also adding Jack Vogel of Intel who maintains
>>>>>> em(4). Brief history for the devs:
>>>>>>
>>>>>> The issue is described "Network memory allocation failures" and was
>>>>>> reported last year, but two users recently (Scott and Peter) have
>>>>>> reported the issue again:
>>>>>>
>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>
>>>>>> And was mentioned again by Scott here, which also contains some
>>>>>> technical details:
>>>>>>
>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>>>>>>
>>>>>> What's interesting is that Scott's issue is identical in form but he's
>>>>>> using em(4), which isn't known to behave like this. Both individuals
>>>>>> are using VirtualBox, though we're not sure at this point if that is the
>>>>>> piece which is causing the anomaly.
>>>>>>
>>>>>> Relevant details of Scott's system (em-based):
>>>>>>
>>>>>> http://www.cap-press.com/misc/
>>>>>>
>>>>>> Relevant details of Peter's system (bce-based):
>>>>>>
>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.html
>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.html
>>>>>>
>>>>>> I think the biggest complexity right now is figuring out how/why scp
>>>>>> fails intermittently in this nature. The errno probably "trickles down"
>>>>>> to userland from the kernel, but the condition regarding why it happens
>>>>>> is unknown.
>>>>>>
>>>>>
>>>>> BTW: I also saw 2 of the errors coming from a BIND9 running in a
>>>>> jail on that box.
>>>>>
>>>>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages
>>>>> Apr 13 05:17:41 bind named[23534]: internal_send: 192.168.50.145#65176: Cannot allocate memory
>>>>> Jun 21 23:30:44 bind named[39864]: internal_send: 192.168.50.251#36155: Cannot allocate memory
>>>>> Jun 24 15:28:00 bind named[39864]: internal_send: 192.168.50.251#28651: Cannot allocate memory
>>>>> Jun 28 12:57:52 bind named[2462]: internal_send: 192.168.165.154#1201: Cannot allocate memory
>>>>>
>>>>> My initial guess: it happens sooner or later somehow - whether it is
>>>>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a
>>>>> lot of traffic over a longer period (a nameserver gets asked again
>>>>> and again).
>>>>>
>>>>
>>>> Scott, are you also using jails? If both of you are: is there any
>>>> possibility you can remove use of those? I'm not sure how VirtualBox
>>>> fits into the picture (jails + VirtualBox that is), but I can imagine
>>>> jails having different environmental constraints that might cause this.
>>>>
>>>> Basically the troubleshooting process here is to remove pieces of the
>>>> puzzle until you figure out which piece is causing the issue. I don't
>>>> want to get the NIC driver devs all spun up for something that, for
>>>> example, might be an issue with the jail implementation.
>>>>
>>>
>>> I understand this. As said, I do some afterhours debugging tonight.
>>>
>>> The scp/ssh problems are happening _outside_ the jails. The bind runs
>>> _inside_ the jail.
>>>
>>> I wanted to use the _host_ system to send VirtualBox virtual disks and
>>> filesystems used by jails to archive them and/or having them available on
>>> other FreeBSD systems (as a cold standby solution).
>>>
>>
>> I just switched off the VirtualBox (without removing the kernel modules).
>>
>> The copy succeeds now.
>>
>> Well, it could be a VirtualBox related problem, or is the server just
>> relieved to have 2GB more memory at hands now?
>>
>> Do you have a quick idea to "emulate" the 2GB memory load usually
>> delivered by VirtualBox?
>>
>
> Well, managed that (using lookbusy)
>
> Interestingly I could copy a large file (30GB) without problems, as soon as
> I switched off the VirtualBox. As said, the kernel modules weren't unloaded,
> they are still there.
>
> The copy crashes seconds after I started the VirtualBox. According to
> vmstat and top I had more free memory (ca. 1.5GB) as I had without
> VirtualBox and lookbusy (ca. 350MB).
>
> So, it looks (to me, at least) as I have a VirtualBox related problem,
> somehow.
>
> Any ideas? I am happy to play a bit more to get it sorted although it has
> some limits (it is running the company mailserver, after all)
>
> Regards
> Peter
>

This is it -- I'm seeing the exact same thing. Scp dies reliably with
VirtualBox running. Quit VirtualBox and I was able to scp about 30 large
files with no errors. Once I started VirtualBox an in-progress scp died
within seconds. Ditto that the Kernel modules merely being loaded don't
seem to make a difference, it's VirtualBox actually running.

virtualbox-ose-3.2.12_1

If anybody has any additional tests to run, outputs to send, etc, I'm glad
to muck around.

Thanks,
Scott
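
Editor's sketch (not part of the original message): the test described above, running the same copy many times with VirtualBox stopped and again with it running, then comparing failure counts, could be made repeatable with something like the following. This is only a minimal sketch under assumptions: the function name run_trials is hypothetical, the command to repeat is whatever scp invocation is being tested, and a failure is labelled by the last line the command printed on stderr (for example "lost connection" on the client side).

```python
import subprocess
from collections import Counter


def run_trials(command, trials):
    """Run `command` (a list of argv strings) `trials` times.

    Returns a Counter keyed by "ok" for successful runs, or by the last
    stderr line of the command for runs that exited with a non-zero status.
    """
    outcomes = Counter()
    for _ in range(trials):
        proc = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,  # discard normal output
            stderr=subprocess.PIPE,     # keep error text for the tally
            text=True,
        )
        if proc.returncode == 0:
            outcomes["ok"] += 1
        else:
            lines = proc.stderr.strip().splitlines()
            # Fall back to the exit status if the command printed nothing.
            reason = lines[-1] if lines else "exit status %d" % proc.returncode
            outcomes[reason] += 1
    return outcomes
```

Calling run_trials with the scp command as a list (file, then user@host:path; the actual names are up to the tester) once with VirtualBox stopped and once with it running would give two tallies to compare. The server-side "Cannot allocate memory" line is logged by sshd in /var/log/auth.log, so it would still have to be checked there separately.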