Date: Fri, 12 Jun 2015 09:43:17 -0300
Subject: Re: FreeBSD 10.1-REL - network unaccessible after high
 traffic
From: Christopher Forgeron
To: Cs
Cc: FreeBSD Net

Ah, but the 'why' will come later, after we know for sure what the 'what' is in your problem. I'm just pointing out the problems that I'm having, as yours sound similar.

Once the box runs out of memory, all sorts of interesting things can happen. Perhaps that's not your case, but it's quite possible.

Set up a remote terminal, do the copy again, and send in the last few lines of 'vmstat 5' after it has locked up; perhaps I can help.

On Fri, Jun 12, 2015 at 9:39 AM, Cs wrote:

> But why does the machine run fine, except for the network, if it's memory related? Swap didn't increase before the network outage.
>
> On 2015.06.12. 14:37, Christopher Forgeron wrote:
>
> rsync burns memory - I'd say there's a good chance you're running out of memory before it's replenished.
>
> For vmstat 5 - don't run it on the console. Connect from a second box with ssh and run it there. That way it's the last thing on the ssh terminal screen when the box dies, and you'll have your proof.
>
> On Fri, Jun 12, 2015 at 9:31 AM, Cs wrote:
>
>> The machine was restarted before I could check the "vmstat 5" output. Yep, it's rsync. Anyway, I disabled the backup transfer, which should solve it, but I can't really accept this as a solution.
>>
>> On 2015.06.12. 14:29, Christopher Forgeron wrote:
>>
>>> Well, from what I've seen, it could drop due to memory even at low speed.
>>>
>>> What was the last line from vmstat 5 before it locked up?
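A minimal sketch of the remote vmstat capture suggested above, assuming a second machine that can reach the server over ssh. The account 'admin', the host name 'server-a' and the log file name are placeholders, not taken from the thread:

```sh
# Run this on the second (healthy) machine, not on the server's console.
# -t forces a pseudo-terminal on the server, so vmstat's output is
# line-buffered and each 5-second sample is sent as soon as it is printed.
# tee(1) copies every sample into a local log file, so the last readings
# before a hang survive even if the ssh session is cut off.
ssh -t admin@server-a vmstat 5 | tee vmstat-server-a.log
```

After a lock-up, 'tail vmstat-server-a.log' shows the final samples; the 'fre' column (the size of the free list) is the one to compare against the "8k of memory left" symptom described in the thread.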
>>>
>>> I find that the em driver isn't crap, but that there is a deeper problem inside FreeBSD that is being exposed now. For me it's due to faster network connections.
>>>
>>> Are you using rsync to move the files?
>>>
>>> On Fri, Jun 12, 2015 at 7:17 AM, Cs wrote:
>>>
>>>> It seems it's not memory related. The server just died a few minutes ago while transferring the backup (400GB) at around 800Mbps. I will disable the remote backup; it's a shame that the em driver is such crap.
>>>>
>>>> On 2015.06.08. 5:01, Christopher Forgeron wrote:
>>>>
>>>>> You know what helped me: 'vmstat 5'
>>>>>
>>>>> Leave that running. If the last thing on the console after a crash/hang is vmstat showing 8k of memory left, then you're in the same problem-park as me.
>>>>>
>>>>> My 10.1 box with 96GiB of RAM is chewing through ~8GiB of RAM in less than 5 seconds, and then crashing/panicking/hanging.
>>>>>
>>>>> There are others with this issue if you search for it; setting the vm.v_free_min sysctl to double or triple its current value may help, but first let us know if that's what is bonking your server.
>>>>>
>>>>> On Sun, Jun 7, 2015 at 11:03 AM, Cs wrote:
>>>>>
>>>>>> OK, I just lowered it to 1500, but please also note that it was on 1500 for 2 years.
>>>>>>
>>>>>> On 2015.06.07. 14:57, Rick Macklem wrote:
>>>>>>
>>>>>>> Since disabling TSO didn't help, you could try dropping to a 1500 MTU on both interfaces. Some people run into problems when 9K jumbo clusters fragment the kernel address space used to allocate mbufs.
>>>>>>>
>>>>>>> Good luck with it, rick
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> It worked fine for two weeks, but I had a network outage 2 days ago, and then again today.
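The two memory-related suggestions above can be tried roughly as follows. This is a sketch assuming root on FreeBSD 10.x; the factor of two simply follows the "double or triple" advice and is not a tested value:

```sh
# Rick's theory: 9K jumbo clusters fragment kernel memory. Look for
# non-zero "denied" or "delayed" counters, especially for 9k clusters.
netstat -m | grep -E 'jumbo|denied|delayed'

# Christopher's suggestion: raise vm.v_free_min (counted in pages).
sysctl vm.v_free_min                  # show the current value
cur=$(sysctl -n vm.v_free_min)        # -n prints only the number
sysctl vm.v_free_min=$((cur * 2))     # double it until the next reboot
```

To keep the new value across reboots, the resulting number would go into /etc/sysctl.conf as a 'vm.v_free_min=...' line.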
>>>>>>>> I tried disabling rxcsum and txcsum after the first one; it didn't help. I don't know what else to do. It's a shame that I can't use this card with FreeBSD. I REALLY don't want to install Linux instead, but outages on my production servers are not welcomed by the customers.
>>>>>>>>
>>>>>>>> On 2015.05.26. 10:36, Cs wrote:
>>>>>>>>
>>>>>>>>> Thanks Mark, good idea. I found this thread, which describes exactly the same problem as mine:
>>>>>>>>>
>>>>>>>>> https://forums.freebsd.org/threads/workaround-freebsd-10-1-sudden-network-down.49264/
>>>>>>>>>
>>>>>>>>> Will see if it helps in a couple of weeks.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Csaba
>>>>>>>>>
>>>>>>>>> On 2015.05.26. 10:30, Mark Schouten wrote:
>>>>>>>>>
>>>>>>>>>> Oh, I didn't see your remark at the bottom. Then the next thing that comes past here a few times per week is 'Try disabling TSO'.
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Kerio Operator in the Cloud? https://www.kerioindecloud.nl/
>>>>>>>>>> Mark Schouten | Tuxis Internet Engineering
>>>>>>>>>> KvK: 61527076 | http://www.tuxis.nl/
>>>>>>>>>> T: 0318 200208 | info@tuxis.nl
>>>>>>>>>>
>>>>>>>>>> From: Cs
>>>>>>>>>> To: Mark Schouten
>>>>>>>>>> Cc:
>>>>>>>>>> Sent: 25-5-2015 11:12
>>>>>>>>>> Subject: Re: FreeBSD 10.1-REL - network unaccessible after high traffic
>>>>>>>>>>
>>>>>>>>>> It was on 1500 for ~3 years :)
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Csaba
>>>>>>>>>>
>>>>>>>>>> On May 25, 2015, at 10:30, Mark Schouten wrote:
>>>>>>>>>>
>>>>>>>>>>> Try lowering your MTU to 1500; that worked miracles for me.
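The interface-level workarounds mentioned in this part of the thread (lower MTU, no TSO, no checksum offload) map onto ifconfig(8) options. A sketch, assuming root and the interface name em0 from Server B (Server A uses em1); the x.x.x.x address is the redacted one from the original report:

```sh
# Drop from 9000-byte jumbo frames back to the standard 1500-byte MTU.
ifconfig em0 mtu 1500
# Disable TCP segmentation offload (TSO).
ifconfig em0 -tso
# Disable receive and transmit checksum offload.
ifconfig em0 -rxcsum -txcsum

# To make the change persistent, the same options can be appended to the
# interface line in /etc/rc.conf, for example:
#   ifconfig_em0="inet x.x.x.x netmask 0xfffffe00 mtu 1500 -tso"
```

Changing these settings on a live interface may briefly interrupt traffic, so doing it from the IPMI console is safer than doing it over ssh.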
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Mark Schouten
>>>>>>>>>>> Tuxis Internet Engineering
>>>>>>>>>>> mark@tuxis.nl / 0318 200208
>>>>>>>>>>>
>>>>>>>>>>> On 25 May 2015, at 09:36, "Cs" wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I have two FreeBSD 10.1-RELEASE servers connected to each other. They used to be connected via a cross link, but now they are connected to a Cisco switch (the problem was the same with the cross link too). When transferring huge files (50-500GB backup files) over Gigabit (this is important!), the network randomly dies. The backup runs every day/week; sometimes the connection is fine for months, sometimes it dies twice a week.
>>>>>>>>>>>>
>>>>>>>>>>>> When the network dies I can log in to the server via IPMI and use the console, and everything is OK, but the server can't send anything out on the network. Neither 'ifconfig em0 down/up' nor a netif restart helps.
>>>>>>>>>>>>
>>>>>>>>>>>> The problem never occurred when I used a 100Mbit connection between them, but that was a 3Com NIC (xl); the gigabit adapter is an Intel (em0). When I limit the transfer rate (rsync bandwidth limit or ipfw pipe), the problem is much rarer.
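Both rate-limiting approaches mentioned above can be sketched as follows. The paths, the 'backuphost' name and the rates are hypothetical; rsync's --bwlimit is in units of 1024 bytes per second, so 60000 is roughly 490 Mbit/s:

```sh
# Option 1: let rsync throttle itself (about 490 Mbit/s).
rsync -a --bwlimit=60000 /backup/ backuphost:/backup/

# Option 2: shape the traffic with an ipfw dummynet pipe (run as root).
kldload dummynet                             # skip if already loaded
ipfw pipe 1 config bw 500Mbit/s              # pipe 1 passes at most 500 Mbit/s
ipfw add pipe 1 ip from any to x.x.x.x out   # x.x.x.x: the backup peer
```

The rsync limit only affects that one transfer, while the ipfw pipe caps everything sent to the peer. As the report notes, either one makes the problem rarer rather than eliminating it.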
>>>>>>>>>>>>
>>>>>>>>>>>> I tried setting these tuning parameters on both servers with different buffer sizes, but nothing helped:
>>>>>>>>>>>>
>>>>>>>>>>>> # cat /etc/sysctl.conf
>>>>>>>>>>>> security.bsd.see_other_uids=0
>>>>>>>>>>>> net.inet.tcp.recvspace=512000
>>>>>>>>>>>> net.route.netisr_maxqlen=2048
>>>>>>>>>>>> kern.ipc.nmbclusters=1310720
>>>>>>>>>>>> net.inet.tcp.sendbuf_max=16777216
>>>>>>>>>>>> net.inet.tcp.recvbuf_max=16777216
>>>>>>>>>>>> kern.ipc.soacceptqueue=32768
>>>>>>>>>>>>
>>>>>>>>>>>> # cat /boot/loader.conf
>>>>>>>>>>>> geom_mirror_load="YES" # RAID1 disk driver (see gmirror(8))
>>>>>>>>>>>> ipfw_load="YES"
>>>>>>>>>>>> net.inet.ip.fw.default_to_accept=1
>>>>>>>>>>>> kern.maxusers=4096
>>>>>>>>>>>> accf_data_load="YES"
>>>>>>>>>>>>
>>>>>>>>>>>> The duplex settings are identical on both servers.
>>>>>>>>>>>>
>>>>>>>>>>>> Server A:
>>>>>>>>>>>> em1: flags=8843 metric 0 mtu 9000
>>>>>>>>>>>>         options=4219b
>>>>>>>>>>>>         ether 00:25:90:24:52:66
>>>>>>>>>>>>         inet x.x.x.x netmask 0xfffffe00 broadcast x.x.x.x
>>>>>>>>>>>>         nd6 options=29
>>>>>>>>>>>>         media: Ethernet autoselect (1000baseT )
>>>>>>>>>>>>         status: active
>>>>>>>>>>>>
>>>>>>>>>>>> Server B:
>>>>>>>>>>>> em0: flags=8843 metric 0 mtu 9000
>>>>>>>>>>>>         options=4219b
>>>>>>>>>>>>         ether 00:30:48:dd:fe:3e
>>>>>>>>>>>>         inet x.x.x.x netmask 0xfffffe00 broadcast x.x.x.x
>>>>>>>>>>>>         nd6 options=29
>>>>>>>>>>>>         media: Ethernet autoselect (1000baseT )
>>>>>>>>>>>>         status: active
>>>>>>>>>>>>
>>>>>>>>>>>> Today I tried setting the MTU to 9000, but in tcpdump I see that during scp it is still 1500:
>>>>>>>>>>>>
>>>>>>>>>>>> x.x.x.x.222 > x.x.x.x.37612: Flags [.], cksum 0xb6ee (incorrect -> 0xda6f), seq 35749, ack 113701596, win 7986, options [nop,nop,TS val 3103966325 ecr 853712893], length 0
>>>>>>>>>>>> 09:27:33.912354 IP (tos 0x8, ttl 64, id 1028, offset 0, flags [DF], proto TCP (6), length 1500)
>>>>>>>>>>>> 09:27:33.912358 IP (tos 0x8, ttl 64, id 1029, offset 0, flags [DF], proto TCP (6), length 1500)
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas? Thanks guys!
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> freebsd-net@freebsd.org mailing list
>>>>>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>>>>>>>>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"