From owner-freebsd-net@freebsd.org Sun Mar 25 15:14:10 2018
From: Eugene Grosbein
Date: Sun, 25 Mar 2018 22:13:41 +0700
To: supportsobaka@mail.ru, freebsd-net@freebsd.org, freebsdnic@mailbox.intel.com
Subject: Re: FreeBSD 11.1 with Intel(R) PRO/1000 unresponsive to all network interfaces from outside (seems, during idle), but immediately up when ping anything from inside the server
Message-ID: <5AB7BCA5.2020201@grosbein.net>
In-Reply-To: <1521974300.464728846@f345.i.mail.ru>
References: <1521974300.464728846@f345.i.mail.ru>
List-Id: Networking and TCP/IP with FreeBSD

25.03.2018 17:38, supportsobaka--- via freebsd-net wrote:

> Hello guys,
> Need help with an issue I have never met before in my long experience with FreeBSD.
>
> A new server in a remote DC, based on Intel S1200RP with FreeBSD 11.1, uses the igb driver for Intel(R) PRO/1000. There is no load or traffic yet, I'm just configuring it, so I believe that my Kitty (Putty) session is the only one that makes traffic. I lost connection to the server dozens of times during the last week.
>
> I never lost connection while I was doing something on the server via the remote Kitty terminal; it always happened when I returned to Kitty after some idle time. Then I get kicked out of the terminal and the server doesn't respond to pings or 'telnet port' from anywhere.
>
> The server has IPMI (and so KVM) and I can see that the server is alive and the network interface is up. No messages in dmesg when this happens. The network comes up (i.e. pings go through from outside) immediately after I ping something from inside the server (via IPMI's KVM access) or immediately after I execute netstat -r.
>
> I now run GENERIC to exclude any issue with my own kernel.
>
> The problem is 100% repeatable right now while I'm writing this:
>
> 1) leave the Kitty terminal for a period of time (about 10 minutes is enough)
> 2) come back to the terminal, start typing, get kicked off, ping - no response
> 3) log in to the server via KVM (I'm already logged in) and ping any URL from there
> 4) server is responsive again
>
> I ran a continuous ping to this server last night and it never dropped. It looks to me like the Intel card goes into some sleep mode during idle (when no traffic comes to the server at all, except perhaps Kitty's keep-alive).
>
> This is my first experience with FreeBSD 11.1 and ZFS (including root on ZFS). All my previous servers are on FreeBSD 9 and UFS, but this is not the first one with Intel cards. Not sure if the filesystem matters in this issue.
>
> I tried some things described here https://forums.freebsd.org/threads/workaround-freebsd-10-1-sudden-network-down.49264/ - it doesn't help.
>
> What other information do you need to debug this?

It might be that your DC provider's network has a well-known bug: sometimes its MAC/ARP cache expires the MAC address of your machine, and the provider's equipment then neither re-requests it via ARP nor delivers packets to the server. When you run ping or netstat -r you generate outgoing traffic (ICMP for ping, DNS lookups for netstat), which forcibly refills the provider's MAC/ARP caches, and things return to normal for some time.

There is an easy way to check whether this is the case. Set the sysctl net.link.ether.inet.max_age to a low value such as 60 (seconds), so that your own ARP cache entry for the gateway's MAC address expires often. Each expiry produces an outgoing ARP request that also refills the provider's caches before their entries expire.

If this helps, use it as a workaround and press the DC provider to fix the bug.
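
The check suggested above could be sketched roughly as follows. This is only an illustration, not a tested procedure for this server: the helper names arp_age_setting and apply_arp_age are made up for the example, and 60 is simply the example value mentioned above. The script changes the sysctl only when run as root on FreeBSD; anywhere else it just prints what it would do.

```shell
#!/bin/sh
# Sketch only: shorten the local ARP entry lifetime so this host re-ARPs for
# its gateway often, which should also refresh the provider's MAC/ARP caches.
OID="net.link.ether.inet.max_age"
NEW_AGE=60   # seconds; example value, tune as needed

# Hypothetical helper: build the "name=value" assignment. The same string is
# valid as a line in /etc/sysctl.conf.
arp_age_setting() {
    printf '%s=%s\n' "$OID" "$1"
}

# Hypothetical helper: apply the setting at runtime on FreeBSD as root,
# otherwise only show the command that would be run.
apply_arp_age() {
    setting=$(arp_age_setting "$1")
    if [ "$(uname -s)" = "FreeBSD" ] && [ "$(id -u)" -eq 0 ]; then
        sysctl "$setting"
    else
        echo "would run: sysctl $setting"
    fi
}

apply_arp_age "$NEW_AGE"
```

If the hangs stop with the lower value, the same assignment (net.link.ether.inet.max_age=60) can be added to /etc/sysctl.conf so that it survives a reboot.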