From: sashk
To: freebsd-net@freebsd.org
Subject: sfxge, lagg, cannot flush Tx/Rx queue and disconnects
Date: Mon, 10 Aug 2020 14:25:48 -0400
Hi,

Apologies, the first email went out as an HTML letter. Re-sending as plain text.

I have a FreeBSD 12.1 system with a Solarflare SFN8522 network controller. Everything works perfectly fine until, at some point, I lose connectivity to the server: it stops responding to pings for a while, then starts responding again and keeps working for a long time.
lagg0 is configured like this in /etc/rc.conf:

    ifconfig_sfxge0="up mtu 9000"
    ifconfig_sfxge1="up mtu 9000"
    cloned_interfaces="lagg0"
    ifconfig_lagg0="laggproto failover laggport sfxge0 laggport sfxge1 xxx.xxx.xxx.xxx/24"

Output of pciconf -lv:

    sfxge0@pci0:133:0:0: class=0x020000 card=0x80171924 chip=0x0a031924 rev=0x02 hdr=0x00
        vendor     = 'Solarflare Communications'
        device     = 'SFC9220 10/40G Ethernet Controller'
        class      = network
        subclass   = ethernet
    sfxge1@pci0:133:0:1: class=0x020000 card=0x80171924 chip=0x0a031924 rev=0x02 hdr=0x00
        vendor     = 'Solarflare Communications'
        device     = 'SFC9220 10/40G Ethernet Controller'
        class      = network
        subclass   = ethernet

The simplest fix is to reboot the server, after which everything works as before, but this isn't the best option.

During one of the troubleshooting sessions I tried to restart networking (/etc/rc.d/netif restart). The process got stuck, and I saw several messages in the logs:

    kernel: sfxge0: Cannot flush Tx queue 23
    kernel: sfxge0: Cannot flush Tx queue 15
    kernel: sfxge0: Cannot flush Rx queue 23
    kernel: sfxge0: Cannot flush Rx queue 15

I don't have access to the switch to see what's going on, but from what I hear, the people who manage it don't see anything suspicious, which seems to rule out a switch issue.

The latest troubleshooting step was to disable TSO (IPv4 and IPv6) and LRO by running:

    ifconfig sfxge0 -tso4 -tso6 -lro

I'm not sure yet whether that helped.

Any help would be appreciated. Thanks!
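If disabling the offloads does turn out to help, the change made with ifconfig above only lasts until the next reboot. A minimal sketch of making it persistent in /etc/rc.conf follows. It assumes the same flags should also be applied to sfxge1 (the command above only touched sfxge0, and with laggproto failover the backup port carries the traffic after a switchover); it is untested on this hardware.

```sh
# /etc/rc.conf (sketch): same lagg failover setup as above, with TSO (IPv4 and
# IPv6) and LRO disabled on both member ports at boot.
# Assumption: the offload change is wanted on sfxge1 as well, not only sfxge0.
ifconfig_sfxge0="up mtu 9000 -tso4 -tso6 -lro"
ifconfig_sfxge1="up mtu 9000 -tso4 -tso6 -lro"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport sfxge0 laggport sfxge1 xxx.xxx.xxx.xxx/24"
```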