From: Eugene Grosbein
To: Joe Clarke, freebsd-stable@freebsd.org
Subject: Re: Traffic "corruption" in 12-stable
Date: Mon, 27 Jul 2020 12:00:11 +0700
Message-ID: <9d6dc414-2866-e6c8-6b66-22af23efc728@grosbein.net>
In-Reply-To: <9FAE54DE-F409-4A53-B91E-59AE52A86513@marcuscom.com>

27.07.2020 5:16, Joe Clarke wrote:

> About two weeks ago, I upgraded from the latest 11-stable to the latest
> 12-stable. After that, I periodically see the network throughput come
> to a near standstill. This FreeBSD machine is an ESXi VM with two
> interfaces. It acts as a router. It uses vmxnet3 interfaces for both
> LAN and WAN. It runs ipfw with in-kernel NAT. The LAN side uses a
> bridge with vmx0 and a tap0 L2 VPN interface. My LAN side uses an MTU
> of 9000, and my vmx1 (WAN side) uses the default 1500.
>
> Besides seeing massive packet loss and huge latency (~200 ms for
> on-LAN ping times), I know the problem has occurred because my lldpd
> reports:
>
> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0
>
> And if I turn on ipfw verbose messages, I see tons of:
>
> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>
> This leads me to believe packets are being corrupted on ingress. I’ve
> applied all the recent iflib changes, but the problem persists. What
> causes it, I don’t know.
>
> The only thing that changed (and yes, it’s a big one) is I upgraded to
> 12-stable. Meaning, the rest of the network infra and topology has
> remained the same. This did not happen at all in 11-stable.
>
> I’m open to suggestions.

First, try:

ifconfig $ifname -rxcsum -txcsum
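
A minimal sketch of applying that to both interfaces from the quoted report. The names vmx0 and vmx1 are taken from the message above; the IFACES and DRY_RUN variables and the disable_csum function are only illustrative and not part of any FreeBSD tool. By default it just prints the commands, so it is safe to run anywhere; set DRY_RUN=0 (as root, on the router itself) to actually execute them.

```shell
#!/bin/sh
# Interfaces named in the quoted report (assumption: adjust for your system).
IFACES="vmx0 vmx1"

# DRY_RUN=1 (the default here) only prints the commands instead of running them.
: "${DRY_RUN:=1}"

# Disable RX and TX checksum offloading on every interface in IFACES.
disable_csum() {
    for ifname in $IFACES; do
        cmd="ifconfig $ifname -rxcsum -txcsum"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

disable_csum
```

The change made this way does not survive a reboot. If it turns out to help, the same flags can be appended to the matching ifconfig_<ifname> line in /etc/rc.conf.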