From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 221919] ixl: TX queue hang when using TSO and having a high and mixed network load
Date: Thu, 15 Feb 2018 02:12:23 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.1-STABLE
X-Bugzilla-Keywords: IntelNetworking
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: jason@tubnor.net
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: erj@freebsd.org
X-Bugzilla-Changed-Fields: cc
List-Id: Networking and TCP/IP with FreeBSD

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221919

Jason Tubnor changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason@tubnor.net

--- Comment #13 from Jason Tubnor ---
I am also seeing this on our Lenovo SR650 7x06 servers. We too are using 10GbE
XL710 cards:

Intel(R) Ethernet Controller X710 for 10GbE SFP+

# pciconf -l | grep ixl
ixl0@pci0:10:0:0:   class=0x020000 card=0x402117aa chip=0x37d18086 rev=0x09 hdr=0x00
ixl1@pci0:10:0:1:   class=0x020000 card=0x402117aa chip=0x37d18086 rev=0x09 hdr=0x00
ixl2@pci0:10:0:2:   class=0x020000 card=0x402117aa chip=0x37d18086 rev=0x09 hdr=0x00
ixl3@pci0:10:0:3:   class=0x020000 card=0x402117aa chip=0x37d18086 rev=0x09 hdr=0x00
ixl4@pci0:174:0:0:  class=0x020000 card=0x000a8086 chip=0x15728086 rev=0x01 hdr=0x00
ixl5@pci0:174:0:1:  class=0x020000 card=0x00008086 chip=0x15728086 rev=0x01 hdr=0x00

snip from /var/log/messages:

Feb 15 09:50:53 server01 kernel: ixl5: Malicious Driver Detection event 2 on TX queue 769, pf number 1
Feb 15 09:50:53 server01 kernel: ixl5: MDD TX event is for this function!
Feb 15 09:50:54 server01 kernel: ixl5: WARNING: queue 0 appears to be hung!
Feb 15 09:50:54 server01 kernel: ixl5: WARNING: Resetting!
Feb 15 09:50:57 server01 kernel: WARNING: 192.168.1.14 (iqn.1998-01.com.vmware:HOST-00000000): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 15 09:51:25 server01 kernel: ixl5: Malicious Driver Detection event 2 on TX queue 775, pf number 1
Feb 15 09:51:25 server01 kernel: ixl5: MDD TX event is for this function!
Feb 15 09:51:29 server01 kernel: WARNING: 192.168.1.14 (iqn.1998-01.com.vmware:HOST-00000000): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 15 09:51:53 server01 kernel: ixl5: WARNING: queue 7 appears to be hung!
Feb 15 09:51:53 server01 kernel: ixl5: WARNING: Resetting!
Feb 15 09:51:55 server01 kernel: ixl5: Malicious Driver Detection event 2 on TX queue 768, pf number 1
Feb 15 09:51:55 server01 kernel: ixl5: MDD TX event is for this function!

This is easily reproduced when hooking 10GbE VMware ESXi hosts up to these
storage servers via iSCSI. We could trigger it by performing a vMotion move
from one datastore to another.

I do not have a server on which I can test any patches, as three of these are
in production running 11.1-RELEASE, and we cannot afford to have them offline
or to deviate from the standard supported freebsd-update mechanism.

I hope something can be worked out soon and rolled into an update, as for us
this issue can't wait for 11.2 or 12. I will be trying out -tso, but was
hoping to avoid that for performance reasons.

Thanks!
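
For reference, the -tso workaround mentioned above could be applied roughly as
follows. This is a minimal sketch, not a tested fix: the interface names are
taken from the pciconf output in this comment, and DRY_RUN is a hypothetical
switch added for this sketch so the commands can be reviewed before they touch
a production box. It is not an ifconfig feature.

```shell
#!/bin/sh
# Sketch: turn off TCP segmentation offload on the ixl(4) ports.
# Assumption: ixl0..ixl5 as listed by "pciconf -l | grep ixl" above.
# DRY_RUN is hypothetical (this sketch only); it defaults to printing
# the commands instead of running them.

DRY_RUN="${DRY_RUN:-1}"

disable_tso() {
    # "-tso" disables TSO on the interface (see ifconfig(8)).
    if [ "$DRY_RUN" = "1" ]; then
        echo "ifconfig $1 -tso"
    else
        ifconfig "$1" -tso
    fi
}

for ifn in ixl0 ixl1 ixl2 ixl3 ixl4 ixl5; do
    disable_tso "$ifn"
done
```

Running it with DRY_RUN=0 as root would apply the change to the live
interfaces. The setting does not survive a reboot; adding -tso to the
matching ifconfig_ixlN line in /etc/rc.conf should make it persistent.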