From owner-svn-src-head@freebsd.org Sat Sep 16 09:41:30 2017
Date: Sat, 16 Sep 2017 19:41:18 +1000 (EST)
From: Bruce Evans
To: Alexander Leidinger
cc: Bruce Evans, Scott Long, Sean Bruno, Stephen Hurd, Cy Schubert,
    Ngie Cooper, src-committers@freebsd.org, svn-src-all@freebsd.org,
    svn-src-head@freebsd.org
Subject: Re: svn commit: r323516 - in head/sys: dev/bnxt dev/e1000 kern net sys
In-Reply-To: <20170916110159.Horde.lN4uQjj9fb7hJ2309eQrexb@webmail.leidinger.net>
Message-ID: <20170916192800.E14782@besplex.bde.org>
References: <201709130711.v8D7BlTS003204@slippy.cwsent.com>
    <48654d1f-4cc7-da05-7a73-ef538b431560@freebsd.org>
    <1EBD0641-002D-409C-B18E-AAB5FCDECEBA@samsco.org>
    <20170916124826.P1107@besplex.bde.org>
    <20170916110159.Horde.lN4uQjj9fb7hJ2309eQrexb@webmail.leidinger.net>
List-Id: SVN commit messages for the src tree for head/-current

On Sat, 16 Sep 2017, Alexander Leidinger wrote:

> Quoting Bruce Evans (from Sat, 16 Sep 2017 13:46:37
> +1000 (EST)):
>
>> It gives lesser breakage here:
>> - with an old PCI em, an error that occurs every few makeworlds over nfs now
>>   hangs the hardware.  It used to be recovered from after about 10 seconds.
>>   This only happened once.  I then applied my old fix which ignores the
>>   error better so as to recover from it immediately.  This seems to work as
>>   before.
>
> As I also have an em device which switches into non-working state: what's the
> patch you have for this? I would like to see if your change also helps my
> device to get back into working shape again.

X Index: em_txrx.c
X ===================================================================
X --- em_txrx.c	(revision 323636)
X +++ em_txrx.c	(working copy)
X @@ -640,9 +640,20 @@
X  
X  		/* Make sure bad packets are discarded */
X  		if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
X +#if 0
X  			adapter->dropped_pkts++;
X -			/* XXX fixup if common */
X  			return (EBADMSG);
X +#else
X +			/*
X +			 * XXX the above error handling is worse than none.
X +			 * First it drops 'i' packets before the current
X +			 * one and doesn't count them.  Then it returns an
X +			 * error.  iflib can't really handle this error.
X +			 * It just resets, and this usually drops many more
X +			 * packets (without counting them) and much time.
X +			 */
X +			printf("lem: frame error: ignored\n");
X +#endif
X  		}
X  
X  		ri->iri_frags[i].irf_flid = 0;

This is for old em.  nfs doesn't seem to notice the dropped packet(s) after
this.  I think the comment "fixup if common" means "this error should
actually be handled if it occurs enough to matter".  I removed the increment
of the dropped packet count because with the change none are dropped directly
here.
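To make the accounting problem described in the patch comment concrete, here
is a small standalone sketch.  It is not driver code: struct sim_desc,
struct sim_stats, sim_rx_old() and sim_rx_new() are made-up names for
illustration only, and the cost of the iflib reset is not modelled at all.
It only shows how returning early loses the 'i' earlier entries without
counting them, while ignoring the error loses nothing at this point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a receive descriptor; not the e1000 layout. */
struct sim_desc {
	bool frame_err;		/* models a set frame-error bit */
};

/* Hypothetical counters, loosely modelled on adapter->dropped_pkts. */
struct sim_stats {
	unsigned delivered;	/* entries passed on to the caller */
	unsigned counted_drops;	/* drops recorded by the counter */
	unsigned silent_drops;	/* drops that nothing records */
	unsigned ignored_errs;	/* frame errors logged and ignored */
};

/*
 * Old behaviour as the patch comment describes it: on a frame error,
 * count one drop and return EBADMSG.  The 'i' entries gathered before
 * the bad one are thrown away without being counted.
 */
static int
sim_rx_old(const struct sim_desc *d, size_t n, struct sim_stats *st)
{
	size_t i;

	assert(d != NULL && st != NULL);
	for (i = 0; i < n; i++) {
		if (d[i].frame_err) {
			st->counted_drops++;
			st->silent_drops += (unsigned)i;
			return (EBADMSG);
		}
	}
	st->delivered += (unsigned)n;
	return (0);
}

/*
 * Patched behaviour: note the error and carry on, so nothing is
 * dropped here and no error is returned to the caller.
 */
static int
sim_rx_new(const struct sim_desc *d, size_t n, struct sim_stats *st)
{
	size_t i;

	assert(d != NULL && st != NULL);
	for (i = 0; i < n; i++) {
		if (d[i].frame_err)
			st->ignored_errs++;
	}
	st->delivered += (unsigned)n;
	return (0);
}
```

With a batch of four entries where only the third has the error bit set,
sim_rx_old() returns EBADMSG with one counted drop and two uncounted ones,
and sim_rx_new() returns 0 with all four delivered and one ignored error.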
I think the error applies only to this packet.  Returning in the old code
might drop more than 1 packet, but debugging code seems to show no more than
1 packet at a time having an error.  I think returning also drops good
packets after the bad one and leaves the state inconsistent, so that it takes
almost a reset to recover.

X @@ -703,8 +714,12 @@
X  
X  		/* Make sure bad packets are discarded */
X  		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
X +#if 0
X  			adapter->dropped_pkts++;
X  			return EBADMSG;
X +#else
X +			printf("em: frame error: ignored\n");
X +#endif
X  		}
X  
X  		ri->iri_frags[i].irf_flid = 0;

This is for newer em.  I haven't noticed any problems with that (except it
has 27 usec higher latency).

Bruce