Date: Sat, 16 Sep 2017 19:41:18 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Alexander Leidinger <Alexander@leidinger.net>
Cc: Bruce Evans <brde@optusnet.com.au>, Scott Long <scottl@samsco.org>, Sean Bruno <sbruno@freebsd.org>, Stephen Hurd <shurd@freebsd.org>, Cy Schubert <Cy.Schubert@komquats.com>, Ngie Cooper <yaneurabeya@gmail.com>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r323516 - in head/sys: dev/bnxt dev/e1000 kern net sys
Message-ID: <20170916192800.E14782@besplex.bde.org>
In-Reply-To: <20170916110159.Horde.lN4uQjj9fb7hJ2309eQrexb@webmail.leidinger.net>
References: <201709130711.v8D7BlTS003204@slippy.cwsent.com> <48654d1f-4cc7-da05-7a73-ef538b431560@freebsd.org> <1EBD0641-002D-409C-B18E-AAB5FCDECEBA@samsco.org> <20170916124826.P1107@besplex.bde.org> <20170916110159.Horde.lN4uQjj9fb7hJ2309eQrexb@webmail.leidinger.net>
On Sat, 16 Sep 2017, Alexander Leidinger wrote:

> Quoting Bruce Evans <brde@optusnet.com.au> (from Sat, 16 Sep 2017 13:46:37
> +1000 (EST)):
>
>> It gives lesser breakage here:
>> - with an old PCI em, an error that occurs every few makeworlds over nfs now
>>   hangs the hardware.  It used to be recovered from after about 10 seconds.
>>   This only happened once.  I then applied my old fix which ignores the
>>   error better so as to recover from it immediately.  This seems to work as
>>   before.
>
> As I also have an em device which switches into non-working state: what's the
> patch you have for this? I would like to see if your change also helps my
> device to get back into working shape again.

X Index: em_txrx.c
X ===================================================================
X --- em_txrx.c	(revision 323636)
X +++ em_txrx.c	(working copy)
X @@ -640,9 +640,20 @@
X  
X  		/* Make sure bad packets are discarded */
X  		if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
X +#if 0
X  			adapter->dropped_pkts++;
X -			/* XXX fixup if common */
X  			return (EBADMSG);
X +#else
X +			/*
X +			 * XXX the above error handling is worse than none.
X +			 * First it drops 'i' packets before the current
X +			 * one and doesn't count them.  Then it returns an
X +			 * error.  iflib can't really handle this error.
X +			 * It just resets, and this usually drops many more
X +			 * packets (without counting them) and much time.
X +			 */
X +			printf("lem: frame error: ignored\n");
X +#endif
X  		}
X  
X  		ri->iri_frags[i].irf_flid = 0;

This is for old em.  nfs doesn't seem to notice the dropped packet(s) after
this.

I think the comment "fixup if common" means "this error should actually be
handled if it occurs enough to matter".  I removed the increment of the
dropped packet count because with the change none are dropped directly
here.  I think the error is just for this packet, but more than 1 packet
might be dropped by returning in the old code.  However, debugging code
seems to show no more than 1 packet at a time having an error.
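The difference between the two policies can be sketched with a toy model. This is only an illustration, not driver code: `toy_rxd`, `toy_stats` and both functions are hypothetical names, each descriptor is assumed to hold one whole packet, and the "reset loses the rest of the ring without counting it" behaviour is an assumption standing in for whatever iflib really does after getting EBADMSG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy descriptor: one packet per descriptor, error flag only. */
struct toy_rxd {
	int frame_err;		/* nonzero if the hardware flagged a frame error */
};

struct toy_stats {
	int delivered;		/* packets handed up the stack */
	int counted;		/* packets counted as dropped */
	int lost;		/* packets lost without any counter being bumped */
	int ignored;		/* frame errors seen but ignored */
};

/* Old policy: count one drop and return an error at the first bad frame. */
static int
toy_rx_return_on_error(const struct toy_rxd *ring, size_t n,
    struct toy_stats *st)
{
	size_t i;

	assert(ring != NULL && st != NULL);
	for (i = 0; i < n; i++) {
		if (ring[i].frame_err) {
			st->counted++;
			/*
			 * ASSUMPTION: the caller reacts with a reset, so
			 * everything after the bad frame is lost uncounted.
			 */
			st->lost += (int)(n - i - 1);
			return (EBADMSG);
		}
		st->delivered++;
	}
	return (0);
}

/* Patched policy: note the error, then keep going as if nothing happened. */
static int
toy_rx_ignore_error(const struct toy_rxd *ring, size_t n,
    struct toy_stats *st)
{
	size_t i;

	assert(ring != NULL && st != NULL);
	for (i = 0; i < n; i++) {
		if (ring[i].frame_err)
			st->ignored++;	/* stands in for the printf() */
		st->delivered++;	/* the frame is still handed up */
	}
	return (0);
}
```

With a 5-entry ring whose second entry has the error flag set, the first function stops after delivering 1 packet and the second delivers all 5, leaving any rejection of the bad frame to the upper layers.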
I think returning drops good packets after the bad one together with
leaving the state inconsistent, and it takes almost a reset to recover.

X @@ -703,8 +714,12 @@
X  
X  		/* Make sure bad packets are discarded */
X  		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
X +#if 0
X  			adapter->dropped_pkts++;
X  			return EBADMSG;
X +#else
X +			printf("em: frame error: ignored\n");
X +#endif
X  		}
X  
X  		ri->iri_frags[i].irf_flid = 0;

This is for newer em.  I haven't noticed any problems with that (except it
has 27 usec higher latency).

Bruce