Date:      Sat, 19 Jan 2019 21:21:04 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Martin Birgmeier <d8zNeCFG@aon.at>
Cc:        net@freebsd.org
Subject:   Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior
Message-ID:  <20190119204156.D929@besplex.bde.org>
In-Reply-To: <bug-235031-7501-goXNmp3zVl@https.bugs.freebsd.org/bugzilla/>
References:  <bug-235031-7501@https.bugs.freebsd.org/bugzilla/> <bug-235031-7501-goXNmp3zVl@https.bugs.freebsd.org/bugzilla/>

On Fri, 18 Jan 2019 a bug that doesn't want replies@freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235031
>
> Yes; I just thought it was going to help and wanted to make it permanent right
> away. Bad idea.
>
> In the meantime:
>
> [0]# cat /var/db/ntpd.drift
> -6.596
> [0]#
>
> What can you get from the ntp drift?

I doubt that anything can be got from the ntp drift.  Maybe watching
it for several hours would show that it is wild, but wildness shouldn't
affect nfs throughput much.

I use a couple of fixes for iflib and em, but only the following one is
related to nfs on PRO-1000:

XX Index: em_txrx.c
XX ===================================================================
XX --- em_txrx.c	(revision 343087)
XX +++ em_txrx.c	(working copy)
XX @@ -634,9 +634,20 @@
XX 
XX  		/* Make sure bad packets are discarded */
XX  		if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  			adapter->dropped_pkts++;
XX -			/* XXX fixup if common */
XX  			return (EBADMSG);
XX +#else
XX +			/*
XX +			 * XXX the above error handling is worse than none.
XX +			 * First it drops 'i' packets before the current
XX +			 * one and doesn't count them.  Then it returns an
XX +			 * error.  iflib can't really handle this error.
XX +			 * It just resets, and this usually drops many more
XX +			 * packets (without counting them) and much time.
XX +			 */
XX +			printf("lem: frame error: ignored\n");
XX +#endif
XX  		}
XX 
XX  		ri->iri_frags[i].irf_flid = 0;
XX @@ -697,8 +708,12 @@
XX 
XX  		/* Make sure bad packets are discarded */
XX  		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  			adapter->dropped_pkts++;
XX  			return EBADMSG;
XX +#else
XX +			printf("em: frame error: ignored\n");
XX +#endif
XX  		}
XX 
XX  		ri->iri_frags[i].irf_flid = 0;

On my system, the bug fixed by this only occurs rarely, and only on
PRO-1000 (not on I218-V going through the same low-end network switch),
and has only been observed under moderately heavy nfs use with lots
of small RPCs and not many i/o's.  When it occurs, nfs with unpatched
em takes many seconds to recover, but with the patch nfs barely notices
the error.  I use nfs over UDP since TCP is significantly slower due
to higher latency once the network latency is low enough (here it is
51 usec for old PRO-1000 and 80 usec for I218-V, with about 20 usec
in the switch and a lower latency old bge NIC on the other side).  UDP
gives worse error recovery.
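
To illustrate why latency matters so much for lots of small RPCs, here is a rough sketch. It assumes each RPC waits for the previous reply (no pipelining) and that the figures quoted above are round-trip times; neither is stated above. rpcs_per_sec() is a hypothetical helper, not anything in nfs or em.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on strictly serialized RPCs per second
 * when each RPC costs one round trip of rtt_usec microseconds.
 * Assumes no pipelining and ignores server and disk time.
 */
uint32_t
rpcs_per_sec(uint32_t rtt_usec)
{
	assert(rtt_usec > 0);
	return (1000000u / rtt_usec);
}
```

Under those assumptions, rpcs_per_sec(51) gives about 19600 and rpcs_per_sec(80) gives 12500, so tens of extra microseconds per RPC would already cost a large fraction of the achievable rate.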

Your problem looks more like lost interrupts.  All em NICs should interrupt
at the default interrupt moderation rate of 8 kHz under load.  Once there
are that many interrupts, there is not much else that can go wrong (nfs
would have to be working to generate that many interrupts).
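
For reference, an 8 kHz limit corresponds to roughly 125 usec between interrupts. The sketch below shows the arithmetic, assuming the interrupt throttle interval is programmed in 256 ns units (as I understand the e1000 ITR register to work; check the datasheet for your chip). The helper names are hypothetical and are not the driver's own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the throttle interval is counted in units of 256 ns. */
#define ITR_UNIT_NS	256u

/* Hypothetical helper: interval value for a maximum interrupt rate in Hz. */
uint32_t
itr_from_rate(uint32_t max_ints_per_sec)
{
	assert(max_ints_per_sec > 0);
	return ((uint32_t)(1000000000ull /
	    ((uint64_t)max_ints_per_sec * ITR_UNIT_NS)));
}

/* Hypothetical helper: minimum spacing between interrupts, in ns. */
uint32_t
itr_to_ns(uint32_t itr)
{
	return (itr * ITR_UNIT_NS);
}
```

With these assumptions, itr_from_rate(8000) is 488 and itr_to_ns(488) is 124928 ns. An interrupt count that stays far below 8000 per second under heavy nfs load would be consistent with lost interrupts.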

Bugs in iflib are easy to avoid by running FreeBSD-11.  PRO-1000 is supported
by most versions of FreeBSD and doesn't have the bug fixed by the above in
FreeBSD[7-11].

Bruce


