Date: Sun, 20 Jan 2019 09:15:27 +0100
From: Martin Birgmeier <d8zNeCFG@aon.at>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Eugene Grosbein <eugen@grosbein.net>, net@freebsd.org
Subject: Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior
Message-ID: <d67d78e4-8817-11b0-dc6f-9b8b09f7c298@aon.at>
In-Reply-To: <20190120163533.K1342@besplex.bde.org>
References: <bug-235031-7501@https.bugs.freebsd.org/bugzilla/>
 <bug-235031-7501-goXNmp3zVl@https.bugs.freebsd.org/bugzilla/>
 <20190119204156.D929@besplex.bde.org>
 <3e407ee7-54e3-a6ac-5535-d11aceca9558@grosbein.net>
 <20190120061258.X3312@besplex.bde.org>
 <16ce1832-13da-d7bb-cce2-6682e058b5a6@aon.at>
 <20190120145627.X1077@besplex.bde.org>
 <20190120163533.K1342@besplex.bde.org>

I am not using resume at all... just normal startup/shutdown.

-- Martin

On 20.01.19 07:19, Bruce Evans wrote:
> On Sun, 20 Jan 2019, Bruce Evans wrote:
>
>> [iflib_media_change() is missing iflib_stop(), like iflib_resume() was]
>>
>> I don't know what the media was after the broken resume.  Its reported
>> result can't be trusted anyway.  To recover from the broken resume, it
>> usually worked to repeat down/up a few times.  This is consistent with
>> bug -- eventually, previous down/up's change the state to close enough
>> to stopped.  But using the interface in any way (including pinging it
>> to see if it is still broken) makes it not so close to being stopped.
>
> Further debugging after restoring the bug in resume:
> - I use mainly zzz to suspend
> - the bug usually doesn't break the interface if I copy zzz from nfs to
>   non-nfs and use the copy.  This explains why almost no one except me
>   noticed the bug -- zzz is usually not on nfs, and other nfs activity
>   is usually lighter than mine too.  (Suspend apparently doesn't do enough
>   stopping or syncing generally.  It should fsync() all files ...)
> - the bug usually does break the interface if zzz is on nfs
> - when the bug breaks the interface:
>   - the media is reported as unchanged
>   - after DUPs starting with a delay of many seconds and reducing by the
>     ping interval of 1 second for each until the delay is less than 1
>     second, the ping latency stabilizes at quite different values after
>     each suspend/resume.  These values tend to be higher than for media
>     change (several hundred ms instead of 76 ms).
>   - my ifconfig executable is one of several under /sbin which is not on
>     nfs, but my ifconfig is actually a shell script in $HOME/bin; the
>     script selects the correct version of ifconfig for the current
>     kernel; it is on nfs, and uses utilities on nfs.  I sometimes forget
>     this, and then running plain ifconfig to attempt to recover takes
>     too long, and if I wait then the nfs activity for finding ifconfig
>     not on nfs tends to propagate the broken interface (like zzz not on
>     nfs breaks it).  Manually selecting the correct version of ifconfig
>     under /sbin and using it tends to work right (like zzz not on nfs).
>   - even an mtu change is enough to recover.  This is not surprising,
>     since it does slightly more than down/up as an implementation
>     detail.  This shows that the reported media value is at least used
>     by the reinit for the mtu change.
>   - pinging the interface didn't make it active enough for the recovery
>     to not usually work.
>
> Bruce