FreeBSD Mail Archives

Date:      Sun, 20 Jan 2019 09:15:27 +0100
From:      Martin Birgmeier <d8zNeCFG@aon.at>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Eugene Grosbein <eugen@grosbein.net>, net@freebsd.org
Subject:   Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior
Message-ID:  <d67d78e4-8817-11b0-dc6f-9b8b09f7c298@aon.at>
In-Reply-To: <20190120163533.K1342@besplex.bde.org>
References:  <bug-235031-7501@https.bugs.freebsd.org/bugzilla/> <bug-235031-7501-goXNmp3zVl@https.bugs.freebsd.org/bugzilla/> <20190119204156.D929@besplex.bde.org> <3e407ee7-54e3-a6ac-5535-d11aceca9558@grosbein.net> <20190120061258.X3312@besplex.bde.org> <16ce1832-13da-d7bb-cce2-6682e058b5a6@aon.at> <20190120145627.X1077@besplex.bde.org> <20190120163533.K1342@besplex.bde.org>

I am not using resume at all... just normal startup/shutdown.

-- Martin

On 20.01.19 07:19, Bruce Evans wrote:
> On Sun, 20 Jan 2019, Bruce Evans wrote:
>
>> [iflib_media_change() is missing iflib_stop(), like iflib_resume() was]
>>
>> I don't know what the media was after the broken resume.  Its reported
>> result can't be trusted anyway.  To recover from the broken resume, it
>> usually worked to repeat down/up a few times.  This is consistent with
>> the bug -- eventually, previous down/ups change the state to close enough
>> to stopped.  But using the interface in any way (including pinging it
>> to see if it is still broken) makes it not so close to being stopped.
>
> Further debugging after restoring the bug in resume:
> - I use mainly zzz to suspend
> - the bug usually doesn't break the interface if I copy zzz from nfs to
>   non-nfs and use the copy.  This explains why almost no one except me
>   noticed the bug -- zzz is usually not on nfs, and other nfs activity
>   is usually lighter than mine too.  (Suspend apparently doesn't do enough
>   stopping or syncing generally.  It should fsync() all files ...)
> - the bug usually does break the interface if zzz is on nfs
> - when the bug breaks the interface:
>   - the media is reported as unchanged
>   - after DUPs starting with a delay of many seconds and reducing by the
>     ping interval of 1 second for each until the delay is less than 1
>     second, the ping latency stabilizes at quite different values after
>     each suspend/resume.  These values tend to be higher than for media
>     change (several hundred ms instead of 76 ms).
>   - my ifconfig executable is one of several under /sbin which is not on nfs,
>     but my ifconfig is actually a shell script in $HOME/bin; the script
>     selects the correct version of ifconfig for the current kernel; it is
>     on nfs, and uses utilities on nfs.  I sometimes forget this, and then
>     running plain ifconfig to attempt to recover takes too long, and if I
>     wait then the nfs activity for finding ifconfig not on nfs tends to
>     propagate the broken interface (like zzz not on nfs breaks it).
>     Manually selecting the correct version of ifconfig under /sbin and using
>     it tends to work right (like zzz not on nfs).
>   - even an mtu change is enough to recover.  This is not surprising, since
>     it does slightly more than down/up as an implementation detail.  This
>     shows that the reported media value is at least used by the reinit for
>     the mtu change.
>   - pinging the interface didn't make it active enough for the recovery to
>     not usually work.
>
> Bruce

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d67d78e4-8817-11b0-dc6f-9b8b09f7c298>
