From: Bruce Evans
Date: Sun, 20 Jan 2019 17:19:08 +1100 (EST)
To: Martin Birgmeier
cc: Eugene Grosbein, net@freebsd.org
Subject: Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior
In-Reply-To: <20190120145627.X1077@besplex.bde.org>
Message-ID: <20190120163533.K1342@besplex.bde.org>

On Sun, 20 Jan 2019, Bruce Evans wrote:

> [iflib_media_change() is missing iflib_stop(), like iflib_resume() was]
>
> I don't know what the media was after the broken resume. Its reported
> result can't be trusted anyway. To recover from the broken resume, it
> usually worked to repeat down/up a few times. This is consistent with
> bug -- eventually, previous down/up's change the state to close enough
> to stopped. But using the interface in any way (including pinging it
> to see if it is still broken) makes it not so close to being stopped.

Further debugging after restoring the bug in resume:
- I use mainly zzz to suspend
- the bug usually doesn't break the interface if I copy zzz from nfs to
  non-nfs and use the copy. This explains why almost no one except me
  noticed the bug -- zzz is usually not on nfs, and other nfs activity is
  usually lighter than mine too. (Suspend apparently doesn't do enough
  stopping or syncing generally. It should fsync() all files ...)
- the bug usually does break the interface if zzz is on nfs
- when the bug breaks the interface:
  - the media is reported as unchanged
  - after DUPs starting with a delay of many seconds and reducing by the
    ping interval of 1 second for each until the delay is less than 1
    second, the ping latency stabilizes at quite different values after
    each suspend/resume. These values tend to be higher than for media
    change (several hundred ms instead of 76 ms).
  - my ifconfig executable is one of several under /sbin, which is not on
    nfs, but my ifconfig is actually a shell script in $HOME/bin; the
    script selects the correct version of ifconfig for the current
    kernel; it is on nfs, and uses utilities on nfs. I sometimes forget
    this, and then running plain ifconfig to attempt to recover takes too
    long, and if I wait then the nfs activity for finding ifconfig not on
    nfs tends to propagate the broken interface (like zzz on nfs breaks
    it). Manually selecting the correct version of ifconfig under /sbin
    and using it tends to work right (like zzz not on nfs).
  - even an mtu change is enough to recover. This is not surprising,
    since it does slightly more than down/up as an implementation detail.
    This shows that the reported media value is at least used by the
    reinit for the mtu change.
  - pinging the interface didn't make it active enough for the recovery
    to not usually work.

Bruce