From owner-freebsd-net@FreeBSD.ORG Tue Feb 6 22:19:02 2007 Return-Path: X-Original-To: freebsd-net@FreeBSD.org Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8083D16A400 for ; Tue, 6 Feb 2007 22:19:02 +0000 (UTC) (envelope-from oleg@lath.rinet.ru) Received: from lath.rinet.ru (lath.rinet.ru [195.54.192.90]) by mx1.freebsd.org (Postfix) with ESMTP id E93DA13C428 for ; Tue, 6 Feb 2007 22:19:00 +0000 (UTC) (envelope-from oleg@lath.rinet.ru) Received: from lath.rinet.ru (localhost [127.0.0.1]) by lath.rinet.ru (8.13.8/8.13.8) with ESMTP id l16MIxJD067499 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 7 Feb 2007 01:18:59 +0300 (MSK) (envelope-from oleg@lath.rinet.ru) Received: (from oleg@localhost) by lath.rinet.ru (8.13.8/8.13.8/Submit) id l16MIvDA067498; Wed, 7 Feb 2007 01:18:57 +0300 (MSK) (envelope-from oleg) Date: Wed, 7 Feb 2007 01:18:57 +0300 From: Oleg Bulyzhin To: Bruce Evans Message-ID: <20070206221857.GA66675@lath.rinet.ru> References: <20070125170532.c9c2374hkwk4oc4k@server.yirdis.net> <20070205232800.GA45487@lath.rinet.ru> <20070207003539.I31484@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070207003539.I31484@besplex.bde.org> User-Agent: Mutt/1.5.13 (2006-08-11) Cc: Robin Gruyters , freebsd-net@FreeBSD.org Subject: Re: [Fwd: Re: bge Ierr rate increase from 5.3R -> 6.1R] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Feb 2007 22:19:02 -0000 On Wed, Feb 07, 2007 at 01:31:39AM +1100, Bruce Evans wrote: > On Tue, 6 Feb 2007, Oleg Bulyzhin wrote: > > >Sorry for the late reply - i was AFK for some time and didnt read mails. > >Patch against 6.2R attached, please let me know does it helps or not. > > I didn't test the old version of this since I have too many local changes > in my non-6.x versions. I might test the 6.x version. > > I use jdp's quicker fix. It works fine for detecting cable unplug > and replug, but link detection is still very bad at boot time and > after down/up (seems to be worse for down/up than unplug/replug?). > Link detection in -current generally seems to be much worse than > in 5.2. On some systems I use two ping -c2's early in the boot to > wait for the link to actually be up. The first ping tends to fail > and the second tends to work, both long after the lonk claims to > be up. Then other network activity still takes too long to start. > Without the pings, an "ntpdate -b" early in the boot fails about > half the time and gives messed up timing activity when it fails, > and initial nfs mounts tak 30-60 seconds. Later after down/up and > waiting for the "up" message, ttcp -u usually fails to connect the > first time and then works normally with no failure or connection > delay the second time. Could you please give some more details? I'm intersted in: - chip version - are you using 'auto' media or fixed one? - have you tried verbose boot? (MAC's link state (bge_link) reported with it). - is there recipe how to trigger erroneous behaviour? i'll try it on mine bge cards. (i have bcm5721 5700 & 5701, but i didnt notice errors in link handling with -current driver (and i had problems with 5.x driver)) The fact 'ping' workaround does help looks like lost interrupt. > > % Index: sys/dev/bge/if_bgereg.h > % =================================================================== > % RCS file: /home/ncvs/src/sys/dev/bge/if_bgereg.h,v > % retrieving revision 1.36.2.8.2.1 > % diff -u -r1.36.2.8.2.1 if_bgereg.h > % --- sys/dev/bge/if_bgereg.h 21 Dec 2006 21:53:54 -0000 1.36.2.8.2.1 > % +++ sys/dev/bge/if_bgereg.h 5 Feb 2007 23:23:22 -0000 > % @@ -2460,8 +2460,13 @@ > % uint32_t bge_tx_buf_ratio; > % int bge_if_flags; > % int bge_txcnt; > % - int bge_link; /* link state */ > % - int bge_link_evt; /* pending link event */ > % + uint32_t bge_sts; > % +#define BGE_STS_LINK 0x00000001 /* MAC link status */ > % +#define BGE_STS_LINK_EVT 0x00000002 /* pending link > event */ > % +#define BGE_STS_AUTOPOLL 0x00000004 /* PHY auto-polling > */ > % +#define BGE_STS_BIT(sc, x) ((sc)->bge_sts & (x)) > % +#define BGE_STS_SETBIT(sc, x) ((sc)->bge_sts |= (x)) > % +#define BGE_STS_CLRBIT(sc, x) ((sc)->bge_sts &= ~(x)) > > Style bugs: > - inconsistents tabs. bge mostly consitently doesn't follow KNF for the > tab after #define. mea culpa. > - I don't like obfuscating bit testing and setting using macros. The > above macros are quite different from the BGE and PCI SETBIT and CLRBIT > macros -- the latter operate on hardware bits and do something useful > by hiding the details of the hardware accesses, and aren't assoicated > with bit testing macros. Apart from the above, testing of hardware bits > and testing an setting of software bits is always done using direct > "&" and "|" operations in bge. I'm going to agree with you. (I was hesitating about using macros here and your arguments look reasonable.) > > % Index: sys/dev/bge/if_bge.c > % =================================================================== > % RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v > % retrieving revision 1.91.2.18.2.1 > % diff -u -r1.91.2.18.2.1 if_bge.c > % --- sys/dev/bge/if_bge.c 21 Dec 2006 21:53:54 -0000 1.91.2.18.2.1 > % +++ sys/dev/bge/if_bge.c 5 Feb 2007 23:23:29 -0000 > % @@ -2645,12 +2648,12 @@ > % > % /* Note link event. It will be processed by POLL_AND_CHECK_STATUS > cmd */ > % if (statusword & BGE_STATFLAG_LINKSTATE_CHANGED) > % - sc->bge_link_evt++; > % + BGE_STS_SETBIT(sc, BGE_STS_LINK_EVT); > % > % if (cmd == POLL_AND_CHECK_STATUS) > % if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 && > % sc->bge_chipid != BGE_CHIPID_BCM5700_B2) || > % - sc->bge_link_evt || sc->bge_tbi) > % + BGE_STS_BIT(sc, BGE_STS_LINK_EVT) || sc->bge_tbi) > % bge_link_upd(sc); > > I suspected this is the cause of slowness for polling in idle (see previous > mail), but since it doesn't do any PCI access directly it should be fast > enough. > The macros make it unclear that it doesn't do any PCI accesses > directly. Agree. > > % @@ -2699,7 +2702,7 @@ > % > % if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 && > % sc->bge_chipid != BGE_CHIPID_BCM5700_B2) || > % - statusword || sc->bge_link_evt) > % + statusword || BGE_STS_BIT(sc, BGE_STS_LINK_EVT)) > % bge_link_upd(sc); > % > % if (ifp->if_drv_flags & IFF_DRV_RUNNING) { > > The software link status handling causes a problem in the interrupt > handler. To avoid pessimizing the case of shared interrupts, the > interrupt handler should be able to read the status word from the > status block and return without doing anything if the interrupt is not > for it. This can be done without acquiring the driver lock (since the > driver lock is neither necessary nor sufficient for accessing the > status block). Software link status gets in the way of this, since > accessing it requires the driver lock. You are right, but at the moment, bge_intr() is locked and does not care about shared interrupts. If we are going to fix this i would vote for using taskqueue api - in this case we can test 'link event' flag inside locked taskqueue thread. >(Note that `statusword' in the > above is now bogusly named. It used to be the status word from the > status block but is now the mac status for a PCI register. It is still > from the status block in similar code in bge_poll(). Not pessimizing > the case of shared interrupts requires using the status block again, > and then the read from the PCI register might become a dummy.) We can not avoid pci register read before syncing status block - we should flush data posted at pci bridge, otherwise we can 'miss' interrupt. > > Elsewhere, you added code to force interrupt after link status changes. > I think this is to get the interrupt handler to look at the software > link status in the above. When I first saw it, I thought that forcing > an interrupt is needed in more places. Forced interrupt used for TBI cards since they do not have PHY auto-polling. > > There are also complications to work around hardware bugs in reporting > the link status. Maybe the complications can be avoided by pushing > them to the tick routine. My version doesn't worry about them except > in a commment and returns early without locking if the status block > doesn't claim to have been updated. > > Bruce -- Oleg. ================================================================ === Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg@rinet.ru === ================================================================