Date: Mon, 24 Apr 2006 16:47:36 +0400
From: Oleg Bulyzhin <oleg@freebsd.org>
To: Lars Erik Gullerud <lerik@nolink.net>
Cc: freebsd-net@freebsd.org
Subject: Re: Watchdog timeouts and dead network on bge - 6.1-RC1
Message-ID: <20060424124736.GA72623@lath.rinet.ru>
In-Reply-To: <20060423114810.P36951@electra.nolink.net>
References: <20060423114810.P36951@electra.nolink.net>
On Sun, Apr 23, 2006 at 02:35:24PM +0200, Lars Erik Gullerud wrote:
> We recently upgraded one of our 4.11 servers to 6.1-RC1. The server is a
> Dell PE2650, dual Xeons, and has two onboard Broadcom BCM5701 cards, using
> the bge driver.
>
> Some older threads on -net and -current led me to believe that most issues
> with bge driver in FreeBSD >4 had been sorted. However, after our upgrade,
> we are seeing errors like this:
>
> Apr 22 18:44:01 nebula kernel: bge0: watchdog timeout -- resetting
> Apr 22 18:44:01 nebula kernel: bge0: link state changed to DOWN
> Apr 22 18:44:03 nebula kernel: bge0: link state changed to UP
>
> ...and more importantly - when this happens, the network connection does
> NOT in fact come back up. Logging into the box locally (or via a different
> network interface) and manually issuing "ifconfig bge0 down ; ifconfig
> bge0 up" DOES get the interface going again, however.
>
> We have only seen this on very high network loads - the particular message
> included above occurred while transferring some 120GB of data from a 4.11
> NFS-server to this 6.1-RC1 box.
>
> Is this a known issue in bge? If so, is anyone working on it? Can we
> provide some useful information to whoever this might be?
>
> We have never had any issues with bge in 4.x, but we really need to get
> this server up to 5.x/6.x at this point in time, any other suggestions on
> knobs or workarounds that can give us bge stability?
>
> Thanks in advance,
>
> /leg

Could you try the attached patch? It should fix the problem where the link
goes UP but the network is still down.

About the bge resets: you should try if_bge.c rev. 1.126, it may help.

P.S. Either way, please report how it goes.

-- 
Oleg.

[Attachment: bge_init_intr.diff]

Index: if_bge.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
retrieving revision 1.91.2.13
diff -u -r1.91.2.13 if_bge.c
--- if_bge.c	4 Mar 2006 09:34:48 -0000	1.91.2.13
+++ if_bge.c	17 Apr 2006 19:39:15 -0000
@@ -3308,6 +3308,14 @@
 	
 	bge_ifmedia_upd(ifp);
 
+	sc->bge_link_evt++;
+#ifdef DEVICE_POLLING
+	if (!(sc->bge_ifp->if_capenable & IFCAP_POLLING))
+#endif
+	{
+		BGE_SETBIT(sc, BGE_MISC_LOCAL_CTL, BGE_MLC_INTR_SET);
+	}
+
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 
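Note on the patch: as far as can be read from the diff alone, the added lines record a pending link event (sc->bge_link_evt++) at the end of initialization and, unless polling is enabled, set BGE_MLC_INTR_SET so that an interrupt is raised and the link state gets re-evaluated. The userspace sketch below only models that idea. Every name in it (mock_softc, mock_init_tail, mock_intr, mock_reinit_and_service) is hypothetical, and the interrupt handler in particular is an assumption about how such a counter might be consumed, not the actual code in if_bge.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver softc: only what the patch touches. */
struct mock_softc {
	int  link_evt;      /* models sc->bge_link_evt */
	bool polling;       /* models IFCAP_POLLING being enabled */
	bool intr_pending;  /* models BGE_MLC_INTR_SET having been written */
	bool link_up;       /* what the driver currently believes */
};

/* Models the lines the patch adds at the end of init. */
static void
mock_init_tail(struct mock_softc *sc)
{
	sc->link_evt++;                 /* remember that link needs a re-check */
	if (!sc->polling)
		sc->intr_pending = true;    /* force an interrupt to be delivered */
}

/*
 * ASSUMPTION: an interrupt handler that re-reads link state only when a
 * link event has been recorded. The real handler is not shown in the mail.
 */
static void
mock_intr(struct mock_softc *sc)
{
	if (!sc->intr_pending)
		return;
	sc->intr_pending = false;
	if (sc->link_evt > 0) {
		sc->link_evt = 0;
		sc->link_up = true;         /* pretend the PHY reports link */
	}
}

/* Re-initialize a mock interface and service one interrupt. */
static bool
mock_reinit_and_service(bool polling)
{
	struct mock_softc sc = { 0, polling, false, false };

	mock_init_tail(&sc);
	mock_intr(&sc);
	return sc.link_up;
}
```

In this model, mock_reinit_and_service(false) ends with the link marked up because init forced an interrupt. With polling enabled no interrupt is forced and the pending event is left for the polling path, which the sketch does not model.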
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060424124736.GA72623>