Date: Mon, 18 Dec 2006 23:58:24 -0500
From: Scott Long <scottl@samsco.org>
To: Bruce Evans <bde@zeta.org.au>
Cc: cvs-src@freebsd.org, src-committers@freebsd.org, Scott Long <scottl@freebsd.org>, cvs-all@freebsd.org, Jung-uk Kim <jkim@freebsd.org>
Subject: Re: cvs commit: src/sys/dev/bge if_bge.c
Message-ID: <45877170.4030307@samsco.org>
In-Reply-To: <20061218220448.S1577@epsplex.bde.org>
References: <200612132051.kBDKppS4058663@repoman.freebsd.org> <200612131846.33252.jkim@FreeBSD.org> <20061214152805.D2109@besplex.bde.org> <20061216031759.N11941@delplex.bde.org> <20061218220448.S1577@epsplex.bde.org>

Bruce Evans wrote:
> On Sat, 16 Dec 2006, I wrote:
>
>> On Thu, 14 Dec 2006, I wrote:
>>
>>> On Wed, 13 Dec 2006, Jung-uk Kim wrote:
>>>
>>>> On Wednesday 13 December 2006 03:51 pm, Scott Long wrote:
>>>>> scottl      2006-12-13 20:51:51 UTC
>>>>>
>>>>>   FreeBSD src repository
>>>>>
>>>>>   Modified files:
>>>>>     sys/dev/bge          if_bge.c
>>>>>   Log:
>>>>>   Remove a redundant write of the firmware reset magic number.  It
>>>>>   ...
>>>> I am still getting firmware handshake timeouts and/or watchdog
>>>> timeouts.  Most importantly it panics or get witness warnings (lots
>>>> of 'memory modified after free').  Panic goes like this (while
>>>> kldunload if_bge with dhclient enabled):
>>>>
>>>> brgphy0: detached
>>>> miibus0: detached
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>
>>> I have seen these for debugging the redundant-write problem (not for
>>> detach but for bringing up the interface for the first time).  My 5701
>>> just hangs if there is any redundant write (2 where the first one was
>>> in bge_reset(), or 2 separate, or 2 where the second one was).  My
>>> 5705 survives two separate sets of 256 repeated writes; however, then
>>> the firmware handshake times out; however2, everything works normally
>>> after ignoring the timeout except for printing the message.  I
>>> just noticed that this error wasn't ignored until recently -- I noticed
>>> the return statement being removed but not that it was in a critical
>>> area.
>>
>> The debugging code doesn't seem to have been responsible for this.
>> Now, without it I almost (?) always get handshake errors on the 5705,
>> but never (?) on the 5701.  Apparently, the 3rd write (the one that
>> was removed) was the only correctly placed one.
>
> Avoiding the "write_op" part of the changes fixes the handshake errors
> on my non-PCIE 5705.  write_op is only used to write the reset value and
> one other value to BGE_MISC_CFG.  bge_writemem_ind() apparently writes
> the reset to nowhere, but bge_writereg() still works.
>
> %%%
> Index: if_bge.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
> retrieving revision 1.165
> diff -u -2 -r1.165 if_bge.c
> --- if_bge.c	15 Dec 2006 00:27:06 -0000	1.165
> +++ if_bge.c	18 Dec 2006 10:44:05 -0000
> @@ -2544,4 +2634,7 @@
>  	if (sc->bge_flags & BGE_FLAG_PCIE)
>  		write_op = bge_writemem_direct;
> +	/* XXX bge_writemem_ind is wrong for at least reset of 5705. */
> +	else if (sc->bge_asicrev == BGE_ASICREV_BCM5705)
> +		write_op = bge_writereg_ind;
>  	else
>  		write_op = bge_writemem_ind;
> %%%
>
> The panics might be caused by the change making the reset null.  Resetting
> might be much more necessary for uninitialization than for initialization.
>
> The bug caused the following behaviour here:
> - the problem with taking a long time to start serving nfs requests (with
>   /usr nfs-mounted) became larger.  Normally, nfs tries to start before
>   the interface is really up and then it takes about a minute to start.
>   With the bug, it often got portmap errors and sometimes never started.
> - after "ifconfig down", it took a reboot to bring the interface back up.
>
> Bruce

Ok, this looks like a result of me not understanding a bit of the Linux
code that I read.  When doing the reset, the Linux equivalent of
bge_writemem_ind() is specifically avoided.  I'm on vacation for the
next 10 days, but I'll try to put together a patch that addresses this
and other problems soon.  Ping me after the first of the year otherwise.

Scott
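
For anyone following the quoted patch, the write_op selection it changes
can be sketched outside the driver roughly as below.  This is only an
illustration under assumptions: every sketch_* and SKETCH_* name, the
constant values, and the softc layout are hypothetical stand-ins rather
than the real if_bge.c definitions, and the three accessors only record
which path was taken instead of touching hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, NOT the driver's real constants (assumption). */
#define SKETCH_FLAG_PCIE     0x1u
#define SKETCH_ASICREV_5701  0x1u
#define SKETCH_ASICREV_5705  0x3u

enum sketch_path { PATH_NONE, PATH_MEM_DIRECT, PATH_MEM_IND, PATH_REG_IND };

/* Hypothetical stand-in for the driver softc. */
struct sketch_softc {
	uint32_t flags;
	uint32_t asicrev;
	enum sketch_path last_path;	/* which accessor ran last */
	uint32_t last_off;
	uint32_t last_val;
};

typedef void (*sketch_write_op)(struct sketch_softc *, uint32_t, uint32_t);

/* Record the write instead of performing a real register access. */
static void
sketch_record(struct sketch_softc *sc, enum sketch_path path,
    uint32_t off, uint32_t val)
{
	sc->last_path = path;
	sc->last_off = off;
	sc->last_val = val;
}

static void
sketch_writemem_direct(struct sketch_softc *sc, uint32_t off, uint32_t val)
{
	sketch_record(sc, PATH_MEM_DIRECT, off, val);
}

static void
sketch_writemem_ind(struct sketch_softc *sc, uint32_t off, uint32_t val)
{
	sketch_record(sc, PATH_MEM_IND, off, val);
}

static void
sketch_writereg_ind(struct sketch_softc *sc, uint32_t off, uint32_t val)
{
	sketch_record(sc, PATH_REG_IND, off, val);
}

/*
 * Mirror the branch order of the quoted diff: PCIe parts use the direct
 * memory write, a non-PCIe 5705 uses the indirect register write, and
 * everything else keeps the indirect memory write.
 */
static sketch_write_op
sketch_select_write_op(const struct sketch_softc *sc)
{
	assert(sc != NULL);
	if (sc->flags & SKETCH_FLAG_PCIE)
		return (sketch_writemem_direct);
	else if (sc->asicrev == SKETCH_ASICREV_5705)
		return (sketch_writereg_ind);
	else
		return (sketch_writemem_ind);
}
```

A caller would pick the accessor once and then issue the write through
it, e.g. "sketch_select_write_op(&sc)(&sc, off, val);".  The function
pointer keeps the chip-specific choice in one place, which is the part
of the reset path the patch adjusts.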