Date: Wed, 27 Sep 2006 17:28:24 +0200
From: Oliver Brandmueller <ob@e-Gitt.NET>
To: freebsd-stable@freebsd.org
Subject: Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Message-ID: <20060927152824.GJ22229@e-Gitt.NET>
In-Reply-To: <451A4189.5020906@samsco.org>
References: <451A1375.5080202@gneto.com> <20060927071538.GF22229@e-Gitt.NET> <451A4189.5020906@samsco.org>

Hi Scott,

On Wed, Sep 27, 2006 at 03:16:57AM -0600, Scott Long wrote:
> Well, the best I can say at the moment is, "Wow." =-( I guess the
> thing to do here is to figure out if the problem lies with the em
> interrupt handler not getting run, or the taskqueue not getting run.
> Since you've stated that it seems to be related to shared interrupts,
> the first possibility is more likely. However, I'm not sure why the
> symptom would only be showing up now. The Intel docs say that the
> 82547EI are a bit interesting, and I wonder if assumptions that we
> make about PCI ordering aren't true (or if there are bugs that make
> our assumptions invalid).
>
> Does this happen after there has been a lot of disk activity, like a
> large tar extraction? Are you using the SMBus interface at all, or is
> it sitting completely idle?

Disk activity does not trigger the problem: I hammered the disk at
around 85 MB/s (dd) for about half an hour without seeing any effect. A
CPU-bound job like a buildworld did trigger the problem.

The SMBus interface is not used at all (it's not even really usable).
Anyway, as soon as I unload the ichsmb module I cannot trigger the
problem anymore. If I load it again, the problem can again be triggered
by a buildworld. Statistical relevance: I did 4 buildworlds, alternating
the load/unload of ichsmb. Both times with ichsmb loaded I saw 3
watchdog timeouts while the buildworld was running; with ichsmb not
loaded I did not see a single watchdog timeout. The use of the
interface was about the same the whole time (constant NFS traffic
of around 1-2 MBit/s).
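
Whether em actually shares an interrupt line with another device such as
ichsmb can be checked from the output of `vmstat -i`. The following is only
a minimal sketch in Python. It assumes the output format where all devices
on one IRQ are listed on the same line after `irqN:`. The sample text, IRQ
numbers and counts are hypothetical, not taken from the affected machine,
and `shared_irqs` is an illustrative helper name.

```python
import re

# Hypothetical `vmstat -i` output; IRQ numbers and counts are made up.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                         312          0
irq14: ata0                       184220         12
irq18: em0 ichsmb0                903114         61
cpu0: timer                     29600412       1999
Total                           30687958       2072
"""


def shared_irqs(vmstat_output):
    """Map each IRQ shared by two or more devices to its device list."""
    shared = {}
    for line in vmstat_output.splitlines():
        # Assumed line shape: "irqN: <dev> [<dev> ...] <total> <rate>".
        match = re.match(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$", line)
        if not match:
            continue  # header, cpu timer and Total lines are skipped
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[match.group(1)] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, "shared by", ", ".join(devices))
```

On a real system the text passed to `shared_irqs` would come from running
`vmstat -i` instead of the sample string.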

Since we all seem to see this only on interfaces sharing interrupts
(as I read the other posters' mails), and the problem can be worked
around by using polling, it seems pretty clear that it has to
do with interrupt handling.

The UP/SMP idea seems to be of interest only because on a UP machine
interrupts are more likely to be shared than on SMP machines; it has
nothing to do with UP or SMP itself.

- Oliver

-- 
| Oliver Brandmueller | Offenbacher Str. 1 | Germany D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW: http://the.addict.de/ |
| I am the Internet. So truly do I help God. |
| Commercial use of any of the addresses contained here is not permitted! |