Date: Fri, 1 Sep 2006 14:14:17 +0300
From: "Mihail Balikov" <misho@interbgc.com>
To: "Rob Watt" <rob@hudson-trading.com>
Cc: freebsd-net@freebsd.org
Subject: Re: Intel em receive hang and possible pr #72970
Message-ID: <00f901c6cdb7$c3c51460$08e009d9@misho>
References: <Pine.OSX.4.64.0608311124590.8120@cpe-72-229-120-238.nyc.res.rr.com>
This bug should be fixed in CVS rev. 1.129.

----- Original Message -----
From: "Rob Watt" <rob@hudson-trading.com>
To: <freebsd-net@freebsd.org>
Sent: Thursday, August 31, 2006 8:15 PM
Subject: Intel em receive hang and possible pr #72970

> Hi,
>
> We have experienced a very sporadic problem on 2 amd64 machines running
> FreeBSD 6.0-RELEASE.
>
> The hardware:
>
> Tyan K8SR motherboard
> 2 AMD 275 dual-core processors
> Intel Pro 1000 MT dual-port copper server card
> Intel Pro 1000 MF dual-port fiber server card
> Adaptec 2230S RAID controller
>
> These machines receive multicast and TCP data on multiple interfaces,
> process it, record it to disk, and then rebroadcast it on one interface.
>
> Twice now (once on each machine after a recent upgrade to 6.0-RELEASE)
> the 2 fiber em interfaces seemed to stop receiving. Transmits still seemed
> to be happening, and the machine itself was not hung. We could console
> into it and do anything that was not network-related.
>
> The first time this happened we opted to quickly disconnect the machine
> from the network and move its processes to a backup machine. We did not
> see anything interesting with netstat, vmstat, logs, etc. (I do not
> remember, however, exactly which tests I ran at the time). Everything
> seemed normal except that it was not receiving on the 2 fiber interfaces
> (we did not actually test the other interfaces, but one of our apps that
> uses the copper interfaces was still receiving data). We rebooted the
> machine and ran Intel's NIC diagnostics. The card passed all of the tests
> over roughly 100 iterations.
>
> We eventually put the machine back into production. The second machine had
> the same problem. Unfortunately, I was on vacation when it happened and did
> not get to run any diagnostics. The developers just put the backup machine
> into production and rebooted the one with the problem.
>
> After poking around in various mailing-list and PR postings, the most
> similar problem that we found was PR #72970.
> http://www.freebsd.org/cgi/query-pr.cgi?pr=72970
>
> Does it seem that we are encountering that bug? Is that bug fixed in
> 6.1-RELEASE, or is there an easy patch for 6.0-RELEASE (i.e., can we
> patch only the em driver)?
>
> If it does not seem that we are triggering that bug, does anyone have any
> thoughts about what the problem could be?
>
> We have done fairly intense stress testing on these machines in the past,
> with tons of network/disk/CPU/memory activity all happening at the same
> time, and we have never encountered this bug. The fact that it is not
> easily reproducible makes it hard to test for. Any testing suggestions
> would also be appreciated.
>
> Thanks
> -
> Rob Watt