Date:      Fri, 1 Sep 2006 14:14:17 +0300
From:      "Mihail  Balikov" <misho@interbgc.com>
To:        "Rob Watt" <rob@hudson-trading.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Intel em receive hang and possible pr #72970
Message-ID:  <00f901c6cdb7$c3c51460$08e009d9@misho>
References:  <Pine.OSX.4.64.0608311124590.8120@cpe-72-229-120-238.nyc.res.rr.com>

This bug should be fixed in CVS rev. 1.129.

----- Original Message ----- 
From: "Rob Watt" <rob@hudson-trading.com>
To: <freebsd-net@freebsd.org>
Sent: Thursday, August 31, 2006 8:15 PM
Subject: Intel em receive hang and possible pr #72970


> Hi,
>
> We have experienced a very sporadic problem on 2 amd64 machines running
> FreeBSD 6.0-RELEASE.
>
> The hardware:
>
>   Tyan K8SR motherboard
>   2 AMD 275 dual-core processors
>   Intel Pro 1000 MT dual-port copper server card
>   Intel Pro 1000 MF dual-port fiber server card
>   Adaptec 2230S Raid controller
>
> These machines receive multicast and TCP data on multiple interfaces,
> process it, record it to disk, and then rebroadcast it on one interface.
>
> Twice now (once on each machine after a recent upgrade to 6.0-RELEASE)
> the 2 fiber em interfaces seemed to stop receiving. Transmits seemed to
> still be happening, and the machine itself was not hung. We could
> console into it and do anything not network related.
>
> The first time this happened we opted to quickly disconnect the machine
> from the network and move its processes to a backup machine. We did not
> see anything interesting with netstat, vmstat, logs, etc. (however, I do
> not remember exactly which tests I ran at the time). Everything seemed
> normal except that it was not receiving on the 2 fiber interfaces (we did
> not actually test the other interfaces, but one of our apps that uses the
> copper interfaces was still receiving data). We rebooted the machine and
> ran Intel's NIC diagnostics. The card passed all of the tests through
> roughly 100 iterations.
>
> We eventually put the machine back into production. The second machine had
> the same problem. Unfortunately I was on vacation when it happened and did
> not get to do any diagnostics. The developers just put the backup machine
> into production and rebooted the one with the problem.
>
> After poking around in various group/pr postings the most similar problem
> that we found was PR #72970.
>   http://www.freebsd.org/cgi/query-pr.cgi?pr=72970
>
> Does it seem that we are encountering that bug? Is that bug fixed in
> 6.1-RELEASE, or is there an easy patch to 6.0-RELEASE (i.e. can we patch
> only the em driver)?
>
> If it does not seem that we are triggering that bug, does anyone have any
> thoughts about what the problem could be?
>
> We have done fairly intense stress testing in the past on these machines
> with tons of network/disk/cpu/memory activity all happening at the same
> time, and we've never encountered this bug. The fact that it is not easily
> repeatable makes it hard to test for. Any testing suggestions would also
> be appreciated.
>
> Thanks
> -
> Rob Watt
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"



