From: "Mihail Balikov"
To: "Rob Watt"
Cc: freebsd-net@freebsd.org
Date: Fri, 1 Sep 2006 14:14:17 +0300
Subject: Re: Intel em receive hang and possible pr #72970
Message-ID: <00f901c6cdb7$c3c51460$08e009d9@misho>
List-Id: Networking and TCP/IP with FreeBSD

This bug should be fixed with cvs rev.
1.129

----- Original Message -----
From: "Rob Watt"
Sent: Thursday, August 31, 2006 8:15 PM
Subject: Intel em receive hang and possible pr #72970

> Hi,
>
> We have experienced a very sporadic problem on 2 amd64 machines running
> FreeBSD 6.0-RELEASE.
>
> The hardware:
>
> Tyan K8SR motherboard
> 2 AMD 275 dual-core processors
> Intel Pro 1000 MT dual-port copper server card
> Intel Pro 1000 MF dual-port fiber server card
> Adaptec 2230S RAID controller
>
> These machines receive multicast & tcp data on multiple interfaces and
> process it & record it to disk and then rebroadcast it on one interface.
>
> Twice now (once on each machine after a recent upgrade to 6.0-RELEASE)
> the 2 fiber em interfaces seemed to stop receiving. Transmits seemed to
> still be happening, and the machine itself was not hung. We could
> console into it and do anything not network related.
>
> The first time this happened we opted to quickly disconnect the machine
> from the network and move its processes to a backup machine. We did not
> see anything interesting with netstat, vmstat, logs, etc (I do not
> remember however which exact tests I ran at the time). Everything seemed
> normal except that it was not receiving on the 2 fiber interfaces (we did
> not actually test the other interfaces, but one of our apps that uses the
> copper interfaces was still receiving data). We rebooted the machine and
> ran Intel's nic diagnostics. The card passed all of the tests through
> about 100 iterations.
>
> We eventually put the machine back into production. The second machine had
> the same problem. Unfortunately I was on vacation when it happened and did
> not get to do any diagnostics. The developers just put the backup machine
> into production and rebooted the one with the problem.
>
> After poking around in various group/pr postings the most similar problem
> that we found was PR #72970.
> http://www.freebsd.org/cgi/query-pr.cgi?pr=72970
>
> Does it seem that we are encountering that bug?
> Is that bug fixed in 6.1-RELEASE, or is there an easy patch to
> 6.0-RELEASE (i.e. can we patch only the em driver)?
>
> If it does not seem that we are triggering that bug, does anyone have any
> thoughts about what the problem could be?
>
> We have done fairly intense stress testing in the past on these machines
> with tons of network/disk/cpu/memory activity all happening at the same
> time, and we've never encountered this bug. The fact that it is not easily
> repeatable makes it hard to test for. Any testing suggestions would also
> be appreciated.
>
> Thanks
> -
> Rob Watt
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"