Date: Tue, 22 Aug 2006 10:50:47 +0900
From: Pyun YongHyeon
To: "Patrick M. Hausen"
Cc: freebsd-stable@freebsd.org
Subject: Re: ICH7 SATA and em interrupt sharing
Message-ID: <20060822015047.GA12848@cdnetworks.co.kr>
In-Reply-To: <20060821195202.GA57333@hugo10.ka.punkt.de>
Reply-To: pyunyh@gmail.com

On Mon, Aug 21, 2006 at 09:52:02PM +0200, Patrick M. Hausen wrote:
> And yet more testing ...
>
> I rebuilt my kernel without USB devices and made sure
> atapci1 doesn't share an interrupt with anything:
>
> pcib1: 16
> pcib2: 20
> em0: 16
> em1: 17
> fxp0: 16
> atapci1: 19
> atkbdc0: 1
> atkbd0: 1
> sio0: 4
> sio1: 3
> ppc0: 7
>
> Side note: on this particular box I had to leave the USB devices
> enabled in the BIOS setup, otherwise em0 would end up on the same
> interrupt as atapci1 |-)
>
> Then I ran make buildworld and in parallel started to transfer a large
> file via FTP (done by fetching a sparse file of 10 GB) maxing out
> our 100 Mbit/s LAN.
>
> *boom* - or so I thought ;-) The ssh session was stuck, the system did
> not respond to ICMP echo. OK, wait until tomorrow morning to reset it ...
> ... just gave it one more ping an hour later, and the machine was
> alive again! It did not panic/reboot, the buildworld was running and
> the file transfer was transferring a file.
>
> In /var/log/messages I found:
>
> Aug 21 21:37:08 tomcat kernel: em0: Missing Tx completion interrupt!
> Aug 21 21:39:55 tomcat kernel: em0: Missing Tx completion interrupt!
> Aug 21 21:40:29 tomcat kernel: em0: Missing Tx completion interrupt!
>
> Seems like for some reason the network card blocked for a couple
> of minutes, then resumed.
>
> This was all with debug.mpsafenet set to 1. Now I'm running the same
> stress test with debug.mpsafenet set to 0 and I haven't seen any
> problem/hang at all.
>
> Wait a minute ... now as I'm typing this message, ssh to the
> box hangs again. Damn.
>
> I think I'll try the fxp interface for production use and disable the
> onboard Gigabit NICs.
>
> Now the ssh session is responding again while the file transfer reports
> "Connection reset by peer".
>
> Dmesg shows:
>
> em0: Missing Tx completion interrupt!
> em0: Missing Tx completion interrupt!
> em0: Missing Tx completion interrupt!
> em0: Missing Tx completion interrupt!
> em0: Missing Tx completion interrupt!
> em0: Missing Tx completion interrupt!
>

Thanks for the testing. The above message means the patch really
worked; otherwise you would have seen (false) watchdog errors on your
system. I guess there are two possible causes of the missing Tx
completion interrupts: a chipset bug or the Tx interrupt moderation
mechanism. If Tx interrupt moderation is what triggers the false
watchdog timeouts, we would have to fix all device drivers that have
Tx interrupt moderation capability. I'll have to check the archives
for bge(4). I'll commit the em(4) patch soon.

The hang you see in the ssh session and the lack of response to ICMP
echo requests indicate other issues. I can't be sure, but they may not
be related to the network drivers at all (e.g. sharing an interrupt
with other devices).

> I'm still not able to really reproduce the SATA problem others are
> reporting, besides forcing em0 to share its interrupt with the
> SATA controller. This can easily be avoided - at least with our
> hardware.
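
On the interrupt sharing question, here is a rough sketch of how one
could check a listing like the one quoted earlier for devices that land
on the same IRQ. The "device: irq" text format simply mirrors that
listing; it is an assumption for illustration and not the output format
of any particular FreeBSD tool. The function names are made up for this
example.

```python
from collections import defaultdict

# Interrupt assignments copied from the listing quoted in this message.
IRQ_LISTING = """\
pcib1: 16
pcib2: 20
em0: 16
em1: 17
fxp0: 16
atapci1: 19
atkbdc0: 1
atkbd0: 1
sio0: 4
sio1: 3
ppc0: 7
"""


def group_by_irq(listing):
    """Map each IRQ number to the devices that report it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        device, irq = line.split(":")
        groups[int(irq)].append(device.strip())
    return dict(groups)


def shared_irqs(groups):
    """Keep only the IRQs used by more than one device."""
    return {irq: devs for irq, devs in groups.items() if len(devs) > 1}


if __name__ == "__main__":
    shared = shared_irqs(group_by_irq(IRQ_LISTING))
    for irq, devs in sorted(shared.items()):
        print(f"irq{irq}: {', '.join(devs)}")
```

Run against that listing, this reports atapci1 alone on IRQ 19, while
em0 still shares IRQ 16 with pcib1 and fxp0.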
>
>
> Regards,
>
> Patrick M. Hausen
> Leiter Netzwerke und Sicherheit
> --
> punkt.de GmbH Internet - Dienstleistungen - Beratung
> Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
> 76137 Karlsruhe http://punkt.de

--
Regards,
Pyun YongHyeon