Date: Wed, 10 Dec 2008 21:07:19 +0900
From: Pyun YongHyeon
To: Victor Balada Diaz
Cc: freebsd-stable@freebsd.org, freebsd-amd64@freebsd.org
Subject: Re: [ATA] and re(4) stability issues

On Wed, Dec 10, 2008 at 12:32:25PM +0100, Victor Balada Diaz wrote:
> On Wed, Dec 10, 2008 at 07:28:00PM +0900, Pyun YongHyeon wrote:
> > On Wed, Dec 10, 2008 at 09:59:35AM +0100, Victor Balada Diaz wrote:
> > > On Wed, Dec 10, 2008 at 03:12:26PM +0900, Pyun YongHyeon wrote:
> > > > On Tue, Dec 09, 2008 at 07:52:37PM +0100, Victor Balada Diaz wrote:
> > > > > Hello,
> > > > >
> > > > > I have several machines[1] at hetzner.de and I've been having problems
> > > > > with interrupts on FreeBSD 7.0 and now FreeBSD 7.1-BETA2 on amd64. I've
> > > > > been trying to narrow the problem down so that someone more knowledgeable
> > > > > than me is able to fix it. This mail is another attempt to ask a question
> > > > > regarding the ATA code, to see if I get an answer this time.
> > > > >
> > > > > For those who don't know what happened:
> > > > >
> > > > > With FreeBSD 7.0-RELEASE for amd64 and the default kernel,
> > > > > the system shared the re0 interrupt with OHCI, and this caused
> > > > > re(4) to corrupt packets and create interrupt storms. Tried
> > > >
> > > > re(4) in 7.0-RELEASE had a bus_dma(9) bug which could be easily
> > > > triggered on systems with > 4GB memory. But I don't know whether
> > > > this is related to the interrupt storms.
> > > >
> > > > > updating to 7.1-BETA2 and still had some problems with it.
> > > > >
> > > > > I've opened PR kern/128287[2] and Remko quickly answered
> > > > > with a workaround: removing USB support from my kernel.
> > > > > I did it, re(4) wasn't sharing interrupts any longer,
> > > > > and the interrupt storms were gone. Now, some time later, the
> > > > > interface goes up and down from time to time, but less often.
> > > > > Also, the machine sometimes loses the network interface but
> > > > > keeps working.
> > > > >
> > > >
> > > > It seems that your controller supports MSI, so you can set the
> > > > tunable hw.re.msi_disable to 0 to enable MSI. With MSI you can avoid
> > > > interrupt sharing (e.g. add hw.re.msi_disable="0" to the
> > > > /boot/loader.conf file). However, there were several issues in re(4)
> > > > w.r.t. MSI, so it was off by default.
> > >
> > > This is undocumented, and with sysctl -a I can't find the tunable. Is
> > > this a HEAD feature or is it also in 7.1-BETA2? Should I add
> >
> > Yeah, it's an undocumented feature. But most drivers written by me
> > have similar knobs. Both HEAD and stable/7, including 7.1-BETA2, have
> > the tunable.
>
> I think it would be great if you could document it, or at least
> show it by default in sysctl -ad with a short description.
>

If MSI worked as expected I would have documented it, as I did in
msk(4)/nfe(4)/ale(4)/age(4)/jme(4) etc. Using MSI on RealTek does
not seem to be stable.
I tried hard to fix that, but some users still
reported watchdog timeouts. Working without documentation or
hardware also made it hard to complete the work. This was the main
reason why MSI was disabled on re(4).

> >
> > > hw.re_msi_disable="0" to /boot/loader.conf?
> >   ^^^^^^^^^^^^^^^^^^^^^
> >   Should be hw.re.msi_disable="0"
> >
> > >
> >
> > Yes, just add it to /boot/loader.conf. Note that you should not
> > disable the system-wide MSI control (i.e. keep hw.pci.enable_msi == 1).
> >
> > > This was sharing an interrupt with USB. Does USB need any special MSI
> > > handling, or is re(4) using MSI enough to avoid sharing the interrupt?
> >
> > If re(4) can use MSI, you don't need to worry about interrupt
> > sharing with USB. Check the output of "vmstat -i". You normally get
> > irq256 or higher for an MSI-enabled driver.
> >
> > > > >
> > > > > I know it keeps working because some days later I can see that
> > > > > it tried to deliver the status reports but was unable to resolve
> > > > > the aliases' hostnames. I can't ping the machine, and I know the
> > > > > network is OK. If I reboot the machine everything works again.
> > > > >
> > > >
> > > > Recently I've made small changes to re(4) which may help to detect
> > > > link state change events. Would you try re(4) in HEAD?
> > >
> > > Can I just drop HEAD's /stable/7/sys/dev/re/ into -STABLE and test that
> >
> > Yes, you can. It should build without problems. Just replace re(4) on
> > stable/7 with the HEAD version.
> >
> > > or do I need to test the whole HEAD kernel?
> > >
> >
> > No, you don't have to do that.
>
> While backporting the changes I found that it didn't compile, so in
> the end I took the following files from HEAD:
>
> base/head/sys/dev/re/if_re.c
> base/head/sys/pci/if_rl.c
> base/head/sys/pci/if_rlreg.h
>

Ah, sorry about that. There were some changes recently and I forgot
about them.

> After that I recompiled the 7.1-BETA2 GENERIC kernel and enabled
> the knob you suggested in /boot/loader.conf.
>
> With the new kernel and MSI the interrupts look like this:
>
> # vmstat -i
> interrupt                          total       rate
> irq9: acpi0                            1          0
> irq16: ohci0                           1          0
> irq17: ohci1 ohci3                     1          0
> irq18: ohci2 ohci4                     1          0
> irq22: atapci0                     19215         15
> cpu0: timer                      2502718       1998
> irq256: re0                      4967726       3967
> cpu1: timer                      2502525       1998
> Total                            9992188       7980
>
> The high interrupt numbers are because I've been running iperf to
> check that everything is fine, not because of interrupt storms. So far
> I haven't found any interrupt storms related to USB or the re(4) driver,
> but while doing the tests I found this error:
>
> re0: watchdog timeout (missed Tx interrupts) -- recovering
>
> This didn't cause any errors on the interfaces (netstat -i).
>

This message comes from new code in HEAD. It indicates that re(4)
missed a Tx completion interrupt. It could be a driver bug or a
hardware bug. If you can live with that message you can safely ignore
it, as re(4) now does not reinitialize the hardware when it detects a
missed Tx completion interrupt.

> Also, I didn't see any problem with interfaces going up and down,
> but that usually happens after some hours of uptime, so I'll let
> you know if the error happens again.
>

Ok.

> As this seems to improve the current situation, is there any
> chance of merging the -current driver into 7.1 before release?
>

I think re(4) in HEAD needs more testing. As you might know, RealTek
produced too many chipsets. :-(

-- 
Regards,
Pyun YongHyeon